Helm Charts for Spark

Starting with the EEP 9.4.0 release, Helm chart integration for Spark is supported. This topic provides detailed information about Data Fabric's Helm and Spark integration, including installation steps.

Helm charts are packages of pre-configured Kubernetes resources that simplify the deployment, versioning, and management of applications on Kubernetes clusters. They bundle all the necessary YAML configuration files, templates, and dependencies into a single, reusable package that can be installed with a single command using the Helm CLI.

The Spark Helm chart enables containerized deployment of Spark applications to Kubernetes clusters (such as OpenShift Container Platform) by packaging the Tenant Operator, required RBAC configurations, secrets, and Spark runtime settings. This allows users to run Spark shells and submit Spark applications using Kubernetes as the cluster manager, with Data Fabric providing the underlying data layer.

This topic outlines the steps to set up and deploy a Data Fabric tenant on a Kubernetes cluster, including installing prerequisites, configuring the Tenant Operator, generating secrets, and running Spark applications.

For the latest Helm chart files and instructions, refer to the mapr/data-fabric-helm-charts repository on GitHub. See the release/fy26-q2 release for the current version.

Prerequisites

  • kubectl: Ensure that you have kubectl installed and configured to interact with your Kubernetes cluster.
  • Helm: Ensure you have Helm installed.
  • Kubernetes Cluster Access: Verify you have the necessary permissions to deploy resources to your Kubernetes cluster.
  • Placeholder Values: Be prepared to replace placeholder values (such as registry URLs or secrets) with your actual environment-specific data.

You can find all necessary images in the https://hub.docker.com/u/maprtech repository.
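
To confirm that these prerequisites are in place, you can run quick checks such as the following (a minimal sketch; adjust to your environment):
    kubectl version --client
    helm version
    # Verify that your credentials allow creating resources in the cluster
    kubectl auth can-i create namespaces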

1. Install Cert-Manager

Cert-Manager is required for managing certificates within the cluster.
  • Apply the Cert-Manager CRDs:
    kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.8.0/cert-manager.crds.yaml
  • Add the Jetstack Helm Repository and Update:
    helm repo add jetstack https://charts.jetstack.io
    helm repo update
  • Install Cert-Manager in the 'cert-manager' Namespace:
    helm install cert-manager jetstack/cert-manager --namespace cert-manager --create-namespace --version v1.8.0
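  • Verify that the Cert-Manager pods are running before proceeding, for example:
    kubectl get pods -n cert-manager
    # The cert-manager, cainjector, and webhook pods should all report Running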

2. Create a PriorityClass

A PriorityClass can be used to ensure critical pods are scheduled with higher priority.
  • Save the following YAML as hpe-critical-priorityclass.yaml:
    apiVersion: scheduling.k8s.io/v1
    kind: PriorityClass
    metadata:
      name: hpe-critical
    value: 1000000
    globalDefault: false
    description: "Priority class for critical HPE pods"
  • Apply the PriorityClass:
    kubectl apply -f hpe-critical-priorityclass.yaml
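
Workloads reference this class through the priorityClassName field in their pod specification, for example (an illustrative snippet only; the pod name and image are placeholders):
    apiVersion: v1
    kind: Pod
    metadata:
      name: example-critical-pod
    spec:
      priorityClassName: hpe-critical
      containers:
      - name: app
        image: <your image>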

3. Add Registry Certificate on Nodes

Install the registry certificate on all cluster nodes according to your internal security procedures.
NOTE
The exact steps for this will vary depending on your specific environment. Refer to your internal documentation for guidance.
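
As one illustration only (assuming RHEL-based nodes and the containerd runtime; your environment may differ), adding a registry CA certificate to a node typically looks like this:
    # Copy the registry CA certificate into the node's system trust store
    sudo cp registry-ca.crt /etc/pki/ca-trust/source/anchors/
    sudo update-ca-trust extract
    # Restart the container runtime so it picks up the new certificate
    sudo systemctl restart containerd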

4. Install Tenant Operator

Install the Tenant Operator using Helm:
helm install tenant-operator tenant-operator-chart/ -n test-tenant --create-namespace -f tenant-operator-chart/values.yaml
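
You can confirm that the release installed successfully, for example:
    helm list -n test-tenant
    kubectl get pods -n test-tenant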

5. Generate External Secrets

Run the external secret generation script (gen-external-secrets.sh) on a cluster node. The script must be available in your environment.
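
For example, assuming the script is present on the node and writes its output under /tmp (the file applied in the next step), the invocation might look like this:
    # Run on a Data Fabric cluster node
    ./gen-external-secrets.sh
    # Confirm that the output file was created
    ls -l /tmp/mapr-external-secrets.yaml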

6. Apply External Secrets and Additional Configurations

a. Apply External Secrets
  • Create the hpe-externalclusterinfo Namespace:
    kubectl create namespace hpe-externalclusterinfo
  • Apply the External Secrets YAML:
    kubectl apply -f /tmp/mapr-external-secrets.yaml
b. Create the hpe-secure Namespace and LDAP ConfigMap
  • Create the hpe-secure namespace if it does not already exist:
    kubectl create namespace hpe-secure
  • Save the following as ldapclient-cm.yaml (replace with your actual LDAP configuration):
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: ldapclient-cm
      namespace: hpe-secure
      annotations:
        kubectl.kubernetes.io/last-applied-configuration: "{}"
    data:
      ldap.conf: |
        BASE dc=example,dc=com
        URI ldap://ldap.example.com
        TLS_CACERTDIR /etc/openldap/certs
        TLS_REQCERT  allow
        SASL_NOCANON  on
  • Apply the ConfigMap:
    kubectl apply -f ldapclient-cm.yaml
c. Create an ImagePull Secret (in the tenant namespace)
  • Save the following as imagepull-secret.yaml (replace ........ with your base64-encoded .dockerconfigjson):
    apiVersion: v1
    kind: Secret
    metadata:
      name: imagepull
      namespace: sampletenant
      labels:
        hpe.com/cluster: none
        hpe.com/component: imagepull
        hpe.com/namespacetype: Tenant
        hpe.com/tenant: sampletenant
        hpe.com/version: 7.0.0
    data:
      .dockerconfigjson: "........"
    type: kubernetes.io/dockerconfigjson
  • Apply the secret:
    kubectl apply -f imagepull-secret.yaml
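  • Alternatively, kubectl can generate the .dockerconfigjson payload for you, for example (the registry URL and credentials are placeholders). Note that this form does not add the hpe.com/* labels shown above; apply them separately with kubectl label if your tenant requires them.
    kubectl create secret docker-registry imagepull \
      --namespace sampletenant \
      --docker-server=<registry URL> \
      --docker-username=<user> \
      --docker-password=<password>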
d. Create a Cluster Role for Persistent Volume Operations
  • Save the following as hpe-pvcreate-clusterrole.yaml:
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: hpe-pvcreate
    rules:
    - apiGroups: [""]
      resources: ["persistentvolumes"]
      verbs: ["create", "delete", "get", "list", "watch"]
  • Apply the ClusterRole:
    kubectl apply -f hpe-pvcreate-clusterrole.yaml
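  • A ClusterRole takes effect only after it is bound to a subject. As an illustrative example (the binding name is arbitrary, and the hpe-sampletenant service account follows the naming used in step 9), save the following as hpe-pvcreate-clusterrolebinding.yaml:
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: hpe-pvcreate-binding
    subjects:
    - kind: ServiceAccount
      name: hpe-sampletenant
      namespace: sampletenant
    roleRef:
      kind: ClusterRole
      name: hpe-pvcreate
      apiGroup: rbac.authorization.k8s.io
  • Apply the ClusterRoleBinding:
    kubectl apply -f hpe-pvcreate-clusterrolebinding.yaml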

7. Deploy the Tenant

Deploy the tenant using the correct container images, DEP or EEP cluster name, and repository references.
  • Refer to the example configuration:
    kubectl apply -f tenant-crs/external-full.yaml
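
After applying the tenant custom resource, you can watch the tenant pods start, for example:
    kubectl get pods -n sampletenant -w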

8. Update Role Bindings

Modify the roles (hpe-sampletenant-role and hpe-sampletenant-terminalrole) to include the necessary permissions for managing persistent volumes, secrets, pods, and ConfigMaps as required by your tenant workloads.
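
As an illustrative sketch only (the exact rules depend on your tenant workloads), the namespaced roles might gain a rule like the following, edited with kubectl edit role hpe-sampletenant-role -n sampletenant, while cluster-scoped persistent volume access is granted through the hpe-pvcreate ClusterRole from step 6:
    - apiGroups: [""]
      resources: ["pods", "secrets", "configmaps", "persistentvolumeclaims"]
      verbs: ["create", "delete", "get", "list", "watch"]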

9. Run Ticket Creator and Launch Spark Shell

a. Run the Ticket Creator Script
  • Connect to the tenant CLI pod and run the script bundled with it:
    ./ticketcreator.sh
b. Launch Spark Shell (or any other executable)
  • Execute the Spark Shell command, replacing placeholder values as needed:
    /opt/mapr/spark/spark-3.5.1/bin/spark-shell \
      --master k8s://https://<kubernetes controller host>:6443 \
      --conf spark.executor.instances=2 \
      --conf spark.mapr.user.secret=<generated secret name> \
      --conf spark.kubernetes.container.image=<spark image> \
      --conf spark.kubernetes.namespace=<tenant namespace> \
      --conf spark.kubernetes.container.image.pullPolicy=Always \
      --conf spark.mapr.cluster.configMap=cluster-cm \
      --conf spark.authenticate=false \
      --conf spark.authenticate.enableSaslEncryption=false \
      --conf spark.kubernetes.authenticate.driver.serviceAccountName=hpe-<tenant namespace>
    IMPORTANT
    Replace the following placeholders:
    • <kubernetes controller host>: The hostname or IP address of your Kubernetes controller.
    • <generated secret name>: The name of the secret generated by the ticketcreator.sh script.
    • <spark image>: The full name of the Spark container image you intend to use.
    • <tenant namespace>: The namespace where your tenant is deployed (for example, sampletenant).
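  • Optionally, submit a batch application with spark-submit using the same configuration. The following is a sketch only (the SparkPi example class and the examples jar path are illustrative and depend on your Spark image):
    /opt/mapr/spark/spark-3.5.1/bin/spark-submit \
      --master k8s://https://<kubernetes controller host>:6443 \
      --deploy-mode cluster \
      --name spark-pi \
      --class org.apache.spark.examples.SparkPi \
      --conf spark.executor.instances=2 \
      --conf spark.mapr.user.secret=<generated secret name> \
      --conf spark.kubernetes.container.image=<spark image> \
      --conf spark.kubernetes.namespace=<tenant namespace> \
      --conf spark.mapr.cluster.configMap=cluster-cm \
      --conf spark.kubernetes.authenticate.driver.serviceAccountName=hpe-<tenant namespace> \
      local:///opt/mapr/spark/spark-3.5.1/examples/jars/<spark examples jar> 1000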

For the GitHub README instructions for integrating Helm Charts for Spark, see the Tenant Operator Installation Guide.