Helm Charts for Spark
Starting with the EEP 9.4.0 release, Helm chart integration for Spark is supported. This topic provides detailed information about Data Fabric's Helm and Spark integration, including installation steps.
Helm charts are packages of pre-configured Kubernetes resources that simplify the deployment, versioning, and management of applications on Kubernetes clusters. They bundle all the necessary YAML configuration files, templates, and dependencies into a single, reusable package that can be installed with a single command using the Helm CLI.
Starting with EEP 9.4.0, Helm chart integration for Spark on Data Fabric is supported. The Spark Helm chart enables containerized deployment of Spark applications to Kubernetes clusters (such as OpenShift Container Platform) by packaging the Tenant Operator, required RBAC configurations, secrets, and Spark runtime settings. This allows users to run Spark shells and submit Spark applications using Kubernetes as the cluster manager, with Data Fabric providing the underlying data layer.
This topic outlines the steps to set up and deploy a Data Fabric tenant on a Kubernetes cluster, including installing prerequisites, configuring the Tenant Operator, generating secrets, and running Spark applications.
For the latest Helm chart files and instructions, refer to the mapr/data-fabric-helm-charts repository. The release/fy26-q2 branch corresponds to the current release.
Prerequisites
- kubectl: Ensure that you have `kubectl` installed and configured to interact with your Kubernetes cluster.
- Helm: Ensure that you have Helm installed.
- Kubernetes Cluster Access: Verify you have the necessary permissions to deploy resources to your Kubernetes cluster.
- Placeholder Values: Be prepared to replace placeholder values (such as registry URLs or secrets) with your actual environment-specific data.
You can find all necessary images in the https://hub.docker.com/u/maprtech repository.
1. Install Cert-Manager
- Apply the Cert-Manager CRDs:

  ```
  kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.8.0/cert-manager.crds.yaml
  ```

- Add the Jetstack Helm repository and update it:

  ```
  helm repo add jetstack https://charts.jetstack.io
  helm repo update
  ```

- Install Cert-Manager in the cert-manager namespace:

  ```
  helm install cert-manager jetstack/cert-manager --namespace cert-manager --create-namespace --version v1.8.0
  ```
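Before continuing, you can confirm that Cert-Manager is up. A minimal check, assuming the default deployment names created by the chart:

```
# Wait for the three cert-manager deployments to become available
# (deployment names assume the default chart values).
kubectl -n cert-manager rollout status deploy/cert-manager --timeout=120s
kubectl -n cert-manager rollout status deploy/cert-manager-webhook --timeout=120s
kubectl -n cert-manager rollout status deploy/cert-manager-cainjector --timeout=120s
```

If any rollout does not complete, inspect the pods with `kubectl -n cert-manager get pods` before proceeding.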
2. Create a PriorityClass
- Save the following YAML as hpe-critical-priorityclass.yaml:

  ```yaml
  apiVersion: scheduling.k8s.io/v1
  kind: PriorityClass
  metadata:
    name: hpe-critical
  value: 1000000
  globalDefault: false
  description: "Priority class for critical HPE pods"
  ```

- Apply the PriorityClass:

  ```
  kubectl apply -f hpe-critical-priorityclass.yaml
  ```
3. Add Registry Certificate on Nodes
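The exact procedure depends on your operating system and container runtime. A sketch for RHEL-based nodes with a self-signed registry certificate (the certificate file name and the runtime service name are assumptions; adjust for your environment):

```
# On each node, add the registry CA certificate to the system trust store
# (paths assume a RHEL-based node; adjust for your distribution).
sudo cp registry-ca.crt /etc/pki/ca-trust/source/anchors/
sudo update-ca-trust extract
# Restart the container runtime so it picks up the updated trust store.
sudo systemctl restart containerd
```

Repeat on every node that pulls images from the registry.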
4. Install Tenant Operator
```
helm install tenant-operator tenant-operator-chart/ -n test-tenant --create-namespace -f tenant-operator-chart/values.yaml
```

5. Generate External Secrets
Run the external secret generation script (gen-external-secrets.sh) on a cluster node. This script is assumed to be available in your environment.
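A typical invocation, assuming the script is present in the current directory on the cluster node (the output path below matches the file applied in the next step):

```
# Run as the cluster admin user on a Data Fabric cluster node.
./gen-external-secrets.sh
# Confirm the generated manifest exists before applying it.
ls -l /tmp/mapr-external-secrets.yaml
```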
6. Apply External Secrets and Additional Configurations
- Create the hpe-externalclusterinfo namespace:

  ```
  kubectl create namespace hpe-externalclusterinfo
  ```

- Apply the external secrets YAML:

  ```
  kubectl apply -f /tmp/mapr-external-secrets.yaml
  ```
hpe-secure Namespace and LDAP ConfigMap

- Create the hpe-secure namespace if it does not already exist:

  ```
  kubectl create namespace hpe-secure
  ```

- Save the following as ldapclient-cm.yaml (replace with your actual LDAP configuration):

  ```yaml
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: ldapclient-cm
    namespace: hpe-secure
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: "{}"
  data:
    ldap.conf: |
      BASE http://example.com
      URI test
      TLS_CACERTDIR /etc/openldap/certs
      TLS_REQCERT allow
      SASL_NOCANON on
  ```

- Apply the ConfigMap:

  ```
  kubectl apply -f ldapclient-cm.yaml
  ```
- Save the following as imagepull-secret.yaml (replace ........ with your base64-encoded .dockerconfigjson):

  ```yaml
  apiVersion: v1
  kind: Secret
  metadata:
    name: imagepull
    namespace: sampletenant
    labels:
      hpe.com/cluster: none
      hpe.com/component: imagepull
      hpe.com/namespacetype: Tenant
      hpe.com/tenant: sampletenant
      hpe.com/version: 7.0.0
  data:
    .dockerconfigjson: "........"
  type: kubernetes.io/dockerconfigjson
  ```

- Apply the secret:

  ```
  kubectl apply -f imagepull-secret.yaml
  ```
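To produce the base64-encoded .dockerconfigjson value for the secret, encode a Docker config file for your registry. A sketch with hypothetical credentials and a hypothetical registry hostname (replace all three values with your own):

```shell
# Build a minimal Docker config for the registry.
# 'myuser', 'mypassword', and 'registry.example.com' are placeholders.
auth=$(printf '%s' 'myuser:mypassword' | base64 -w0)
printf '{"auths":{"registry.example.com":{"auth":"%s"}}}' "$auth" > dockerconfig.json
# The value for the secret's .dockerconfigjson field is the base64
# encoding of the whole file:
base64 -w0 < dockerconfig.json
```

Paste the final output into the `data: .dockerconfigjson:` field of imagepull-secret.yaml.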
- Save the following as hpe-pvcreate-clusterrole.yaml:

  ```yaml
  apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRole
  metadata:
    name: hpe-pvcreate
  rules:
    - apiGroups: [""]
      resources: ["persistentvolumes"]
      verbs: ["create", "delete", "get", "list", "watch"]
  ```

- Apply the ClusterRole:

  ```
  kubectl apply -f hpe-pvcreate-clusterrole.yaml
  ```
7. Deploy the Tenant
- Apply the example tenant configuration:

  ```
  kubectl apply -f tenant-crs/external-full.yaml
  ```
8. Update Role Bindings
Modify the roles (hpe-sampletenant-role and
hpe-sampletenant-terminalrole) to include the necessary permissions for
managing persistent volumes, secrets, pods, and ConfigMaps as required by your tenant
workloads.
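As an illustration, the added rules might look like the following fragment. The resources and verbs shown here are assumptions; trim them to what your tenant workloads actually need:

```yaml
# Example rules fragment for hpe-sampletenant-role (illustrative only).
rules:
  - apiGroups: [""]
    resources: ["pods", "secrets", "configmaps", "persistentvolumeclaims"]
    verbs: ["create", "delete", "get", "list", "watch"]
```

Edit the roles in place with `kubectl edit role hpe-sampletenant-role -n <tenant namespace>` (and likewise for hpe-sampletenant-terminalrole).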
9. Run Ticket Creator and Launch Spark Shell
- Navigate to the tenant CLI pod and execute the ticket creator script bundled in the pod:

  ```
  ./ticketcreator.sh
  ```
- Execute the Spark shell command, replacing placeholder values as needed:

  ```
  /opt/mapr/spark/spark-3.5.1/bin/spark-shell \
    --master k8s://https://<kubernetes controller host>:6443 \
    --conf spark.executor.instances=2 \
    --conf spark.mapr.user.secret=<generated secret name> \
    --conf spark.kubernetes.container.image=<spark image> \
    --conf spark.kubernetes.namespace=<tenant namespace> \
    --conf spark.kubernetes.container.image.pullPolicy=Always \
    --conf spark.mapr.cluster.configMap=cluster-cm \
    --conf spark.authenticate=false \
    --conf spark.authenticate.enableSaslEncryption=false \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=hpe-<tenant namespace>
  ```

  IMPORTANT: Replace the following placeholders:
  - `<kubernetes controller host>`: The hostname or IP address of your Kubernetes controller.
  - `<generated secret name>`: The name of the secret generated by the ticketcreator.sh script.
  - `<spark image>`: The full name of the Spark container image you intend to use.
  - `<tenant namespace>`: The namespace where your tenant is deployed (for example, sampletenant).
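Beyond the interactive shell, a batch application can be submitted with the same configuration. A sketch using the standard SparkPi example; the examples JAR path is an assumption, so verify it inside your Spark image, and replace the angle-bracket placeholders as above:

```
# Submit the bundled SparkPi example in cluster deploy mode.
# The examples JAR path below is an assumption; check your image.
/opt/mapr/spark/spark-3.5.1/bin/spark-submit \
  --master k8s://https://<kubernetes controller host>:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=2 \
  --conf spark.mapr.user.secret=<generated secret name> \
  --conf spark.kubernetes.container.image=<spark image> \
  --conf spark.kubernetes.namespace=<tenant namespace> \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=hpe-<tenant namespace> \
  local:///opt/mapr/spark/spark-3.5.1/examples/jars/spark-examples_2.12-3.5.1.jar 1000
```

Monitor the driver pod with `kubectl get pods -n <tenant namespace>` and read the result from its logs.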
For the GitHub README instructions for integrating Helm Charts for Spark, see the Tenant Operator Installation Guide.