Spark Operator

This topic provides an overview of Spark Operator on HPE Ezmeral Runtime Enterprise.

HPE Ezmeral Runtime Enterprise 5.4.0 and later supports a multiversion Spark Operator: you can submit Spark applications for different versions of Apache Spark using a single Spark Operator. When you submit a Spark application, Spark Operator creates a Kubernetes spark-submit job. The spark-submit job spawns the driver pod, and the Spark driver pod launches a set of Spark executors that run the job.
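For illustration, a Spark application is submitted to Spark Operator as a SparkApplication custom resource. The following is a minimal sketch only; the namespace, image, application file, and service account are placeholders, the apiVersion shown is the upstream Spark Operator API group, and the exact group and fields in your deployment may differ:

  apiVersion: sparkoperator.k8s.io/v1beta2   # upstream API group; your deployment may use a different group
  kind: SparkApplication
  metadata:
    name: spark-pi                           # placeholder application name
    namespace: sampletenant                  # placeholder tenant namespace
  spec:
    type: Scala
    mode: cluster
    image: <registry>/spark:3.3.1            # HPE-provided or your own open-source Spark image
    mainClass: org.apache.spark.examples.SparkPi
    mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.3.1.jar
    sparkVersion: "3.3.1"                    # version handled by the multiversion Spark Operator
    driver:
      cores: 1
      memory: 512m
      serviceAccount: spark                  # placeholder service account
    executor:
      instances: 2
      cores: 1
      memory: 512m

Spark Operator watches for SparkApplication resources and runs spark-submit with the settings in the spec; the driver and executor sections map to the driver pod and executor pods described above.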

Starting from HPE Ezmeral Runtime Enterprise 5.6.0, Spark 3.3.x and later versions support enhanced S3 features introduced in Hadoop 3.x.
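For example, the Hadoop 3.x S3A connector can be configured through Spark configuration properties on the application. A minimal sketch, assuming an S3-compatible endpoint and the committer settings introduced with Hadoop 3.x; the endpoint is a placeholder and credential handling is omitted:

  spec:
    sparkConf:
      "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem"
      "spark.hadoop.fs.s3a.endpoint": "https://s3.example.com"     # placeholder S3-compatible endpoint
      "spark.hadoop.fs.s3a.path.style.access": "true"
      "spark.hadoop.fs.s3a.committer.name": "magic"                # S3A committer introduced in Hadoop 3.x
      "spark.hadoop.fs.s3a.bucket.all.committer.magic.enabled": "true"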

Starting from HPE Ezmeral Runtime Enterprise 5.5.0, you can choose to use Spark images provided by HPE Ezmeral Runtime Enterprise or your own open-source Spark images.

Spark Operator supports open-source Spark versions that are compatible with the Kubernetes version supported on HPE Ezmeral Runtime Enterprise. With support for open-source Spark, you can build Spark with the Hadoop 3 profile or any other profile of your choice.

You can integrate open-source Spark with Spark History Server by using a PersistentVolumeClaim (PVC).
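One way to do this is to have the application write Spark event logs to a PVC that the Spark History Server also mounts. A minimal sketch, assuming a PVC named spark-history-pvc mounted at /mnt/history (both placeholders):

  spec:
    sparkConf:
      "spark.eventLog.enabled": "true"
      "spark.eventLog.dir": "file:///mnt/history"    # directory backed by the shared PVC
    driver:
      volumeMounts:
        - name: history-volume
          mountPath: /mnt/history
    volumes:
      - name: history-volume
        persistentVolumeClaim:
          claimName: spark-history-pvc               # placeholder PVC also mounted by Spark History Server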

To use open-source Spark, build Spark and then build Spark images to run in HPE Ezmeral Runtime Enterprise. See Building Spark and Building Images.

However, open-source Spark does not support the following:
  • Data Fabric filesystem, Data Fabric Streams, and any other Data Fabric sources and sinks that require the Data Fabric client.
  • Data Fabric-specific security features (Data Fabric SASL).
NOTE Livy does not support open-source Spark images on HPE Ezmeral Runtime Enterprise.
HPE Ezmeral Runtime Enterprise supports all features and parameters described in the open-source Spark on Kubernetes documentation, except for the security features. HPE Ezmeral Runtime Enterprise supports the following Spark security features:
  • If you are a local user, set the spark.mapr.user.secret option in your Spark application YAML file, as shown in the sketch after this list.
  • If you are an AD/LDAP user, the spark.mapr.user.secret option is automatically set by the ticketgenerator webhook.
  • You must not change the user context. See Using Pod Security Context.
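For example, a local user can reference the secret in the application spec. A minimal sketch only; the secret name is a placeholder, and where the option is set may vary by release, so check the Spark application YAML examples shipped with your version:

  spec:
    sparkConf:
      "spark.mapr.user.secret": "sampletenant-user-secret"   # placeholder secret created for the local user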