Spark Operator
This topic provides an overview of Spark Operator on HPE Ezmeral Runtime Enterprise.
HPE Ezmeral Runtime Enterprise 5.4.0 and later supports a multiversion Spark Operator: you can submit Spark applications for different versions of Apache Spark through a single Spark Operator. When you submit a Spark application, Spark Operator creates a Kubernetes spark-submit job. The spark-submit job spawns the driver pod, and the Spark driver pod launches the set of Spark executors that run your job.
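As a sketch of that flow, a submission to Spark Operator takes the form of a SparkApplication custom resource. The names, namespace, image, and version below are placeholders; adjust them to your environment:

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi            # placeholder application name
  namespace: sampletenant   # placeholder tenant namespace
spec:
  type: Scala
  mode: cluster
  image: <your-spark-image> # image for the Spark version you target
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar
  sparkVersion: "3.3.1"     # example version; match your image
  driver:
    cores: 1
    memory: 512m
  executor:
    instances: 2
    cores: 1
    memory: 512m
```

Applying this resource (for example, with kubectl apply) is what triggers the spark-submit job described above.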
Starting from HPE Ezmeral Runtime Enterprise 5.6.0, Spark 3.3.x and later versions support enhanced S3 features introduced in Hadoop 3.x.
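As an illustration of those Hadoop 3.x S3 features, S3A options can be passed through the sparkConf section of the application spec. The endpoint below is a placeholder; the committer classes come from Spark's spark-hadoop-cloud module:

```yaml
spec:
  sparkConf:
    # Hadoop 3.x S3A connector settings; endpoint is a placeholder.
    "spark.hadoop.fs.s3a.endpoint": "https://s3.example.com"
    "spark.hadoop.fs.s3a.path.style.access": "true"
    # S3A directory committer introduced with Hadoop 3.x:
    "spark.hadoop.fs.s3a.committer.name": "directory"
    "spark.sql.sources.commitProtocolClass": "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol"
    "spark.sql.parquet.output.committer.class": "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter"
```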
Starting from HPE Ezmeral Runtime Enterprise 5.5.0, you can choose to use Spark images provided by HPE Ezmeral Runtime Enterprise or your own open-source Spark images.
Spark Operator supports open-source Spark versions that are compatible with the Kubernetes version supported on HPE Ezmeral Runtime Enterprise. With the support for open-source Spark, you can build your Spark with the Hadoop 3 profile or any other profile of your choice.
You can integrate open-source Spark with Spark History Server by using PVC.
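One way to wire open-source Spark to Spark History Server through a PVC (all names here are illustrative) is to mount the PVC in the driver and executor pods and point the event log at the mount:

```yaml
spec:
  sparkConf:
    "spark.eventLog.enabled": "true"
    "spark.eventLog.dir": "file:///mnt/hs-pvc/spark-events"  # path on the mounted PVC
  volumes:
    - name: hs-pvc
      persistentVolumeClaim:
        claimName: spark-history-pvc   # placeholder PVC shared with Spark History Server
  driver:
    volumeMounts:
      - name: hs-pvc
        mountPath: /mnt/hs-pvc
  executor:
    volumeMounts:
      - name: hs-pvc
        mountPath: /mnt/hs-pvc
```

Spark History Server must read from the same PVC (or the same underlying storage) for the events to appear in its UI.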
To use open-source Spark, build Spark and then build Spark images to run in HPE Ezmeral Runtime Enterprise. See Building Spark and Building Images.
Open-source Spark images do not support the following Data Fabric capabilities:
- Data Fabric filesystem, Data Fabric Streams, and any other Data Fabric sources and sinks that require the Data Fabric client.
- Data Fabric specific security features (Data Fabric SASL).
- If you are a local user, set the spark.mapr.user.secret option in your Spark application YAML file.
- If you are an AD/LDAP user, the spark.mapr.user.secret option is set automatically by the ticket generator webhook.
- You must not change the user context. See Using Pod Security Context.