Spark Overview
This topic provides a brief overview of Apache Spark on HPE Ezmeral Runtime Enterprise.
Spark is a unified analytics engine for fast data processing that offers high-level APIs in Java, Scala, Python, and R. Spark uses in-memory computing and optimized query execution to process data quickly.
When you use spark-submit to submit a Spark application to a Kubernetes cluster, you start a Spark driver that runs within a Kubernetes pod. The driver then creates Spark executor pods within the same Kubernetes cluster, and the executors run the application's tasks.
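The submission flow described above can be sketched as a small Python helper that assembles a cluster-mode spark-submit command line. This is a minimal sketch, not the platform's documented procedure: the helper name `build_spark_submit`, the API server address, the image name, the namespace, and the JAR path are hypothetical placeholders, and an actual deployment on HPE Ezmeral Runtime Enterprise may need further settings that this page does not list. The flags and `spark.kubernetes.*` properties themselves come from the Apache Spark documentation on running on Kubernetes.

```python
import shlex


def build_spark_submit(api_server, image, app_resource, main_class,
                       namespace="default", executors=2, app_args=()):
    """Return the argv list for a cluster-mode spark-submit on Kubernetes.

    All argument values are supplied by the caller; nothing here is
    specific to a particular cluster.
    """
    argv = [
        "spark-submit",
        # The k8s:// prefix tells Spark to talk to a Kubernetes API server.
        "--master", f"k8s://{api_server}",
        # In cluster mode the driver itself runs in a pod on the cluster.
        "--deploy-mode", "cluster",
        "--class", main_class,
        "--conf", f"spark.kubernetes.namespace={namespace}",
        # Container image used for the driver and executor pods.
        "--conf", f"spark.kubernetes.container.image={image}",
        # Number of executor pods the driver requests.
        "--conf", f"spark.executor.instances={executors}",
        app_resource,
    ]
    argv.extend(str(a) for a in app_args)
    return argv


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    argv = build_spark_submit(
        api_server="https://k8s.example.internal:6443",
        image="registry.example.internal/spark:3.x",
        app_resource="local:///opt/spark/examples/jars/spark-examples.jar",
        main_class="org.apache.spark.examples.SparkPi",
        namespace="example-tenant",
        executors=2,
        app_args=[100],
    )
    print(shlex.join(argv))
```

The helper only builds and prints the command. To run it, you could pass the list to `subprocess.run` from a host where spark-submit is installed and has credentials for the cluster.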
Apache Spark on HPE Ezmeral Runtime Enterprise
- HPE Ezmeral Runtime Enterprise provides an enterprise-ready, unified Spark that supports an Apache Livy-based RESTful interface.
- Spark 3.x.x supports the RAPIDS Accelerator by NVIDIA, which uses GPUs to accelerate Spark processing. See Nvidia Spark-RAPIDS Accelerator for Spark.
- Spark 3.x.x provides ACID transactions for Spark applications with Delta Lake. See Delta Lake with Apache Spark.
- Spark supports the following:
  - Global Hive Metastore: Starting with HPE Ezmeral Runtime Enterprise 5.3, Spark applications that are configured in one Kubernetes cluster can access a Hive Metastore that is configured in another Kubernetes cluster. See Hive Metastore.
  - Spark History Server. See Spark History Server.
  - Spark Thrift Server. See Spark Thrift Server.
- You can run Spark jobs on Kubernetes clusters in HPE Ezmeral Runtime Enterprise in the following ways:
  - Using Spark Operator. See Spark Operator.
  - Using Livy to make REST calls. See Submitting Spark Application Using Livy.
  - Using Spark scripts from spark-client pods. See Submitting Spark Applications Using spark-submit.
  - Using Airflow to schedule Spark jobs. See Using Airflow to Schedule Spark Applications.
- You can run Spark jobs in Data Fabric tenants or non Data Fabric tenants:
  - Data Fabric tenants: Tenants created on HPE Ezmeral Data Fabric on Kubernetes on HPE Ezmeral Runtime Enterprise, or on HPE Ezmeral Data Fabric on Bare Metal outside of HPE Ezmeral Runtime Enterprise.
  - Non Data Fabric tenants: Tenants created on external storage that is not HPE Ezmeral Data Fabric.
- To learn about new enhancements and changes for Spark on HPE Ezmeral Runtime Enterprise, see What's New in Version 5.7.x.
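As an illustration of the Livy submission path listed above, the following minimal sketch builds the HTTP request for a Livy batch without sending it. The helper name `build_livy_batch_request`, the Livy URL, and the application values are hypothetical. The `/batches` endpoint and the `file`, `className`, `args`, `conf`, and `numExecutors` fields follow the Apache Livy REST API. Authentication is omitted and is an assumption you would need to fill in for your own deployment.

```python
import json
import urllib.request


def build_livy_batch_request(livy_url, file, class_name, args=(),
                             conf=None, num_executors=None):
    """Build (but do not send) a POST /batches request for Apache Livy."""
    payload = {
        "file": file,              # application JAR or Python file
        "className": class_name,   # main class for Java/Scala applications
        "args": [str(a) for a in args],
    }
    if conf:
        payload["conf"] = dict(conf)          # Spark configuration properties
    if num_executors is not None:
        payload["numExecutors"] = num_executors
    return urllib.request.Request(
        url=livy_url.rstrip("/") + "/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical endpoint and application values, for illustration only.
    req = build_livy_batch_request(
        livy_url="https://livy.example.internal:8998",
        file="local:///opt/spark/examples/jars/spark-examples.jar",
        class_name="org.apache.spark.examples.SparkPi",
        args=[100],
        num_executors=2,
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

To submit the batch, you could pass the request to `urllib.request.urlopen` after adding whatever credentials your Livy endpoint requires.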