Spark Overview

This topic provides a brief overview of Apache Spark on HPE Ezmeral Runtime Enterprise.

Spark is a unified analytics engine for fast data processing that offers high-level APIs in Java, Scala, Python, and R. Spark provides in-memory computing and optimized query execution.

You can run Spark on Kubernetes-managed clusters on HPE Ezmeral Runtime Enterprise. For more information about running Spark on Kubernetes, see Apache Spark on Kubernetes.
NOTE: Starting with HPE Ezmeral Runtime Enterprise 5.3, Spark Standalone is no longer supported.

When you use spark-submit to submit a Spark application to a Kubernetes cluster, a Spark driver starts within a Kubernetes pod. The driver then creates Spark executor pods within the Kubernetes cluster to execute the tasks.
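
The submission flow described above can be sketched as a small Python helper that assembles a cluster-mode spark-submit command line. This is a minimal sketch, not the platform's documented procedure: the flags and configuration keys are the generic ones from upstream Apache Spark on Kubernetes, while the helper name, API server URL, container image, and JAR path are hypothetical placeholders. A real HPE Ezmeral Runtime Enterprise tenant will likely need further settings (for example, a namespace and service account).

```python
import shlex


def build_spark_submit_args(api_server, image, app_resource, main_class,
                            app_name="spark-pi", executors=2):
    """Return the argument list for a cluster-mode spark-submit on Kubernetes.

    In cluster mode the driver runs in its own pod and requests the
    executor pods from the Kubernetes API server.
    """
    return [
        "spark-submit",
        # The k8s:// prefix tells Spark to use the Kubernetes scheduler backend.
        "--master", f"k8s://{api_server}",
        "--deploy-mode", "cluster",
        "--name", app_name,
        "--class", main_class,
        # Number of executor pods the driver asks for.
        "--conf", f"spark.executor.instances={executors}",
        # Container image used for the driver and executor pods.
        "--conf", f"spark.kubernetes.container.image={image}",
        app_resource,
    ]


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    args = build_spark_submit_args(
        api_server="https://k8s-apiserver.example.com:6443",
        image="registry.example.com/spark:latest",
        app_resource="local:///opt/spark/examples/jars/spark-examples.jar",
        main_class="org.apache.spark.examples.SparkPi",
    )
    # Print a shell-safe command line instead of running it.
    print(shlex.join(args))
```

The helper only builds the command, so it can be inspected or logged before it is passed to a process runner on a machine where spark-submit is installed.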

Apache Spark on HPE Ezmeral Runtime Enterprise

  • HPE Ezmeral Runtime Enterprise provides an enterprise-ready, unified Spark that supports an Apache Livy-based RESTful interface.
  • Spark 3.x.x supports the RAPIDS Accelerator by NVIDIA, which uses GPUs to accelerate Spark processing. See Nvidia Spark-RAPIDS Accelerator for Spark.
  • Spark 3.x.x provides ACID transactions for Spark applications with Delta Lake. See Delta Lake with Apache Spark.
  • Spark supports the following:
    • Global Hive Metastore: Starting with HPE Ezmeral Runtime Enterprise 5.3, Spark applications that are configured in one Kubernetes cluster can access the Hive Metastore configured inside another Kubernetes cluster. See Hive Metastore.
    • Spark History Server. See Spark History Server.
    • Spark Thrift Server. See Spark Thrift Server.
  • You can run a Spark job on Kubernetes clusters in the HPE Ezmeral Runtime Enterprise in the following ways:
  • You can run Spark jobs in Data Fabric tenants or non-Data Fabric tenants:
    Data Fabric Tenants

    Tenants created on HPE Ezmeral Data Fabric on Kubernetes, which runs on HPE Ezmeral Runtime Enterprise, or on HPE Ezmeral Data Fabric on Bare Metal, which runs outside of HPE Ezmeral Runtime Enterprise.

    See HPE Ezmeral Data Fabric as Tenant/Persistent Storage.

    Non-Data Fabric Tenants

    Tenants created on external storage other than HPE Ezmeral Data Fabric.
  • To learn about new enhancements and changes for Spark on HPE Ezmeral Runtime Enterprise, see What's New in Version 5.6.x.

    Figure 1. Overview of Running Spark Applications on HPE Ezmeral Runtime Enterprise
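
The Apache Livy-based RESTful interface mentioned in the list above can be illustrated with a short Python sketch that builds a batch submission request. It assumes the upstream Apache Livy REST API, where a batch job is created with POST /batches and a JSON body containing fields such as file, className, args, and conf. The helper name, endpoint URL, and JAR path are hypothetical, and any authentication or TLS settings that a tenant requires are deliberately left out.

```python
import json
import urllib.request


def build_livy_batch_request(livy_url, app_file, class_name=None,
                             args=None, conf=None):
    """Build (but do not send) a POST /batches request for Apache Livy."""
    payload = {"file": app_file}  # application JAR or Python file to run
    if class_name:
        payload["className"] = class_name  # main class for JVM applications
    if args:
        payload["args"] = list(args)  # command-line arguments for the app
    if conf:
        payload["conf"] = dict(conf)  # Spark configuration properties
    return urllib.request.Request(
        url=livy_url.rstrip("/") + "/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical endpoint and application path, for illustration only.
    request = build_livy_batch_request(
        "http://livy.example.com:8998",
        "local:///opt/spark/examples/jars/spark-examples.jar",
        class_name="org.apache.spark.examples.SparkPi",
        args=["100"],
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Passing the returned object to urllib.request.urlopen would send the request. Building it separately keeps the payload easy to inspect before anything reaches the cluster.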