Spark

Provides a brief overview of Apache Spark in HPE Ezmeral Unified Analytics Software.

Spark is a fast, unified analytics engine that offers high-level APIs in Java, Scala, Python, and R. Spark provides in-memory computing and optimized query execution for fast data processing.

In HPE Ezmeral Unified Analytics Software, two controllers run Spark workloads: Spark Operator and Livy server.



HPE Ezmeral Unified Analytics Software supports a multi-version Spark Operator. You can submit Spark applications for different versions of Apache Spark using a single Spark Operator.

You can use any of the supported Spark images to submit your Spark application through the Spark Operator workflow. See Using Spark Images.

To see the list of the Spark images distributed by HPE Ezmeral Unified Analytics Software, see List of Spark Images.
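
The multi-version behavior described above can be sketched as follows. This is a minimal illustration, not the platform's documented workflow: it builds two SparkApplication manifests that differ only in Spark image and version, both intended for the same Spark Operator. The field names follow the open-source Spark Operator custom resource. The default apiVersion, the image names, the namespace, the version numbers, and the helper spark_application_manifest are all assumptions made for illustration. Check the apiVersion used by your installation, and take real image names from List of Spark Images.

```python
import json


def spark_application_manifest(name, namespace, image, spark_version, main_file,
                               api_version="sparkoperator.k8s.io/v1beta2"):
    """Build a SparkApplication manifest as a plain dictionary.

    Hypothetical helper. The default api_version is the one used by the
    open-source Spark Operator; the value on your platform may differ.
    """
    return {
        "apiVersion": api_version,
        "kind": "SparkApplication",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "Python",
            "mode": "cluster",
            "image": image,                  # Spark image for this application
            "sparkVersion": spark_version,   # Spark version matching the image
            "mainApplicationFile": main_file,
            "driver": {"cores": 1, "memory": "1g"},
            "executor": {"instances": 2, "cores": 1, "memory": "1g"},
        },
    }


# Two applications for different Spark versions, handled by one operator.
# Image names and version numbers below are placeholders, not real images.
app_a = spark_application_manifest(
    "job-a", "example-namespace",
    "registry.example.com/spark:3.4.1", "3.4.1",
    "local:///opt/app/main.py",
)
app_b = spark_application_manifest(
    "job-b", "example-namespace",
    "registry.example.com/spark:3.5.0", "3.5.0",
    "local:///opt/app/main.py",
)

if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML, so this output can be saved
    # to a file and applied with: kubectl apply -f <file>
    print(json.dumps(app_a, indent=2))
```

Both manifests share the same kind and apiVersion, so a single operator watches them both. Only the image and sparkVersion fields select the Spark version for each application.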

Livy server uses the REST API and the Spark images (supporting Data Fabric services) provided by HPE Ezmeral Unified Analytics Software to submit Spark applications. To learn about the supported version of Spark, see Support Matrix.

NOTE
Livy does not support Spark OSS images or your own open-source Spark images on HPE Ezmeral Unified Analytics Software.

Features and Functionality

HPE Ezmeral Unified Analytics Software provides an enterprise-ready, unified Spark experience that supports Apache Livy-based interactive sessions.

Spark in HPE Ezmeral Unified Analytics Software supports the following features and functionality:

  • ACID transactions for Spark applications with Delta Lake.
  • Details for both Spark applications and Livy sessions are stored in Spark History Server. See Spark History Server.
  • Run Spark jobs from HPE Ezmeral Unified Analytics Software using the following components:
    • Spark Operator: The following are entry points for the Spark Operator:
    • Livy Server: The following are entry points for the Livy server:
      • Kubeflow Notebook: You can use Spark Magics to run Livy sessions using Kubeflow notebooks. See Notebook Magic Functions.
      • Interactive Spark Sessions GUI available in HPE Ezmeral Unified Analytics Software. See Creating Interactive Sessions.
      • Livy REST API (with basic authentication).
      • Livy native UI (with platform SSO authentication): You can use the Livy native UI for troubleshooting, such as checking the state of a session or the state of its statements. You cannot submit Spark applications using the Livy native UI.
  • Spark applications and Livy sessions are preconfigured so that both user and shared volumes are mounted to the driver and executor runtimes. When you use the HPE Ezmeral Unified Analytics Software GUI, you can use these folders to pass files into the Spark runtime. However, user and shared volumes are not mounted to the driver and executor runtimes when you create Livy sessions using the Livy REST API.
  • User context is set dynamically, which avoids impersonation calls and improves security.
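
As a rough illustration of the Livy REST API entry point with basic authentication listed above, the following sketch builds the HTTP requests that create an interactive session and run a statement in it. The /sessions and /sessions/{id}/statements endpoints and the "kind" and "code" fields come from the open-source Apache Livy REST API. The server URL, the credentials, and the helper function names are hypothetical placeholders. Your deployment may require additional headers or session settings that are not shown here.

```python
import base64
import json
import urllib.request


def basic_auth_header(username, password):
    """Return the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_livy_request(livy_url, path, payload, username, password):
    """Build (but do not send) a JSON POST request to a Livy endpoint."""
    return urllib.request.Request(
        url=livy_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": basic_auth_header(username, password),
        },
    )


def build_session_request(livy_url, username, password, kind="pyspark"):
    """Request that creates an interactive Livy session of the given kind."""
    return build_livy_request(livy_url, "/sessions", {"kind": kind},
                              username, password)


def build_statement_request(livy_url, username, password, session_id, code):
    """Request that runs a code snippet in an existing Livy session."""
    return build_livy_request(livy_url, f"/sessions/{session_id}/statements",
                              {"code": code}, username, password)


def send(request):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A typical flow would be to call send(build_session_request(...)), read the session id from the response, wait until the session reports that it is idle, and then call send(build_statement_request(...)) with that id. Because user and shared volumes are not mounted for sessions created this way, any code submitted through this path should not assume those folders exist.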