Using HPE-Curated Spark Images

Describes how to use HPE-curated Spark images to submit Spark applications.

HPE-curated Spark images are Apache Spark images that are customized to support the Data Fabric filesystem, Data Fabric Streams, and other Data Fabric sources and sinks that require a Data Fabric client. These images also support Data Fabric-specific security features, such as data-fabric SASL (maprsasl).

HPE-curated Spark images are the default images used by the GUI. See List of Spark Images.

You can use HPE-curated Spark images in four different workflows:

Using the Create Spark Application GUI

To use HPE-curated Spark images, choose one of the following options in the GUI:
Using New application

If you choose the New application option in the Application Details step of the Create Spark Application wizard, your Spark application will be configured with an HPE-curated Spark image. The List of Spark Images page also lists the default HPE-curated Spark images used for the GUI experience.

Using Upload YAML

If you choose the Upload YAML option, your Spark application will be configured with the Spark image specified in your YAML file:
image: <base-repository>/<image-name>:<image-tag>
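
For orientation, the following is a minimal, hypothetical sketch of where the image field sits in a SparkApplication manifest. The name, namespace, and main application file are placeholders, and the apiVersion is assumed from the sparkoperator.hpe.com API group used in the Airflow example later in this topic:

# Hypothetical, minimal SparkApplication sketch; the metadata values and
# main application file are placeholders, and the apiVersion is assumed.
apiVersion: sparkoperator.hpe.com/v1beta2
kind: SparkApplication
metadata:
  name: spark-example
  namespace: example
spec:
  type: Scala
  mode: cluster
  image: <base-repository>/<image-name>:<image-tag>
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar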

To learn how to submit Spark applications by using the GUI, see Creating Spark Applications.

Using the Spark Interactive Sessions GUI

To use HPE-curated Spark images with Spark Interactive Sessions, follow these steps:
  1. Follow the instructions for creating interactive sessions until you reach the Spark Configurations box in the Session Configurations and Dependencies step. See Creating Interactive Sessions.
  2. In the Spark Configurations box, you have two options:
    • If you leave the Key and Value boxes empty, the Spark interactive session will be created with the HPE-curated Spark image. The List of Spark Images page also lists the default HPE-curated Spark images used for the GUI experience.
    • If you set the Key and Value boxes for the Spark image of your choice by adding the following key-value pair, your Spark interactive session will be created with that image (a filled-in example follows these steps).
      Key: spark.kubernetes.container.image
      Value: <spark-image-of-your-choice>
      
  3. To specify the details for the other boxes and options in the Session Configurations and Dependencies step and to finish creating the interactive session, see Creating Interactive Sessions.
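
For illustration only, a filled-in key-value pair might look like the following; the registry host, repository name, and tag are hypothetical placeholders:

  Key: spark.kubernetes.container.image
  Value: registry.example.com/spark/custom-spark:v3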

Using Notebooks

To use HPE-curated Spark images when using Spark magic (%manage_spark) to create Livy sessions, follow these steps:
  1. Run %manage_spark to connect to the Livy server and start a new session. See %manage_spark for details.
  2. After you run %manage_spark, you have two options:
    • Creating the session with the default Spark configurations. This will use the HPE-curated Spark image for the interactive session. The List of Spark Images page also lists the default HPE-curated Spark images used for the GUI experience.
    • Running %config_spark and setting spark.kubernetes.container.image to the Spark image of your choice. This will use that image for the interactive session (see the sketch after these steps).
  3. To specify the details for the other boxes or options in the Create Session step and to complete creating the Livy session, see %manage_spark.
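
Whichever interface you use, the property to override is the same. The following is a hypothetical sketch only; the exact %config_spark input format is described in %manage_spark, and the image path is a placeholder:

  # Hypothetical session property override; the image path is a placeholder.
  spark.kubernetes.container.image: <base-repository>/<image-name>:<image-tag>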

Using Airflow

When you submit a Spark application by using Airflow, the application will be configured with the Spark image specified in your YAML file. This YAML file is referenced in the Airflow DAG.

For example:
# The import path assumes the community CNCF Kubernetes provider;
# the HPE distribution may package the operator differently.
from airflow.providers.cncf.kubernetes.operators.spark_kubernetes import SparkKubernetesOperator

# Submit the SparkApplication defined in example.yaml through the
# sparkoperator.hpe.com API group, impersonating the LDAP user.
# (dag refers to the DAG object defined earlier in the DAG file.)
submit = SparkKubernetesOperator(
    task_id='submit',
    namespace="example",
    application_file="example.yaml",
    dag=dag,
    api_group="sparkoperator.hpe.com",
    enable_impersonation_from_ldap_user=True
)

To learn how to submit Spark applications by using an Airflow DAG, see Submitting Spark Applications by Using DAGs.