Using HPE-Curated Spark Images
Describes how to use HPE-curated Spark images to submit Spark applications.
HPE-curated Spark images are Apache Spark images that are customized to support the Data Fabric filesystem, Data Fabric Streams, and any other Data Fabric sources and sinks that require a Data Fabric client. These Spark images also support Data Fabric-specific security features such as data-fabric SASL (maprsasl).
HPE-curated Spark images are the default images used by the GUI. See List of Spark Images. You can use these images in the following workflows:
- Spark Operator workflow using the Create Spark Application GUI. See Using the Create Spark Application GUI.
- Spark Operator workflow using Airflow. See Using Airflow.
- Livy workflow using the Spark Interactive Sessions GUI. See Using the Spark Interactive Sessions GUI.
- Livy workflow using Jupyter Notebooks. See Using Notebooks.
Using the Create Spark Application GUI
- Using New application

  If you choose the New application option in the Application Details step of the Create Spark Application wizard, your Spark application will be configured with an HPE-curated Spark image. The List of Spark Images page also lists the default HPE-curated Spark images used for the GUI experience.
- Using Upload YAML

  If you choose the Upload YAML option, your Spark application will be configured with the Spark image specified in your YAML file:
image: <base-repository>/<image-name>:<image-tag>
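For context, a minimal SparkApplication manifest might place the image field under spec as sketched below. This is illustrative only: the apiVersion follows the api_group used elsewhere on this page, and the names, class, paths, and resource values are placeholder assumptions, not prescribed values.

# Illustrative sketch; all names and values are placeholders.
apiVersion: sparkoperator.hpe.com/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: example
spec:
  type: Scala
  mode: cluster
  image: <base-repository>/<image-name>:<image-tag>   # HPE-curated or custom Spark image
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples.jar"
  driver:
    cores: 1
    memory: "1g"
  executor:
    instances: 2
    cores: 1
    memory: "1g"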
To learn how to submit Spark applications by using the GUI, see Creating Spark Applications.
Using the Spark Interactive Sessions GUI
- Follow the instructions for creating interactive sessions until you reach the Spark Configurations box in the Session Configurations and Dependencies step. See Creating Interactive Sessions.
- In the Spark Configurations box, you have two options:
  - If you leave the Key and Value boxes empty, the Spark interactive session will be created with the HPE-curated Spark image. The List of Spark Images page also lists the default HPE-curated Spark images used for the GUI experience.
  - If you set the Key and Value boxes with the following key-value pair, your Spark interactive session will be created with the Spark image of your choice:

    Key: spark.kubernetes.container.image
    Value: <spark-image-of-your-choice>
- To specify the details for the other boxes or options in the Session Configurations and Dependencies step and to complete creating interactive sessions, see Creating Interactive Sessions.
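Behind the GUI, interactive sessions are Livy sessions, and Spark settings such as the container image travel in the conf map of the session request. As a sketch, an equivalent session-creation payload sent directly to the open-source Livy REST API might look like the following; the mapping from the GUI to this payload is an assumption, and the image value is a placeholder.

POST /sessions
{
  "kind": "spark",
  "conf": {
    "spark.kubernetes.container.image": "<spark-image-of-your-choice>"
  }
}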
Using Notebooks
To create Livy sessions from a notebook by using the session magic command (%manage_spark), follow these steps:
- Run %manage_spark to connect to the Livy server and start a new session. See %manage_spark for details.
- Once you run %manage_spark, you have two options:
  - Creating a session with the default Spark configurations. This will use the HPE-curated Spark image to create an interactive session. The List of Spark Images page also lists the default HPE-curated Spark images used for the GUI experience.
  - Running %config_spark and updating the value of spark.kubernetes.container.image to the Spark image of your choice. This will use the Spark image of your choice to create an interactive session.
- To specify the details for the other boxes or options in the Create Session step and to complete creating the Livy session, see %manage_spark.
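As a rough sketch of what this looks like in a notebook (one magic per cell; the exact configuration editor that %config_spark presents may differ from what the comments here assume):

# Cell 1: connect to the Livy server and open the session-management widget
%manage_spark

# Cell 2 (optional): open the session configuration editor and update
# spark.kubernetes.container.image to the Spark image of your choice
%config_spark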
Using Airflow
When you submit a Spark application by using Airflow, your Spark application will be configured with the Spark image specified in your YAML file. This YAML file is referenced in the Airflow DAG:
submit = SparkKubernetesOperator(
    task_id='submit',
    namespace="example",                  # Kubernetes namespace for the Spark application
    application_file="example.yaml",      # SparkApplication YAML that specifies the Spark image
    dag=dag,
    api_group="sparkoperator.hpe.com",    # HPE Spark Operator API group
    enable_impersonation_from_ldap_user=True
)
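For context, a minimal DAG around this task might look like the following sketch. The import path and DAG arguments are assumptions based on the stock Airflow CNCF Kubernetes provider; your HPE Airflow distribution may ship its own operator module, and the HPE-specific enable_impersonation_from_ldap_user argument from the snippet above is omitted here because the stock operator does not accept it.

from datetime import datetime

from airflow import DAG
# Assumption: stock provider import; HPE builds may expose a variant of this operator.
from airflow.providers.cncf.kubernetes.operators.spark_kubernetes import SparkKubernetesOperator

with DAG(
    dag_id="spark_submit_example",
    start_date=datetime(2024, 1, 1),
    schedule=None,     # trigger manually
    catchup=False,
) as dag:
    submit = SparkKubernetesOperator(
        task_id="submit",
        namespace="example",
        application_file="example.yaml",   # SparkApplication manifest with the image field
        api_group="sparkoperator.hpe.com",
    )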
To learn how to submit Spark applications by using an Airflow DAG, see Submitting Spark Applications by Using DAGs.