Using Your Own Open-Source Spark Images

Describes how to use your own open-source Spark images to submit Spark applications.

You can use your own open-source Spark images that are compatible with the Kubernetes version supported on HPE Ezmeral Unified Analytics Software. By bringing your own open-source Spark, you can build Spark with any profile of your choice; however, Data Fabric filesystem, Data Fabric Streams, and any other Data Fabric sources and sinks that require a Data Fabric client are not supported. Open-source Spark images also do not support Data Fabric-specific security features, such as data-fabric SASL (maprsasl).

To use your own open-source Spark images, follow these steps:
  1. Build Spark. See Building Spark.
  2. Build Spark images to run in HPE Ezmeral Unified Analytics Software. See Building Images.
  3. Choose one of the following methods to submit your Spark application:

Using the Create Spark Application GUI

To use your own open-source Spark images, choose one of the following options in the GUI:
Using Upload YAML
  1. Configure your Spark YAML file with the built Spark image of your choice.
    image: <base-repository>/<image-name>:<image-tag>
  2. To set the logged-in user’s context, add the following configuration in the sparkConf section.
    spark.hpe.webhook.security.context.autoconfigure: "true"
    To learn more about user context, see Setting the User Context.
  3. Perform the instructions to create a Spark application as described in Creating Spark Applications until you reach the Application Details step.
  4. In the Application Details step, choose the Upload YAML option.
  5. Click Select File, then browse to and upload the YAML file.
  6. For details about the remaining boxes and options in the Application Details step, and to finish creating the Spark application, see Creating Spark Applications.
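For reference, a minimal SparkApplication manifest with both edits applied might look like the following. The application name, namespace, main class, JAR path, and Spark version are illustrative placeholders; replace them with values that match your build and environment.

```yaml
apiVersion: sparkoperator.hpe.com/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi            # placeholder application name
  namespace: example        # placeholder namespace
spec:
  type: Scala
  mode: cluster
  # Your own open-source Spark image:
  image: <base-repository>/<image-name>:<image-tag>
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar
  sparkVersion: "3.4.0"     # placeholder; match the Spark version you built
  sparkConf:
    # Sets the logged-in user's context (see Setting the User Context):
    spark.hpe.webhook.security.context.autoconfigure: "true"
  driver:
    cores: 1
    memory: "512m"
  executor:
    cores: 1
    instances: 1
    memory: "512m"
```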
Using New application
  1. Perform the instructions to create a Spark application as described in Creating Spark Applications until you reach the Review step.
  2. To open an editor to change the application configuration using YAML in the GUI, click Edit YAML.
  3. Replace the default Spark image in YAML with your built open-source Spark image.
    image: <base-repository>/<image-name>:<image-tag>
  4. To set the logged-in user’s context, add the following configuration in the sparkConf section.
    spark.hpe.webhook.security.context.autoconfigure: "true"
    To learn more about user context, see Setting the User Context.
  5. To submit the application with your own Spark image, click Create Spark Application at the bottom right of the Review step.
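After you click Edit YAML, the fields to change in the spec section look like the following sketch (surrounding fields omitted):

```yaml
spec:
  # Replace the default Spark image with your built open-source image:
  image: <base-repository>/<image-name>:<image-tag>
  sparkConf:
    # Sets the logged-in user's context (see Setting the User Context):
    spark.hpe.webhook.security.context.autoconfigure: "true"
```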

Using Airflow

When you submit the Spark application by using Airflow, the application is configured with the Spark image specified in your YAML file, and that YAML file is referenced in the Airflow DAG.

For example:
# Import path shown for the upstream CNCF Kubernetes provider; HPE environments
# may ship a customized SparkKubernetesOperator, so adjust the import as needed.
from airflow.providers.cncf.kubernetes.operators.spark_kubernetes import SparkKubernetesOperator

submit = SparkKubernetesOperator(
    task_id='submit',
    namespace="example",
    application_file="example.yaml",
    dag=dag,
    api_group="sparkoperator.hpe.com",
    enable_impersonation_from_ldap_user=True,
)
To learn about how to submit Spark applications by using Airflow DAG, see Submitting Spark Applications by Using DAGs.