Running Spark Applications in Namespaces

Describes how namespaces work with regard to Spark applications in HPE Ezmeral Unified Analytics Software.

Information in this topic relates to Spark applications that use the HPE-curated Spark images or Spark OSS images with the security context set in the Spark application YAML, as described in Setting Security Context for Spark OSS Images.

HPE Ezmeral Unified Analytics Software users (admins and members) can submit Spark applications through the following clients and interfaces:

  • HPE Ezmeral Unified Analytics Software UI
  • APIs/CLI (kubectl)
  • Notebooks
  • Airflow DAGs

By default, when a user submits a Spark application, the Spark application runs in the user's designated namespace, isolating the user's work and resource use from other users in the HPE Ezmeral Unified Analytics Software cluster. For example, if user01 is signed into HPE Ezmeral Unified Analytics Software and submits a Spark application, the Spark application automatically runs in the user01 namespace. Only user01 can access the Spark application and Spark application details in the Spark History Server UI.

Alternatively, a user can run their Spark applications in the spark namespace. When a user changes the namespace to spark in the Spark application YAML, the Spark application runs in the spark namespace and all users (admins and members) can access the Spark application through the HPE Ezmeral Unified Analytics Software UI. However, only the user that submitted the Spark application can access the application details in the Spark History Server UI.
NOTE
Currently, the HPE Ezmeral Unified Analytics Software UI does not support running Spark applications in the spark namespace. You can run Spark applications in the spark namespace only through kubectl, notebooks, and Airflow DAGs.

The following table describes how HPE Ezmeral Unified Analytics Software responds when you submit Spark applications through the supported clients and interfaces:
Client/Interface Description
HPE Ezmeral Unified Analytics Software UI
  • Spark applications run in the user's designated namespace.
  • Does not support running Spark applications in the spark namespace.
  • If a user changes the namespace in their Spark application, the system automatically reverts the namespace to that of the user submitting the Spark application. For example, if user01 changes the namespace to user02 and submits the Spark application, the system reverts the namespace to user01 and runs the application in the user01 namespace.
API/CLI (kubectl)
  • Spark applications run in the user's designated namespace.
  • Users can change the namespace to spark; Spark applications run in the spark namespace and become accessible to all users.
  • If a user changes the namespace in their Spark application to another user's namespace, for example user01 changes the namespace to user02, the system accepts the Spark application but returns an access denied error.
Notebook
  • Spark applications run in the user's designated namespace.
  • If a user changes the namespace in their Spark application, for example user01 changes the namespace to user02, the system returns an access denied error.
Airflow DAG
  • A Spark application launched through an Airflow DAG automatically runs in the namespace of the user that deployed the DAG. For example, if user01 deploys a DAG with a Spark application in the workflow, the Spark application runs in the user01 namespace.
  • Manually triggered DAGs launch in the namespace of the trigger event owner.
  • Scheduled DAGs launch in the namespace of the last user to un-pause the DAG.
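
The namespace change described in the API/CLI (kubectl) entry above can be sketched as a Spark application YAML. This is a minimal illustration under stated assumptions, not a definitive template. It assumes the SparkApplication resource of the open-source Spark operator (apiVersion sparkoperator.k8s.io/v1beta2); your cluster may use a different API group. The application name, image, main class, JAR path, Spark version, and resource values are hypothetical placeholders. The only setting taken from this topic is the namespace, set to spark.

```yaml
# Sketch only: every value except "namespace: spark" is a hypothetical placeholder.
apiVersion: sparkoperator.k8s.io/v1beta2   # assumption: open-source Spark operator API group
kind: SparkApplication
metadata:
  name: example-spark-app                  # hypothetical application name
  namespace: spark                         # run in the shared spark namespace instead of the user namespace
spec:
  type: Scala
  mode: cluster
  image: "<your-spark-image>"              # placeholder: replace with your image
  mainClass: "<your.main.Class>"           # placeholder
  mainApplicationFile: "local:///path/to/your-application.jar"   # placeholder path
  sparkVersion: "<spark-version>"          # placeholder
  driver:
    cores: 1
    memory: "512m"
  executor:
    instances: 1
    cores: 1
    memory: "512m"
```

Setting the namespace to your own namespace (for example, user01) instead corresponds to the default behavior described above. A manifest like this is typically submitted with kubectl apply -f followed by the file name. A working application usually needs additional fields, such as a service account, that are outside the scope of this sketch.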

Spark History Server

In an HPE Ezmeral Unified Analytics Software cluster, one Spark History Server runs in the spark namespace. Users can go to the Spark History Server UI to view a list of all Spark applications that have run. However, users can only view the details of Spark applications that they submit, regardless of the namespace they use (their own namespace or the spark namespace).

If a user submits a Spark application in the spark namespace, only that user can view the application details in the Spark History Server UI. For example, if user01 submits a Spark application in the spark namespace, user02 cannot access the Spark application details in the Spark History Server UI. Only user01 can view the Spark application details.

The system returns an unauthorized message when users try to view application details for Spark applications that were submitted by other users.

Setting Security Context for Spark OSS Images

The Spark OSS images do not contain the security context required to run Spark applications against volumes in HPE Ezmeral Unified Analytics Software. HPE Ezmeral Unified Analytics Software denies user access to the volume if it cannot authenticate the user, which results in Spark application failures.

To add security context to your Spark application, add the following configuration setting in the Spark application YAML:
sparkConf:
    spark.hpe.webhook.security.context.autoconfigure: "true"

This security context flag sets the pod security context and enables HPE Ezmeral Unified Analytics Software to recognize you as a valid HPE Ezmeral Unified Analytics Software user when you run your Spark applications.
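
To show where the setting above sits in a complete file, the following sketch places it under spec.sparkConf in a Spark application YAML for a Spark OSS image. It assumes the SparkApplication resource of the open-source Spark operator (apiVersion sparkoperator.k8s.io/v1beta2), which may differ on your cluster. The application name, image, application file, Spark version, and resource values are hypothetical placeholders; only the sparkConf entry comes from this topic.

```yaml
# Sketch only: every value except the sparkConf entry is a hypothetical placeholder.
apiVersion: sparkoperator.k8s.io/v1beta2   # assumption: open-source Spark operator API group
kind: SparkApplication
metadata:
  name: example-oss-spark-app              # hypothetical application name
  namespace: user01                        # example user namespace used in this topic
spec:
  type: Python
  mode: cluster
  image: "<your-spark-oss-image>"          # placeholder: a Spark OSS image, not an HPE-curated image
  mainApplicationFile: "local:///path/to/your-application.py"    # placeholder path
  sparkVersion: "<spark-version>"          # placeholder
  sparkConf:
    # Security context flag from this topic; the value is a quoted string.
    spark.hpe.webhook.security.context.autoconfigure: "true"
  driver:
    cores: 1
    memory: "512m"
  executor:
    instances: 1
    cores: 1
    memory: "512m"
```

The value "true" is quoted because sparkConf entries are string key-value pairs.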

When you add the security context flag to the Spark application YAML and run the Spark application, the application automatically runs in your user-designated namespace. If you change the namespace to spark, the Spark application runs in the spark namespace.
WARNING
Do not set the security context in HPE-curated Spark images. Setting the security context in HPE-curated Spark images causes Spark applications to fail.

For additional information, see User Isolation and Setting the User Context.