Creating Spark Applications
Describes how to create and submit Spark applications using HPE Ezmeral Unified Analytics Software.
Prerequisites
- Must have a main application file (for example, a compiled JAR file for Java or Scala).
- Must know the runtime dependencies of your application that are not built into the main application file.
- Must know your application arguments.
Create a Spark Application
- Sign in to HPE Ezmeral Unified Analytics Software.
- In the left navigation bar, select one of the following options:
- Click the Analytics icon and click Spark Applications on the left navigation bar of the HPE Ezmeral Unified Analytics Software screen.
- Click the Tools & Frameworks icon on the left navigation bar. Navigate to the Spark Operator tile under the Analytics tab and click Open.
- Click Create Application on the Spark Applications screen. Navigate through each step of the Create Spark Application wizard.
The following list describes each step in the wizard and the corresponding instructions:
Application Details
Create an application or upload a preconfigured YAML file.
- YAML File - When you select Upload YAML, you can upload a preconfigured YAML file from your local system. Click Select File to upload the YAML file. The fields in the wizard are populated with the information from the YAML file.
- Name - Enter the application name.
- Description - Enter the application description.
Configure Spark Application
Configure the Spark application:
- Type - Select the application type: Java, Scala, Python, or R.
- Source - Select the location of the main application file from User Directory, Shared Directory, S3, or Other. HPE Ezmeral Unified Analytics Software preconfigures Spark applications and Livy sessions so that both the <username> and shared volumes are mounted to the driver and executor runtimes. For additional details, see the Selecting the Location of the Main Application File section below.
- File Name - Manually enter the location and file name of the application for the S3 and Other sources. For example: s3a://apps/my_application.jar
For User Directory and Shared Directory, click Browse, and then browse to and select the files.
NOTE: Ensure that the extension of the main application file matches the selected application type. The extension must be .py for Python, .jar for Java and Scala, and .r for R applications.
- Class Name - Enter the main class of the application for Java or Scala applications.
- Arguments - Click + Add Argument to add input parameters as required by the application.
NOTE: To refer to data in mounted folders from application source code, use the file:// schema. If a Spark application reads a file from the shared or user volume and takes the path to that file as an application argument, the argument takes the form file://[mount-path]/path/to/input/file. For example:
User Directory: file:///mounts/<user-name>-volume/
Shared Directory: file:///mounts/shared-volume/
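As an illustration of how these settings fit together, the following is a rough, hypothetical sketch of the source and argument fields in a Spark Operator application definition. The field names type, mainClass, mainApplicationFile, and arguments come from the Spark Operator SparkApplication spec; the class name, file name, and paths are placeholders, not values from this documentation.
type: Scala
mainClass: com.example.WordCount                                              # hypothetical main class
mainApplicationFile: "file:///mounts/shared-volume/apps/my_application.jar"   # main application file on the shared volume
arguments:
  - "file:///mounts/shared-volume/data/input.txt"                             # input file passed as an application argument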
Dependencies
To add the dependencies required to run your application, select a dependency type from excludePackages, files, jars, packages, pyfiles, or repositories, and enter the value of the dependency. To add more than one dependency, click Add Dependency.
For example:
- Enter package names as the values for the excludePackages dependency type.
- Enter the locations of files, for example, s3://<path-to-file> or local://<path-to-file>, as the values for the files, jars, pyfiles, or repositories dependency types.
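In the application YAML, these dependency types typically correspond to the Spark Operator deps block. A minimal sketch, assuming that mapping; all locations and package coordinates below are illustrative.
deps:
  jars:
    - "s3a://libs/extra-library.jar"              # illustrative jar location
  packages:
    - "org.apache.spark:spark-avro_2.12:3.2.0"    # illustrative Maven coordinate
  excludePackages:
    - "com.example:unwanted-package"              # illustrative package to exclude
  repositories:
    - "https://repo.example.com/maven2"           # illustrative additional repository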
Driver Configuration
Configure the number of cores, core limits, and memory. The number of cores must be less than or equal to the core limit. See Configuring Memory for Spark Applications. When boxes in this wizard are left blank, the default values are set. The default values are as follows:
- Number of Cores: 1
- Core Limit: unlimited
- Memory: 1g
Executor Configuration
Configure the number of executors, number of cores, core limits, and memory. The number of cores must be less than or equal to the core limit. See Configuring Memory for Spark Applications. When boxes in this wizard are left blank, the default values are set. The default values are as follows:
- Number of Executors: 1
- Number of Cores per Executor: 1
- Core Limit per Executor: unlimited
- Memory per Executor: 1g
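Assuming the wizard emits the standard Spark Operator driver and executor blocks, the defaults above might look roughly like the following sketch (coreLimit is omitted here because the default is unlimited):
driver:
  cores: 1
  memory: "1g"
executor:
  instances: 1
  cores: 1
  memory: "1g"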
Schedule Application
To schedule a Spark application to run at a certain time, toggle Schedule to Run. You can configure the frequency interval and set the concurrency policy, successful run history limit, and failed run history limit. Set the Frequency Interval in one of two ways:
- To choose from predefined intervals, select Predefined Frequency Interval and click Update to open a dialog with predefined intervals.
- To set a custom frequency interval, select Custom Frequency Interval. The Frequency Interval accepts any of the following values:
  - CRON expression with five fields:
    - Field 1: minute (0–59)
    - Field 2: hour (0–23)
    - Field 3: day of the month (1–31)
    - Field 4: month (1–12, JAN - DEC)
    - Field 5: day of the week (0–6, SUN - SAT)
    - Example: 0 1 1 * * or 02 02 ? * WED,THU
  - Predefined macro: @yearly, @monthly, @weekly, @daily, or @hourly
  - Interval using @every <duration>:
    - Units: nanosecond (ns), microsecond (us, µs), millisecond (ms), second (s), minute (m), and hour (h)
    - Example: @every 1h or @every 1h30m10s
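Scheduled applications are typically represented by the Spark Operator ScheduledSparkApplication resource. The following is a hedged sketch of how the settings in this step (frequency interval, concurrency policy, and run history limits) might map to that resource; the name and values are illustrative, and the YAML produced by the wizard can differ.
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: ScheduledSparkApplication
metadata:
  name: my-scheduled-app                 # illustrative name
spec:
  schedule: "@every 1h30m"               # CRON expression, predefined macro, or @every interval
  concurrencyPolicy: Allow
  successfulRunHistoryLimit: 3
  failedRunHistoryLimit: 1
  template:
    type: Python                         # remaining fields match a regular SparkApplication spec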
Review
Review the application details. Click the pencil icon in each section to navigate to the specific step and change the application configuration. To open an editor and change the application configuration using YAML in the GUI, click Edit YAML. You can use the editor to add extra configuration options that are not available through the application wizard. To apply the changes, click Save Changes. To cancel the changes, click Discard Changes.
To submit the application, click Create Spark Application on the bottom right of the Review step.
Results:
The Spark application is created and will immediately run, or will wait to run at its scheduled time. You can view it on the Spark Applications screen.
Selecting the Location of the Main Application File
Use one of the following methods to select the location of the main application file:
Uploading Files to the User and Shared Directories
To upload files to the user and shared directories:
- Open HPE Ezmeral Unified Analytics Software in a different browser window or tab.
- In the left navigation bar, select Data Engineering → Data Sources and then select the Data Volumes tab.
- On the Data Volumes tab, select your user directory or the shared directory. For example, a user named bob sees a bob directory alongside the shared directory. If you do not see your user directory or the shared directory, contact your administrator.
- Click Upload to upload the Spark application files to the user/ or shared/ directory.
- Return to the browser you were working in with the Configure Spark Application wizard.
- Click Browse, and navigate to the location where you uploaded the files.
- Select the Spark application files.
Using S3
- S3 Endpoint - Enter an S3 endpoint for direct access to an external S3 data source, or enter an S3 endpoint for access through the S3 proxy in Unified Analytics, as described in Getting the Data Source Name and S3 Proxy Endpoint URL. For an explanation of direct access versus S3 proxy access, see Configuring a Spark Application to Access External S3 Object Storage.
- Secret
  - For HPE-Curated Spark and Spark OSS images, you can enter access-token in the Secret field. For information about the access-token secret, see Auth Tokens.
  - For HPE-Curated Spark images, you can leave the Secret field empty if you include spark.hadoop.fs.s3a.aws.credentials.provider: "org.apache.spark.s3a.EzSparkAWSCredentialProvider" in the Spark configuration (see the sketch at the end of this section).
  - Alternatively, you can generate a secret, as described in Configuring a Spark Application to Directly Access Data in an External S3 Data Source and Configuring a Spark Application to Access Data in an External S3 Data Source through the S3 Proxy Layer.
- File Name - Enter the location and name of the Spark application file.
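When you leave the Secret field empty for an HPE-Curated Spark image, the credentials provider mentioned above can be supplied through the Spark configuration, for example in the Edit YAML editor. The following is a sketch that assumes the standard sparkConf block; only the credentials provider line comes from this documentation, and the endpoint and path-style entries are illustrative assumptions that depend on your object store.
sparkConf:
  spark.hadoop.fs.s3a.aws.credentials.provider: "org.apache.spark.s3a.EzSparkAWSCredentialProvider"
  spark.hadoop.fs.s3a.endpoint: "https://s3.example.com:9000"    # illustrative external S3 endpoint
  spark.hadoop.fs.s3a.path.style.access: "true"                  # often required for non-AWS object stores; verify for yours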
Using Other
Select Other as the data source, to reference other locations of the application file.
For example, to refer to main application files and dependency files, or to refer to
a file inside the specific Spark image, use the local://
schema.
local:///opt/mapr/spark/spark-3.2.0/examples/jars/spark-examples_2.12-3.2.0.16-eep-810.jar
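In the application YAML, such a path would typically appear as the value of the main application file field, for example (a sketch reusing the path above; mainApplicationFile is the Spark Operator field name):
mainApplicationFile: "local:///opt/mapr/spark/spark-3.2.0/examples/jars/spark-examples_2.12-3.2.0.16-eep-810.jar"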