Enabling GPU Support for Spark Operator

Describes how to enable GPU support and allocate GPU resources on Spark Operator.

To enable GPU processing and allocate GPU resources on Spark Operator, follow these steps:
  1. Set the image option within the spec property of the Spark application YAML file to gcr.io/mapr-252711/spark-gpu-<spark-version>:<image-tag>. For the list of Spark GPU images, see List of Spark Images.
  2. Add the following configuration options to the sparkConf section within the spec property.
    • To enable the RAPIDS plugin and allocate GPU resources, add:
      # Enable the RAPIDS plugin
      spark.plugins: "com.nvidia.spark.SQLPlugin"
      spark.rapids.sql.enabled: "true"
      spark.rapids.force.caller.classloader: "false"
       
      # GPU allocation and discovery settings
      spark.task.resource.gpu.amount: "1"
      spark.executor.resource.gpu.amount: "1"
      spark.executor.resource.gpu.vendor: "nvidia.com"
      
    • To set the path to the GPU discovery script, add the following option; a sketch of what this script produces is shown after this list:
      spark.executor.resource.gpu.discoveryScript: "/opt/mapr/spark/spark-<spark-version>/examples/src/main/scripts/getGpusResources.sh"
    • To set the RAPIDS shim layer used for the run, add:
      spark.rapids.shims-provider-override: "com.nvidia.spark.rapids.shims.<spark-identifier>.SparkShimServiceProvider"
      The Spark version distributed by Hewlett Packard Enterprise is compatible with its corresponding open-source version. The RAPIDS JAR includes shim layer provider classes named com.nvidia.spark.rapids.shims.<spark-identifier>.SparkShimServiceProvider, where <spark-identifier> depends on the Spark version distributed by Hewlett Packard Enterprise. For example:
      • For spark-3.5.0, the identifier is spark350, so for spark-gpu-3.5.0 set the RAPIDS shim layer as follows:
        spark.rapids.shims-provider-override: "com.nvidia.spark.rapids.shims.spark350.SparkShimServiceProvider"
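
The GPU discovery script tells Spark which GPU addresses are available on each executor. For reference, the following is a minimal sketch of an equivalent script, not the literal contents of the bundled getGpusResources.sh: it enumerates NVIDIA GPU indexes with nvidia-smi and prints them in the JSON resource format that Spark expects.
#!/usr/bin/env bash
# Sketch of a GPU discovery script: list NVIDIA GPU indexes and print
# them as a JSON resource descriptor for Spark, for example:
# {"name": "gpu", "addresses":["0","1"]}
ADDRS=$(nvidia-smi --query-gpu=index --format=csv,noheader | paste -sd "," - | sed 's/,/","/g')
echo "{\"name\": \"gpu\", \"addresses\":[\"$ADDRS\"]}"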

Verifying Spark Applications Are Running on GPU

To verify that Spark applications are running on GPU, use Spark's explain method, which prints the physical plan.

Run the following PySpark application:
from pyspark.sql import SparkSession

# Create or reuse a SparkSession; the GPU settings come from sparkConf
spark = SparkSession.builder.getOrCreate()

# Build a small single-column DataFrame and register it as a temp view
df = spark.createDataFrame([1, 2, 3], "int").toDF("value")
df.createOrReplaceTempView("df")

# explain() prints the physical plan; GPU stages appear as Gpu* operators
spark.sql("SELECT * FROM df WHERE value<>1").explain()
spark.sql("SELECT * FROM df WHERE value<>1").show()

spark.stop()
If the explain method prints GPU-related stages, as in the following output, your Spark application is running on GPU:
== Physical Plan ==
GpuColumnarToRow false
+- GpuFilter NOT (value#2 = 1), true
   +- GpuRowToColumnar targetsize(2147483647)
      +- *(1) SerializeFromObject [input[0, int, false] AS value#2]
         +- Scan[obj#1]
However, if you get the following output, your Spark application is running on CPU instead of GPU. Ensure that the Spark application is configured correctly for GPU, as described in the preceding steps.
== Physical Plan ==
*(1) Filter NOT (value#2 = 1)
+- *(1) SerializeFromObject [input[0, int, false] AS value#2]
   +- Scan[obj#1]
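
In addition to the explain output, you can check the driver pod logs for RAPIDS activity. A minimal sketch, assuming the driver pod is named spark-eep-gpu-350-driver (a hypothetical name; list pods with kubectl get pods -n spark) and runs in the spark namespace:
# Search the driver log for RAPIDS plugin class names
kubectl logs spark-eep-gpu-350-driver -n spark | grep -i rapids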

Spark Operator YAML Example Using GPU for Spark 3.5.0

Example:
apiVersion: "sparkoperator.hpe.com/v1beta2"
kind: SparkApplication
metadata:
  name: spark-eep-gpu-350
  namespace: spark
spec:
  sparkConf:
    # Enable the RAPIDS plugin
    spark.plugins: "com.nvidia.spark.SQLPlugin"
    spark.rapids.sql.enabled: "true"
    spark.rapids.force.caller.classloader: "false"
 
    # GPU allocation and discovery settings
    spark.task.resource.gpu.amount: "1"
    spark.executor.resource.gpu.amount: "1"
    spark.executor.resource.gpu.vendor: "nvidia.com"
    spark.executor.resource.gpu.discoveryScript: "/opt/mapr/spark/spark-3.5.0/examples/src/main/scripts/getGpusResources.sh"
    spark.rapids.shims-provider-override: "com.nvidia.spark.rapids.shims.spark350.SparkShimServiceProvider"
 
  type: Python
  sparkVersion: 3.5.0
  mode: cluster
  image: gcr.io/mapr-252711/spark-gpu-3.5.0:v3.5.0
  imagePullPolicy: Always
  mainApplicationFile: .../path/to/application.py
  restartPolicy:
    type: Never
  imagePullSecrets:
    - imagepull
  driver:
    cores: 1
    coreLimit: "1000m"
    memory: "1024m"
    labels:
      version: 3.5.0
  executor:
    cores: 1
    coreLimit: "1000m"
    instances: 1
    memory: "2G"
    labels:
      version: 3.5.0
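
After saving the manifest, you can submit it with kubectl and watch the application status. A minimal sketch, assuming the manifest is saved as spark-eep-gpu-350.yaml (a hypothetical file name) and the SparkApplication CRD is installed:
# Submit the SparkApplication to the cluster
kubectl apply -f spark-eep-gpu-350.yaml

# Watch the application status until it reports RUNNING or COMPLETED
kubectl get sparkapplication spark-eep-gpu-350 -n spark -w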