Enabling GPU Support for Spark Operator
Describes how to enable and allocate GPU resources on Spark Operator.
To enable GPU processing and allocate GPU resources on Spark Operator, follow these steps:

- Set the image option within the spec property of the Spark application YAML file to gcr.io/mapr-252711/spark-gpu-<spark-version>:<image-tag>. To see the list of Spark GPU images, see List of Spark Images.
- Add the following configuration options to the sparkConf section within the spec property:
  - To enable the RAPIDS plugin and allocate the GPU resources, add:
    # Enabling RAPIDS plugin
    spark.plugins: "com.nvidia.spark.SQLPlugin"
    spark.rapids.sql.enabled: "true"
    spark.rapids.force.caller.classloader: "false"
    # GPU allocation and discovery settings
    spark.task.resource.gpu.amount: "1"
    spark.executor.resource.gpu.amount: "1"
    spark.executor.resource.gpu.vendor: "nvidia.com"
  - To set the path to the GPU discovery script, add:

    spark.executor.resource.gpu.discoveryScript: "/opt/mapr/spark/spark-<spark-version>/examples/src/main/scripts/getGpusResources.sh"
  - To set the RAPIDS shim layer used for the run, add:

    spark.rapids.shims-provider-override: "com.nvidia.spark.rapids.shims.<spark-identifier>.SparkShimServiceProvider"

    The Spark version distributed by Hewlett Packard Enterprise is compatible with its corresponding open-source version. The RAPIDS jar includes the shim layer provider classes called com.nvidia.spark.rapids.shims.<spark-identifier>.SparkShimServiceProvider. Replace <spark-identifier> based on the Spark version distributed by Hewlett Packard Enterprise, such as:
    - For spark-3.5.0, the identifier is spark350.

    For example, for spark-gpu-3.5.0, set the RAPIDS shim layer as follows:

    spark.rapids.shims-provider-override: "com.nvidia.spark.rapids.shims.spark350.SparkShimServiceProvider"
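The version-to-identifier convention above (drop the dots from the Spark version and prefix it with "spark") can be sketched as a small helper; the function name is hypothetical and is shown only to illustrate how the override string is assembled:

```python
def rapids_shim_provider(spark_version: str) -> str:
    """Build the RAPIDS shim provider class name for a Spark version.

    The shim identifier is the Spark version with the dots removed and a
    "spark" prefix, e.g. "3.5.0" -> "spark350".
    """
    identifier = "spark" + spark_version.replace(".", "")
    return f"com.nvidia.spark.rapids.shims.{identifier}.SparkShimServiceProvider"

# For spark-gpu-3.5.0:
print(rapids_shim_provider("3.5.0"))
# com.nvidia.spark.rapids.shims.spark350.SparkShimServiceProvider
```

The returned string is the value to set for spark.rapids.shims-provider-override in sparkConf.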
Verifying Spark Applications are Running on GPU
To verify that a Spark application is running on GPU, use Spark's explain method.
Run the following PySpark application:
from pyspark.sql import SQLContext
from pyspark import SparkConf
from pyspark import SparkContext
conf = SparkConf()
sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)
df = sqlContext.createDataFrame([1,2,3], "int").toDF("value")
df.createOrReplaceTempView("df")
sqlContext.sql("SELECT * FROM df WHERE value<>1").explain()
sqlContext.sql("SELECT * FROM df WHERE value<>1").show()
sc.stop()
If you get the following output, where the explain method prints GPU-related stages, your Spark application is running on GPU.
== Physical Plan ==
GpuColumnarToRow false
+- GpuFilter NOT (value#2 = 1), true
+- GpuRowToColumnar targetsize(2147483647)
+- *(1) SerializeFromObject [input[0, int, false] AS value#2]
+- Scan[obj#1]
However, if you get the following output, your Spark application is running on CPU instead of GPU. Ensure that your Spark application is configured properly to run on GPU.
== Physical Plan ==
*(1) Filter NOT (value#2 = 1)
+- *(1) SerializeFromObject [input[0, int, false] AS value#2]
+- Scan[obj#1]
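The distinction between the two plans above can also be checked programmatically: RAPIDS replaces CPU physical operators with GPU ones whose names start with "Gpu" (GpuFilter, GpuRowToColumnar, and so on). The following is a minimal sketch with a hypothetical helper name, scanning a captured plan string for such operators:

```python
def runs_on_gpu(plan: str) -> bool:
    """Return True if a physical plan string contains RAPIDS GPU operators.

    RAPIDS physical operators are conventionally prefixed with "Gpu",
    e.g. GpuFilter, GpuRowToColumnar, GpuColumnarToRow.
    """
    return any(token.startswith("Gpu")
               for line in plan.splitlines()
               for token in line.replace("+-", " ").split())

gpu_plan = """== Physical Plan ==
GpuColumnarToRow false
+- GpuFilter NOT (value#2 = 1), true"""

cpu_plan = """== Physical Plan ==
*(1) Filter NOT (value#2 = 1)"""

print(runs_on_gpu(gpu_plan), runs_on_gpu(cpu_plan))  # True False
```

This can be useful in automated smoke tests, feeding in the text produced by df.explain() (for example via Python's contextlib.redirect_stdout).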
Spark Operator YAML Example Using GPU for Spark 3.5.0
Example:
apiVersion: "sparkoperator.hpe.com/v1beta2"
kind: SparkApplication
metadata:
  name: spark-eep-gpu-350
  namespace: spark
spec:
  sparkConf:
    # Enabling RAPIDS plugin
    spark.plugins: "com.nvidia.spark.SQLPlugin"
    spark.rapids.sql.enabled: "true"
    spark.rapids.force.caller.classloader: "false"
    # GPU allocation and discovery settings
    spark.task.resource.gpu.amount: "1"
    spark.executor.resource.gpu.amount: "1"
    spark.executor.resource.gpu.vendor: "nvidia.com"
    spark.executor.resource.gpu.discoveryScript: "/opt/mapr/spark/spark-3.5.0/examples/src/main/scripts/getGpusResources.sh"
    spark.rapids.shims-provider-override: "com.nvidia.spark.rapids.shims.spark350.SparkShimServiceProvider"
  type: Python
  sparkVersion: 3.5.0
  mode: cluster
  image: gcr.io/mapr-252711/spark-gpu-3.5.0:v3.5.0
  imagePullPolicy: Always
  mainApplicationFile: .../path/to/application.py
  restartPolicy:
    type: Never
  imagePullSecrets:
    - imagepull
  driver:
    cores: 1
    coreLimit: "1000m"
    memory: "1024m"
    labels:
      version: 3.5.0
  executor:
    cores: 1
    coreLimit: "1000m"
    instances: 1
    memory: "2G"
    labels:
      version: 3.5.0
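Before submitting a manifest like the one above, you may want a quick sanity check that all the GPU-related sparkConf settings from this page are present. The sketch below uses plain substring checks on the manifest text (standard library only, no YAML parser); the helper name and key list are illustrative, taken from the configuration steps above:

```python
# GPU-related sparkConf keys described on this page.
REQUIRED_GPU_KEYS = [
    "spark.plugins",
    "spark.rapids.sql.enabled",
    "spark.task.resource.gpu.amount",
    "spark.executor.resource.gpu.amount",
    "spark.executor.resource.gpu.vendor",
    "spark.executor.resource.gpu.discoveryScript",
    "spark.rapids.shims-provider-override",
]

def missing_gpu_keys(manifest_text: str) -> list:
    """Return the GPU-related sparkConf keys absent from a manifest string."""
    return [k for k in REQUIRED_GPU_KEYS if k not in manifest_text]

# Example: a manifest fragment that only enables the plugin.
incomplete_manifest = """
spec:
  sparkConf:
    spark.plugins: "com.nvidia.spark.SQLPlugin"
    spark.rapids.sql.enabled: "true"
"""
print(missing_gpu_keys(incomplete_manifest))
```

An empty result means every expected key appears somewhere in the file; for stricter validation you could parse the YAML and inspect spec.sparkConf directly.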