Enabling GPU Support for Spark

Describes support for the NVIDIA RAPIDS Accelerator for Apache Spark (spark-rapids) and how to enable and allocate GPU resources for Spark applications.

In HPE Ezmeral Unified Analytics Software, you can use the RAPIDS Accelerator for Apache Spark by NVIDIA to accelerate Spark processing on GPUs.

In HPE Ezmeral Unified Analytics Software, the GPU image spark-gpu-<spark-version> (for example, spark-gpu-3.5.0) includes the open-source RAPIDS plugin.

For the list of Spark GPU images, see List of Spark Images.

NOTE
  • Do not allocate GPUs for a driver pod. GPUs are used by executor pods only.
  • With MIG configuration, only one GPU can be assigned per application. For details, see GPU Support.
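The first note can be checked before an application is submitted. The following is a minimal sketch, assuming the application's settings are held in a plain Python dictionary. The helper name driver_gpu_keys is hypothetical, and the key prefix spark.driver.resource.gpu. follows Spark's general spark.driver.resource.{resourceName}.* naming rather than anything listed on this page.

```python
def driver_gpu_keys(conf):
    """Return the config keys that would request GPUs for the driver pod."""
    # Assumption: driver GPU requests use Spark's standard
    # spark.driver.resource.<resourceName>.* keys with resource name "gpu".
    prefix = "spark.driver.resource.gpu."
    return sorted(key for key in conf if key.startswith(prefix))


if __name__ == "__main__":
    conf = {
        "spark.executor.resource.gpu.amount": "1",
        "spark.driver.resource.gpu.amount": "1",  # should not be set
    }
    offending = driver_gpu_keys(conf)
    if offending:
        print("Remove driver GPU settings:", ", ".join(offending))
```

An empty result means that the configuration requests GPUs for executor pods only.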

Spark Configurations for GPU

| Spark Configurations | Key | Value |
|---|---|---|
| GPU Images (see List of Spark Images) | spark.kubernetes.container.image | gcr.io/mapr-252711/spark-gpu-<spark-version>:<image-tag> |
| Enable RAPIDS plugin | spark.plugins | com.nvidia.spark.SQLPlugin |
| | spark.rapids.sql.enabled | true |
| | spark.rapids.force.caller.classloader | false |
| Allocate GPU resources | spark.task.resource.gpu.amount | 1 |
| | spark.executor.resource.gpu.amount | 1 |
| | spark.executor.resource.gpu.vendor | nvidia.com |
| Set GPU discovery script path | spark.executor.resource.gpu.discoveryScript | /opt/mapr/spark/spark-<spark-version>/examples/src/main/scripts/getGpusResources.sh |
| Set RAPIDS shim layer for the run [1] | spark.rapids.shims-provider-override | com.nvidia.spark.rapids.shims.<spark-identifier>.SparkShimServiceProvider |
[1] The Spark version distributed by HPE is compatible with its corresponding open-source version. The RAPIDS jar includes shim layer provider classes named com.nvidia.spark.rapids.shims.<spark-identifier>.SparkShimServiceProvider. Replace <spark-identifier> with the identifier that matches the Spark version distributed by HPE. For example:
  • For spark-3.5.0, the identifier is spark350.
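
The settings above can be assembled programmatically. The following is a minimal sketch in plain Python that builds the key/value pairs from the table and renders them as --conf key=value arguments, the general form accepted by spark-submit. The function names shim_identifier, gpu_spark_conf, and to_submit_args are hypothetical. Deriving the identifier by removing the dots from the version is an assumption generalized from the single spark-3.5.0 example; confirm the identifier for other versions against the RAPIDS jar. The image tag is left as the <image-tag> placeholder.

```python
def shim_identifier(spark_version):
    """Derive the RAPIDS shim identifier, for example "3.5.0" -> "spark350"."""
    # Assumption: the identifier is "spark" followed by the version digits
    # without dots. Only the 3.5.0 mapping is confirmed by this page.
    return "spark" + spark_version.replace(".", "")


def gpu_spark_conf(spark_version, image_tag):
    """Build the GPU-related Spark settings listed in the table."""
    ident = shim_identifier(spark_version)
    return {
        "spark.kubernetes.container.image":
            f"gcr.io/mapr-252711/spark-gpu-{spark_version}:{image_tag}",
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        "spark.rapids.sql.enabled": "true",
        "spark.rapids.force.caller.classloader": "false",
        "spark.task.resource.gpu.amount": "1",
        "spark.executor.resource.gpu.amount": "1",
        "spark.executor.resource.gpu.vendor": "nvidia.com",
        "spark.executor.resource.gpu.discoveryScript":
            f"/opt/mapr/spark/spark-{spark_version}"
            "/examples/src/main/scripts/getGpusResources.sh",
        "spark.rapids.shims-provider-override":
            f"com.nvidia.spark.rapids.shims.{ident}.SparkShimServiceProvider",
    }


def to_submit_args(conf):
    """Render a settings dictionary as a flat list of --conf arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # "<image-tag>" is a placeholder; substitute a tag from List of Spark Images.
    for arg in to_submit_args(gpu_spark_conf("3.5.0", "<image-tag>")):
        print(arg)
```

The same dictionary can also be copied into whichever configuration mechanism the application already uses; only the keys and values come from the table.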