Enabling GPU Support for Spark
Describes the NVIDIA spark-rapids accelerator support for Spark, and how to enable and allocate GPU resources on Spark.
In HPE Ezmeral Unified Analytics Software, you can use the RAPIDS Accelerator for Apache Spark by NVIDIA to accelerate Spark processing on GPUs.
The GPU image (spark-gpu-<spark-version>; for example, spark-gpu-3.5.0) in HPE Ezmeral Unified Analytics Software has the open-source RAPIDS plugin built in.
For the list of Spark GPU images, see List of Spark Images.
NOTE
- Do not allocate GPUs for a driver pod. GPUs are used by executor pods only.
- With MIG configuration, only one GPU can be assigned per application. For details, see GPU Support.
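The first note can be checked mechanically before an application is submitted. The following is a minimal sketch, assuming the application's Spark properties are available as a plain dictionary. The helper name `driver_gpu_keys` is hypothetical and is not part of Spark or HPE Ezmeral Unified Analytics Software; the `spark.driver.resource.gpu.` prefix follows the standard Spark resource naming pattern.

```python
def driver_gpu_keys(conf):
    """Return the property keys in conf that request a GPU for the driver.

    Hypothetical helper: any key under spark.driver.resource.gpu.* is
    treated as a driver GPU request, which the note above advises against.
    """
    prefix = "spark.driver.resource.gpu."
    return sorted(key for key in conf if key.startswith(prefix))


# Example properties that request GPUs for executors only.
conf = {
    "spark.executor.resource.gpu.amount": "1",
    "spark.task.resource.gpu.amount": "1",
}

offending = driver_gpu_keys(conf)
if offending:
    print("Remove driver GPU settings:", ", ".join(offending))
else:
    print("No GPUs requested for the driver pod.")
```

The check only inspects property names. It does not validate the MIG restriction, which depends on cluster configuration.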
Spark Configurations for GPU
Spark Configurations | Key | Value |
---|---|---|
GPU Images | spark.kubernetes.container.image | gcr.io/mapr-252711/spark-gpu-<spark-version>:<image-tag> |
Enable RAPIDS plugin | spark.plugins | com.nvidia.spark.SQLPlugin |
Enable RAPIDS plugin | spark.rapids.sql.enabled | true |
Enable RAPIDS plugin | spark.rapids.force.caller.classloader | false |
Allocate GPU resources | spark.task.resource.gpu.amount | 1 |
Allocate GPU resources | spark.executor.resource.gpu.amount | 1 |
Allocate GPU resources | spark.executor.resource.gpu.vendor | nvidia.com |
Set GPU discovery script path | spark.executor.resource.gpu.discoveryScript | /opt/mapr/spark/spark-<spark-version>/examples/src/main/scripts/getGpusResources.sh |
Set RAPIDS shim layer for the run¹ | spark.rapids.shims-provider-override | com.nvidia.spark.rapids.shims.<spark-identifier>.SparkShimServiceProvider |
¹ The Spark version distributed by HPE is compatible with its corresponding open-source version. The RAPIDS jar includes shim layer provider classes named com.nvidia.spark.rapids.shims.<spark-identifier>.SparkShimServiceProvider. Replace <spark-identifier> according to the Spark version distributed by HPE. For example:
- For spark-3.5.0, the identifier is spark350.
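The configuration keys in this section can be assembled programmatically. The following is a minimal sketch in Python, not an HPE-provided tool. The function names `shim_identifier`, `gpu_spark_conf`, and `to_submit_args` are hypothetical. The rule for deriving the identifier (drop the dots and prefix `spark`) is an assumption generalized from the single spark-3.5.0 example above, so verify it against the shim classes in your RAPIDS jar. The image tag is left as a placeholder because it depends on your release.

```python
def shim_identifier(spark_version):
    """Derive the shim identifier, e.g. "3.5.0" -> "spark350".

    Assumption: the identifier is "spark" followed by the version digits,
    as in the spark-3.5.0 example. Other versions may differ.
    """
    return "spark" + spark_version.replace(".", "")


def gpu_spark_conf(spark_version, image_tag):
    """Build the GPU-related Spark properties listed in the table."""
    shim = shim_identifier(spark_version)
    return {
        "spark.kubernetes.container.image":
            f"gcr.io/mapr-252711/spark-gpu-{spark_version}:{image_tag}",
        # Enable the RAPIDS plugin.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        "spark.rapids.sql.enabled": "true",
        "spark.rapids.force.caller.classloader": "false",
        # Allocate GPU resources (executors only, never the driver).
        "spark.task.resource.gpu.amount": "1",
        "spark.executor.resource.gpu.amount": "1",
        "spark.executor.resource.gpu.vendor": "nvidia.com",
        # GPU discovery script shipped in the image.
        "spark.executor.resource.gpu.discoveryScript":
            f"/opt/mapr/spark/spark-{spark_version}"
            "/examples/src/main/scripts/getGpusResources.sh",
        # RAPIDS shim layer for the run.
        "spark.rapids.shims-provider-override":
            f"com.nvidia.spark.rapids.shims.{shim}.SparkShimServiceProvider",
    }


def to_submit_args(conf):
    """Render properties as a flat list of spark-submit --conf arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # "<image-tag>" is a placeholder; substitute the tag for your release.
    for arg in to_submit_args(gpu_spark_conf("3.5.0", "<image-tag>")):
        print(arg)
```

Keeping the properties in one dictionary makes it straightforward to pass the same values either as `--conf` arguments or into the `sparkConf` section of an application specification, if your deployment uses one.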