Using GPUs in Kubernetes Pods
This topic describes how to identify and request GPU and MIG resources, and how to use node labels and the Kubernetes nodeAffinity feature to constrain the pods that are eligible for scheduling.
Identifying GPU Resources
You can view GPU and MIG resources in HPE Ezmeral Runtime Enterprise using the GUI or by using kubectl or nvidia-smi commands. See GPU and MIG Support.
Requesting GPU Resources
A Kubernetes application can request GPU resources in its YAML file, and the scheduler places its pods on nodes that can satisfy those requests.
HPE Ezmeral Runtime Enterprise taints GPU hosts to discourage scheduling non-GPU pods on hosts with GPUs. However, GPU-equipped hosts will be used for non-GPU pods if no other resources are available.
There are two key parts to specifying a GPU resource in the YAML file:
- Specifying the correct key name in the resources specification. For GPUs in HPE Ezmeral Runtime Enterprise, that key name is nvidia.com/gpu. For example:

    resources:
      limits:
        nvidia.com/gpu: 2
- Setting the NVIDIA_DRIVER_CAPABILITIES environment variable to the value compute,utility. For example:

    env:
    - name: "NVIDIA_DRIVER_CAPABILITIES"
      value: "compute,utility"
If this is a KubeDirector application with GPU support, such as Jupyter Notebook with ML toolkits, then when you select a nonzero GPU count in the UI, HPE Ezmeral Runtime Enterprise adds the NVIDIA_DRIVER_CAPABILITIES environment variable to the KubeDirector application YAML automatically. Otherwise, you can add the environment variable manually.
You include these items in any native Kubernetes resource that includes a Container object (link opens an external website in a new browser tab or window), including pods and higher-level pod-creating resources such as Deployment, StatefulSet, and DaemonSet.
In a KubeDirectorCluster specification, you include these items in the RoleSpec (link opens an external website in a new browser tab or window) of the role that accesses GPUs.
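Combined in a single manifest, the two items above might look like the following minimal Pod sketch. The pod name, container name, and image are placeholder values chosen for illustration, not part of the product documentation:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-example              # placeholder pod name
spec:
  containers:
  - name: cuda-container         # placeholder container name
    image: nvidia/cuda:12.2.0-base-ubuntu22.04   # example CUDA-capable image
    env:
    - name: "NVIDIA_DRIVER_CAPABILITIES"
      value: "compute,utility"
    resources:
      limits:
        nvidia.com/gpu: 2        # request two GPUs
```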
To specify MIG resources, see Requesting MIG Resources.
Using nodeAffinity
You might want to restrict the application to run on a specific GPU type because of availability or cost considerations in your business environment. For example, using an A100 GPU might have a different billing rate than other types of GPUs.
You can use a combination of node labels and the Kubernetes nodeAffinity feature (link opens an external website in a new browser tab or window) to constrain which nodes pods are eligible to be scheduled on.
Using nodeAffinity to Select By GPU Type
The nodeAffinity feature includes an expressive matching language and the ability to specify a preference instead of a hard requirement. You can also use the match expressions and operators to express an anti-affinity.
Conceptually, the procedure is as follows:
- If needed, the Kubernetes Cluster Administrator or Platform Administrator can label the nodes to which you want to apply preferences or restrictions.

  If you want to use an existing default node label, you do not need to create and apply label key-value pairs to nodes, but you do need the Kubernetes Cluster Administrator or Platform Administrator to supply you with the list of node labels. A Kubernetes Cluster Administrator or Platform Administrator can get a valid list of node label keys and values by querying with kubectl commands. For an example, see Listing the nvidia.com Node Labels.

  For example, in HPE Ezmeral Runtime Enterprise, nodes that have GPUs have a set of default node labels, one of which has the key nvidia.com/gpu.product. One of the valid values of that key is Tesla-P4.

  However, you might want to enable users who create applications to specify the appropriate category of GPU without knowing the exact model identifier of the GPU. For example, you might want to label one or more nodes as having "general-purpose" or "higher-performance" GPUs, using a node label such as gputype=general-purpose. In your deployment, you might apply the same label to hosts that have one of several GPU models.

- Specify the nodeAffinity in the affinity field.

  Any native Kubernetes resource that includes a PodSpec object (link opens an external website in a new browser tab or window) can put an affinity field into that object. This includes pods and higher-level pod-creating resources such as Deployment, StatefulSet, and DaemonSet.

  In a KubeDirectorCluster specification, you include the affinity field in the RoleSpec (link opens an external website in a new browser tab or window).
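As a sketch of the labeling step, a Kubernetes Cluster Administrator might apply the gputype=general-purpose label from the example above with kubectl. These commands run against a live cluster; the node name worker-gpu-01 is a placeholder:

```
# Apply a custom label to a node (node name is a placeholder):
kubectl label nodes worker-gpu-01 gputype=general-purpose

# Verify which nodes carry the label:
kubectl get nodes -l gputype=general-purpose
```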
In the following example, nodeAffinity expresses a preference to schedule RESTserver pods on nodes with a Tesla-P4 GPU. Specifying preferredDuringSchedulingIgnoredDuringExecution instead of requiredDuringSchedulingIgnoredDuringExecution indicates that, if a preferred node is not available at the time the pod is scheduled, the pod may be scheduled on a node that is not eligible according to the matchExpressions.
...
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
preference:
matchExpressions:
- key: nvidia.com/gpu.product
operator: In
values:
- Tesla-P4
...
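Conversely, to make a constraint mandatory, or to express an anti-affinity, you can use requiredDuringSchedulingIgnoredDuringExecution with an operator such as NotIn. The following sketch assumes the custom gputype label described earlier has been applied to your nodes; it is an illustrative label, not a default one:

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        # Anti-affinity: never schedule on nodes labeled general-purpose
        - key: gputype
          operator: NotIn
          values:
          - general-purpose
```

Note that the required form takes nodeSelectorTerms, whereas the preferred form takes a list of weighted preference terms.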
Requesting MIG Resources
As with requesting GPU resources, setting the NVIDIA_DRIVER_CAPABILITIES environment variable to compute,utility is required. However, the way you specify the MIG instance differs in both resource requests and in the standard nvidia.com/gpu.product node label values.
For applications that support specifying resources for MIG-enabled GPUs, the way you specify the MIG instance differs depending on the Kubernetes MIG strategy chosen by the Platform Administrator.
single strategy

  If the single strategy is used, when you request resources, you specify the number of MIG instances in the same way as for physical GPU devices. For example:

    ...
    resources:
      limits:
        nvidia.com/gpu: 1
    ...
  If you have different nodes with different MIG configurations, you can use the nodeAffinity field to specify the node that has the MIG configuration you want to use. The following example uses the standard nvidia.com/gpu.product key to require a particular MIG configuration. If a node with that configuration is not available, the pod will not be scheduled.

    ...
    resources:
      limits:
        nvidia.com/gpu: 1
    env:
    - name: NVIDIA_DRIVER_CAPABILITIES
      value: 'compute,utility'
    ...
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: nvidia.com/gpu.product
              operator: In
              values:
              - A100-SXM4-40GB-MIG-1g.5gb
    ...
mixed strategy

  If the mixed strategy is used, when you request resources, you specify and enumerate MIG devices by their fully qualified name in the form:

    nvidia.com/mig-<slice_count>g.<memory_size>gb

  If the mixed strategy is used, the value of the standard nvidia.com/gpu.product node label is the physical GPU. For example:

    ...
    resources:
      limits:
        nvidia.com/mig-3g.20gb: 1
    env:
    - name: NVIDIA_DRIVER_CAPABILITIES
      value: 'compute,utility'
    ...
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: nvidia.com/gpu.product
              operator: In
              values:
              - A100-SXM4-40GB
    ...
As stated in "Device Enumeration" in the NVIDIA Multi-Instance GPU User Guide (link opens an external website in a new browser window or tab): "MIG supports running CUDA applications by specifying the CUDA device on which the application should be run. With CUDA 11, only enumeration of a single MIG instance is supported."
Therefore, an application can access only one GPU MIG instance (the first instance applied to the pod), even if the pod spec specifies a limit larger than one.
Listing the nvidia.com Node Labels
The following command queries all nodes for the node labels that have a label key that starts with nvidia.com. You must have Kubernetes Cluster Administrator or Platform Administrator rights to execute this command.
kubectl get nodes -o json | jq '.items[].metadata.labels | with_entries(select(.key | startswith("nvidia.com")))'
This command is useful for obtaining the valid nodeAffinity key-value pairs.