Viewing GPU and MIG Devices Using kubectl Commands

View GPU and Multi-Instance GPU (MIG) device information using kubectl commands.

Prerequisites

Required access rights: Platform Administrator

Procedure

  • To verify that Kubernetes recognizes the GPU resources on the cluster nodes, enter the following command:
    kubectl get nodes --selector=nvidia.com/gpu.count -Lnvidia.com/gpu.count -Lnvidia.com/gpu.product -Lnvidia.com/mig.strategy

    The output lists the nodes that have GPU devices. For each node, it shows the number of GPUs, the GPU product name, and, for MIG-enabled GPUs, the configured MIG strategy.

    For example:

    
    NAME                     STATUS   ROLES    AGE   VERSION    GPU.COUNT   GPU.PRODUCT   MIG.STRATEGY
    dev04.mycorp.net         Ready    worker   22d   v1.20.11   1           Tesla-P4      mixed
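
As a minimal sketch of how this table might be post-processed, the following shell function sums the GPU.COUNT column across all listed nodes. The function name total_gpu_count is hypothetical, and the sketch assumes the column header appears exactly as GPU.COUNT, as in the example above.

```shell
#!/bin/sh
# total_gpu_count (hypothetical helper): read the table printed by
# `kubectl get nodes ... -Lnvidia.com/gpu.count ...` on stdin and
# sum the GPU.COUNT column over all nodes.
total_gpu_count() {
  awk '
    # Header row: find which whitespace-separated field is GPU.COUNT.
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "GPU.COUNT") col = i; next }
    # Data rows: add up numeric GPU counts and count the nodes.
    col && $col ~ /^[0-9]+$/ { total += $col; nodes++ }
    END { printf "%d GPUs on %d nodes\n", total, nodes }
  '
}
```

To use it, pipe the output of the command above into the function, for example by appending | total_gpu_count to the kubectl get nodes command.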
    
  • To identify the GPU and MIG resources, if any, on a given node, use the kubectl describe node <node-name> command.

    The output of the kubectl describe node <node-name> command varies as follows:

    MIG-enabled GPU, mixed strategy

    If the host has GPUs that are MIG-enabled using a mixed strategy, the system returns something like the following:

    ...
    Capacity:
      cpu:                     48
      ephemeral-storage:       1049136384Ki
      hugepages-1Gi:           0
      hugepages-2Mi:           0
      memory:                  131523060Ki
      nvidia.com/mig-1g.5gb:   1
      nvidia.com/mig-2g.10gb:  1
      nvidia.com/mig-3g.20gb:  1
      pods:                    110
    ...

    MIG-enabled GPU, single strategy

    If the host has GPUs that are MIG-enabled using a single strategy, the output is similar to that of hosts whose GPUs are not MIG-enabled, except that each MIG instance is reported as an nvidia.com/gpu resource, so the number of GPUs is typically greater than one:

    ...
    Capacity:
      nvidia.com/gpu:          7
    ...
    Allocatable:
      nvidia.com/gpu:          7
    ...

    GPU is not MIG-enabled

    If the host has GPUs that are not MIG-enabled, the system returns something like the following:

    ...
    Capacity:
      cpu:                48
      ephemeral-storage:  1049136384Ki
      hugepages-1Gi:      0
      hugepages-2Mi:      0
      memory:             131523060Ki
      nvidia.com/gpu:     1
      pods:               110
    ...

    Host does not have a GPU

    If the host does not have a GPU, the nvidia.com/gpu field does not appear in the output.
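
The four cases above can be told apart by inspecting the Capacity section. The following is a minimal sketch, not a definitive implementation: the function name classify_gpu_node and its output labels (mig-mixed, gpu, no-gpu) are hypothetical, and it assumes the usual kubectl describe node layout in which Capacity: starts at the beginning of a line and its entries are indented beneath it.

```shell
#!/bin/sh
# classify_gpu_node (hypothetical helper): read `kubectl describe node`
# output on stdin and report which GPU resources the Capacity section lists.
classify_gpu_node() {
  awk '
    /^Capacity:/          { in_cap = 1; next }  # enter the Capacity section
    in_cap && /^[^ \t]/   { in_cap = 0 }        # next top-level field ends it
    # Mixed strategy: resources are named nvidia.com/mig-<profile>.
    in_cap && $1 ~ /^nvidia\.com\/mig-/ { mig += $2 }
    # Single strategy or no MIG: resources are named nvidia.com/gpu.
    in_cap && $1 == "nvidia.com/gpu:"   { gpu += $2 }
    END {
      if (mig > 0)      print "mig-mixed " mig
      else if (gpu > 0) print "gpu " gpu
      else              print "no-gpu"
    }
  '
}
```

For example, kubectl describe node <node-name> | classify_gpu_node prints one of the three labels followed by the resource count. The Capacity section alone does not distinguish a single-strategy MIG node from a node whose GPUs are not MIG-enabled, because both report nvidia.com/gpu; the nvidia.com/mig.strategy label shown by the first command is needed for that.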