Virtual Cores, RAM, Storage, and GPU Devices

NOTE This article uses the term "tenant" to refer to both tenants and projects.

Virtual CPU (vCPU) cores are modeled as follows:

  • The license specifies the maximum number of CPU cores that can be surfaced by the set of on-premises and/or public cloud hosts in a given HPE Ezmeral Runtime Enterprise deployment. Starting with Container Platform version 3.8, the use or non-use of CPU hyperthreads affects neither the license nor the vCPU count.
  • The number of available vCPU cores is the number of physical CPU cores multiplied by the CPU allocation ratio specified by the Platform Administrator. For example, if the hosts have 40 physical CPU cores and the Platform Administrator specifies a CPU allocation ratio of 3, then a total of 120 vCPU cores will be displayed as available. You can allocate an unlimited number of vCPU cores to each tenant or project; the collective core usage of all nodes (containers) within a tenant or project is constrained by the tenant's assigned quota or by the cores available in the system, whichever limit is reached first (see the sketch after this list). The tenant quotas and the CPU allocation ratio act together to prevent tenant members from overloading the system's CPU resources.
  • When two or more nodes are assigned to the same host, they contend for the same physical CPU resources of that host. CPU resources are allocated to such nodes in a ratio determined by their vCPU core count. For example, a node with 8 cores will receive twice as much CPU time as a node with 4 cores.
  • The Platform Administrator can also specify a Quality of Service (QoS) multiplier for each tenant or project. When CPU resource contention occurs, the node vCPU count is multiplied by the tenant/project QoS multiplier to determine the physical CPU time allotted to each container running within a given tenant or project. For example, a node with 8 vCPU cores in a tenant or project with a QoS multiplier of 1 will receive the same physical CPU time as a node with 4 vCPU cores in a tenant or project with a QoS multiplier of 2. The QoS multiplier describes relative tenant/project priorities when CPU resource contention occurs; it does not affect the overall cap on CPU load established by the CPU allocation ratio and tenant/project quotas.
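
The arithmetic in this list can be condensed into a short Python sketch. It is illustrative only (the function names are hypothetical, not a product API): it derives the available vCPU count from the allocation ratio, and divides physical CPU time under contention in proportion to QoS-weighted vCPU counts.

```python
# Minimal sketch of the vCPU arithmetic described above. This is not
# HPE Ezmeral Runtime Enterprise code; all names and values are illustrative.

def available_vcpus(physical_cores: int, allocation_ratio: int) -> int:
    """Total vCPU cores surfaced to tenants: physical cores x allocation ratio."""
    return physical_cores * allocation_ratio

def cpu_shares(nodes: list[dict]) -> dict[str, float]:
    """Relative share of physical CPU time per node under contention.

    Each node's weight is its vCPU count multiplied by its tenant's QoS
    multiplier; physical CPU time is divided in proportion to the weights.
    """
    weights = {n["name"]: n["vcpus"] * n["qos"] for n in nodes}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}

# 40 physical cores with a CPU allocation ratio of 3 -> 120 vCPUs available.
print(available_vcpus(physical_cores=40, allocation_ratio=3))  # 120

# An 8-vCPU node at QoS 1 receives the same CPU time as a 4-vCPU node at QoS 2.
contending = [
    {"name": "node-a", "vcpus": 8, "qos": 1},
    {"name": "node-b", "vcpus": 4, "qos": 2},
]
print(cpu_shares(contending))  # {'node-a': 0.5, 'node-b': 0.5}
```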

RAM is modeled as follows:

  • The total amount of available RAM is equal to the amount of unreserved RAM. Unreserved RAM is the amount of RAM remaining after reserving some memory in each host for platform services. For example, if your deployment consists of four hosts that each have 128GB of physical RAM with 110GB of unreserved RAM, the total amount of RAM available to share among tenants or projects will be 440GB.
  • You may allocate an unlimited amount of RAM to each tenant or project. The collective RAM usage of all nodes within a tenant or project is constrained by the tenant's or project's assigned quota or by the RAM available in the system, whichever limit is reached first (see the sketch after this list).
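
In the same illustrative style, a minimal sketch of the RAM arithmetic, reusing the hypothetical four-host example above:

```python
# Minimal sketch of the RAM model described above; values are illustrative.

def available_ram_gb(unreserved_gb_per_host: int, host_count: int) -> int:
    """Total RAM available to tenants: the sum of unreserved RAM across hosts."""
    return unreserved_gb_per_host * host_count

def effective_ram_cap_gb(quota_gb: int | None, available_gb: int) -> int:
    """Effective limit: the quota or the available RAM, whichever is lower.
    A quota of None means the tenant or project has no quota defined."""
    return available_gb if quota_gb is None else min(quota_gb, available_gb)

total = available_ram_gb(unreserved_gb_per_host=110, host_count=4)
print(total)                             # 440 GB shareable among tenants
print(effective_ram_cap_gb(512, total))  # 440: quota exceeds availability
print(effective_ram_cap_gb(None, total)) # 440: no quota defined
```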

Storage is modeled as follows:

  • Root disk storage space is allocated from the disks on each Worker host that are assigned as Node Storage disks when adding the Worker to the platform. Each node consumes Node Storage space equal to its root disk size on the Worker host where that node is placed (see the sketch below).
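
A hedged sketch of this placement constraint (the field names are hypothetical): a node fits on a Worker host only if that host's remaining Node Storage can hold the node's root disk.

```python
# Illustrative check: does a Worker's free Node Storage fit a node's root disk?

def fits(root_disk_gb: int, node_storage_total_gb: int, node_storage_used_gb: int) -> bool:
    """True if the Worker's remaining Node Storage can hold the root disk."""
    return root_disk_gb <= node_storage_total_gb - node_storage_used_gb

print(fits(root_disk_gb=100, node_storage_total_gb=1000, node_storage_used_gb=850))  # True
print(fits(root_disk_gb=200, node_storage_total_gb=1000, node_storage_used_gb=850))  # False
```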

If compatible GPU devices are present, then they are modeled as follows:

  • You must install the NVIDIA drivers on the hosts before deploying HPE Ezmeral Runtime Enterprise, as described in GPU Driver Installation.
  • The total number of available GPU resources is equal to the number of physical GPU devices. For example, if your deployment consists of four hosts that each have 8 physical GPU devices, then there will be a total of 32 GPU devices available to share among tenants and/or projects.
  • Quotas for GPUs on tenant namespaces are applied through the nvidia.com/gpu specifier, which covers physical GPUs and MIG instances under the single MIG strategy only. Quotas on specific MIG profiles are not supported; for example, you cannot specify a quota of three 1g.5gb devices. (A quota example follows this list.)
  • You may allocate an unlimited number of GPU resources to each tenant or project. The collective GPU resource usage for all nodes within a tenant or project will be constrained by either the tenant's or project's assigned quota or the available GPU devices in the system, whichever limit is reached first.
  • GPU devices are expensive resources, and their usage is maximized as follows:
    • If a container requires GPU resources, then HPE Ezmeral Runtime Enterprise attempts to place that container so as to maximize GPU resource utilization on a given host and to reduce or eliminate wasted resources (a placement sketch follows this list).
    • HPE Ezmeral Runtime Enterprise does not have the concept of a virtual GPU: a container can access only the GPU devices of the host where it is deployed, never the GPUs of another host.
    • HPE Ezmeral Runtime Enterprise does not allow sharing the same GPU device between multiple containers simultaneously. Once a GPU device is allocated to a given container, that container has exclusive access to that GPU.
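
As noted in the list above, GPU quotas are expressed through the nvidia.com/gpu specifier. The sketch below shows one way such a quota could be created with the Kubernetes Python client; the namespace name tenant-a and the quota value are hypothetical. Note that in a Kubernetes ResourceQuota, extended resources such as nvidia.com/gpu take the requests. prefix.

```python
# Illustrative sketch (not product code): cap a tenant namespace at 4 GPUs.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod

quota = client.V1ResourceQuota(
    metadata=client.V1ObjectMeta(name="gpu-quota"),
    # Extended resources are quota'd via the "requests." prefix.
    spec=client.V1ResourceQuotaSpec(hard={"requests.nvidia.com/gpu": "4"}),
)
client.CoreV1Api().create_namespaced_resource_quota(namespace="tenant-a", body=quota)
```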
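
The placement behavior described above can be illustrated with a simple best-fit heuristic. This is a hedged sketch of the stated goals (pack GPU requests tightly, never split a request across hosts, allocate each device exclusively), not the actual HPE Ezmeral Runtime Enterprise scheduler:

```python
# Best-fit GPU placement sketch; hosts and free-GPU counts are illustrative.

def place(gpus_needed: int, free_gpus_by_host: dict[str, int]) -> str | None:
    """Pick the host with the fewest free GPUs that still fits the request."""
    candidates = {h: free for h, free in free_gpus_by_host.items() if free >= gpus_needed}
    if not candidates:
        return None  # a request cannot span hosts (there is no virtual GPU)
    host = min(candidates, key=candidates.get)
    free_gpus_by_host[host] -= gpus_needed  # each device is exclusively allocated
    return host

hosts = {"host-1": 8, "host-2": 3}
print(place(2, hosts))  # host-2: the tightest fit keeps host-1's 8 GPUs intact
print(place(8, hosts))  # host-1
print(place(4, hosts))  # None: no single host has 4 free GPUs left
```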

Default values will appear in the various quota fields when you are creating a tenant/project. These default values will be 25% of the total system resources for most fields. The exception to this rule is the quota for GPU devices where the default value is 0. When configuring each resource quota, the web interface displays the total available amount of that resource for comparison. You may edit these quota values or delete a value and leave the field blank to indicate that the tenant does not have a quota defined for that resource.

Assigning a quota of resources to a tenant does not reserve those resources for that tenant when that tenant is idle (not running one or more clusters). This means that a tenant may not actually be able to acquire system resources up to the limit of its configured quota.

  • You may assign a quota of any amount of resources to any tenant, regardless of the amount of system resources actually available. A deployment where the total of the configured tenant resource quotas exceeds the currently available system resources is called over-provisioning (a simple check is sketched at the end of this section). Over-provisioning occurs when one or more of the following conditions are met:
    • You have a tenant whose resource quotas either exceed the system resources or are undefined. Such a tenant will only be able to obtain the resources that are actually available. This arrangement is typically a convenience that ensures the tenant can always fully utilize the platform, even if you add more hosts in the future.
    • You have multiple tenants, none of which has overly large or undefined quotas, but the sum of their quotas exceeds the resources currently available. In this case, you are not expecting all tenants to use their full quotas simultaneously, but you have given each tenant the ability to claim more than its “fair share” of resources when those extra resources are available. You must therefore balance the value of occasional bursts of usage beyond a tenant's normal needs against the need to restrict how much a “greedy” tenant can consume: a larger quota allows more burst consumption of unused resources, but also expands the potential for one tenant to prevent other tenants from fully utilizing their quotas.
NOTE Over-provisioning is useful in certain situations; however, avoiding over-provisioning prevents potential resource conflicts by ensuring that all tenants are guaranteed to be able to obtain their configured quota of virtual CPU cores, RAM, and GPU devices.
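
The over-provisioning conditions above reduce to the simple check referenced earlier in this section. This is an illustrative sketch, not product logic:

```python
# Illustrative over-provisioning check: True if any tenant quota is undefined
# (None) or if the defined quotas collectively exceed the available resources.

def overprovisioned(quotas: list[int | None], available: int) -> bool:
    if any(q is None for q in quotas):
        return True  # an undefined quota can claim everything available
    return sum(q for q in quotas if q is not None) > available

print(overprovisioned([64, 64], available=120))    # True: 128 > 120 vCPUs
print(overprovisioned([60, 60], available=120))    # False: all quotas guaranteed
print(overprovisioned([60, None], available=120))  # True: one quota undefined
```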