Expanding the Cluster

Describes how to add user-provided hosts to the management cluster to increase resource capacity, and how to expand the cluster to include those hosts.

Expand the cluster when applications cannot run due to resource limitations, such as lack of vCPU.

When applications do not have enough resources to run, the system raises an alarm. The HPE Ezmeral Unified Analytics Software administrator and the system administrator can then work together to add user-provided hosts to the pool of machines in the management cluster (control plane nodes) and workload cluster, which increases the processing capacity of the cluster.

The following steps outline the cluster expansion process:

  1. An application triggers an alert to users that it does not have sufficient resources to run.
  2. Users contact the system administrator to request additional resources (that is, to add user-provided hosts to the management cluster).
  3. A system administrator adds user-provided hosts to the cluster, as described in the section Adding User-Provided Hosts to the Cluster.
  4. After the system administrator adds user-provided hosts to the cluster, the HPE Ezmeral Unified Analytics Software administrator signs in to the HPE Ezmeral Unified Analytics Software UI and expands the cluster, as described in the section Expanding the Cluster.

Adding User-Provided Hosts to the Cluster

Use the ezfab-addhost.sh script to add control plane hosts and workload hosts to the ezfabric-host-pool. After you add hosts, you can expand the cluster, as described in the following section, Expanding the Cluster.

You can only add user-provided hosts to the cluster. User-provided hosts are machines that meet the installation prerequisites, as described in Installation Prerequisites.
  • If you want to use the high-availability (HA) feature when you expand the cluster, note that HA requires three master nodes. You must add two hosts to the ezfabric-host-pool with the controlplane role.
  • If you want to increase the vCPU or vGPU resources when you expand the cluster, you must add worker hosts or GPU hosts with enough resources (vCPU or vGPU) to the ezfabric-host-pool with the worker role.
To add user-provided hosts to the ezfabric-host-pool, complete the following steps:
  1. From a CLI, sign in to the HPE Ezmeral Coordinator host.
  2. Download the ezfab-addhost-tool-1-4-x.tgz file at https://github.com/HPEEzmeral/troubleshooting/releases/download/v1.4.0/ezfab-addhost-tool-1-4-x.tgz.

    Use one of the following commands to download the file:
    curl -L -O https://github.com/HPEEzmeral/troubleshooting/releases/download/v1.4.0/ezfab-addhost-tool-1-4-x.tgz 
    wget https://github.com/HPEEzmeral/troubleshooting/releases/download/v1.4.0/ezfab-addhost-tool-1-4-x.tgz
  3. Untar the ezfab-addhost-tool-1-4-x.tgz file:
    tar -xzvf ezfab-addhost-tool-1-4-x.tgz
  4. Go to the ezfab-addhost-tool directory and view its contents:
    cd ezfab-addhost-tool 
    ls -al 
    The command returns results similar to the following:
    total 50504 
    drwxr-xr-x. 2  501 games      149 Feb  2 09:57 . 
    dr-xr-x---. 9 root root      4096 Feb  2 16:19 .. 
    -rw-r--r--. 1  501 games     1211 Jan 26 18:16 controlplane_input_template.yaml 
    -rwxr-xr-x. 1  501 games     2687 Feb 22 10:54 ezfab-addhost.sh
    -rwxr-xr-x. 1  501 games 51695616 Jan 26 14:05 ezfabricctl 
    -rw-r--r--. 1  501 games      360 Jan 26 18:24 input_example.yaml 
    -rw-r--r--. 1  501 games     1205 Jan 26 18:17 worker_input_template.yaml 
    You should see the ezfab-addhost.sh listed, as well as three YAML files (controlplane_input_template.yaml, worker_input_template.yaml, and input_example.yaml) that you can use as guides. Use the cat command to view the YAML files, for example:
    cat controlplane_input_template.yaml 
  5. Using the provided YAML files as a guide, create a YAML file.
  6. Run the ezfab-addhost.sh script without any options.
    When you run the script without an input file, the system returns the supported options:
    Check OS ... 
    Parse options ... 
    Please provide the input yaml file that includes the hosts info 
    USAGE: ./ezfab-addhost.sh <options> 
          -i/--input: the input yaml file that includes the hosts info.   
          -k/--kubeconfig: the coordinator's kubeconfig file(optional).  
  7. Run the ezfab-addhost.sh script with the -i and -k options, as shown:
    ./ezfab-addhost.sh -i <your-input-file>.yaml -k ~/.kube/config
  8. After the ezfab-addhost.sh script completes successfully, run the following command to verify that the new hosts appear in the ezfabric-host-pool:
    kubectl get ezph -A 
    • If the ezfab-addhost.sh script fails, check the logs in the log directory.
    • If the failure is due to an incorrect username or password, or to a transient error, run the following command to delete the hosts in the error state and then retry:
      ./ezfabricctl poolhost destroy --input $INPUT_YAML_FILE --kubeconfig $KUBECONFIG_FILE 
      Note that INPUT_YAML_FILE is different from the YAML file in step 7, because it includes only the failed hosts. After the failed hosts have been deleted, modify the <your-input-file>.yaml from step 7 and then complete step 7 again to re-add the failed hosts.
  9. Go to the Expanding the Cluster section (below) and follow the steps to trigger the cluster expansion from the HPE Ezmeral Unified Analytics Software UI.
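
The download, extract, run, and verify steps above can be sketched as a single shell function. This is a minimal sketch, not part of the ezfab-addhost tool: the run and add_hosts function names and the DRY_RUN variable are hypothetical helpers introduced here for illustration, and the sketch assumes that the input YAML file is given as an absolute path because the function changes directory. Only commands that appear in the steps above are used.

```shell
#!/usr/bin/env bash
# Sketch of steps 2-4 and 7-8. Set DRY_RUN=1 to print the commands
# instead of running them. DRY_RUN, run, and add_hosts are hypothetical
# names used for illustration only.

TOOL_URL="https://github.com/HPEEzmeral/troubleshooting/releases/download/v1.4.0/ezfab-addhost-tool-1-4-x.tgz"

# run: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# add_hosts INPUT_YAML [KUBECONFIG]
# INPUT_YAML should be an absolute path. KUBECONFIG defaults to
# ~/.kube/config, as in step 7.
add_hosts() {
  local input="$1"
  local kubeconfig="${2:-$HOME/.kube/config}"
  if [ -z "$input" ]; then
    echo "usage: add_hosts <input-yaml> [kubeconfig]" >&2
    return 2
  fi
  run curl -L -O "$TOOL_URL" || return 1                         # step 2
  run tar -xzvf ezfab-addhost-tool-1-4-x.tgz || return 1         # step 3
  run cd ezfab-addhost-tool || return 1                          # step 4
  run ./ezfab-addhost.sh -i "$input" -k "$kubeconfig" || return 1  # step 7
  run kubectl get ezph -A                                        # step 8
}
```

To preview the commands without running them, call the function with DRY_RUN=1, for example: DRY_RUN=1 add_hosts /root/my-hosts.yaml. The function stops at the first failing step so that you can check the logs before you retry.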

Expanding the Cluster

In a user-provided host configuration, the hosts within the pool (namespace) must have enough vCPUs and vGPUs for the cluster expansion to succeed. If you request more vCPUs and vGPUs than are available, the cluster expansion will fail.
If repeated attempts to expand the cluster fail with an "already complete" message, delete any existing EzkfOpsExpand custom resources on the workload cluster before you expand the cluster.
To identify the EzkfOpsExpand custom resources, run the following command:
kubectl get ezkfopsexpand -A 
# (lists the Expand CR names and namespaces)
For each of the EzkfOpsExpand custom resources listed in the output, run the following command:
kubectl delete ezkfopsexpand -n <expand_CR_namespace> <expand_CR_name>
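
If several EzkfOpsExpand custom resources exist, the two commands above can be combined into a loop. The following is a minimal sketch under stated assumptions: the delete_expand_crs function name and the KUBECTL variable are hypothetical, and the sketch assumes that the first two columns of the kubectl get output are the namespace and the name, which is the usual layout when you use -A together with the standard --no-headers flag.

```shell
# delete_expand_crs: read "<namespace> <name> ..." lines on stdin and
# delete each EzkfOpsExpand custom resource. The KUBECTL variable is a
# hypothetical override that lets you substitute another command
# (for example, "echo kubectl") to preview the deletions.
delete_expand_crs() {
  local ns name rest
  while read -r ns name rest; do
    # Skip blank or incomplete lines.
    [ -n "$ns" ] && [ -n "$name" ] || continue
    ${KUBECTL:-kubectl} delete ezkfopsexpand -n "$ns" "$name"
  done
}
```

A possible invocation is: kubectl get ezkfopsexpand -A --no-headers | delete_expand_crs. To preview the delete commands first, run the same pipeline with KUBECTL="echo kubectl" set for the function.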

To expand the cluster, complete the following steps:

  1. In the left navigation bar, select Administration > Settings.
  2. On the Cluster tab, select Expand Cluster.

  3. In the Expand Cluster drawer that opens, enter the following information:
    1. Enter the number of additional vCPU to allocate. For example, if the current vCPU is 96 and you add 4 vCPU, the vCPU increases to a total of 100 vCPU.
    2. Select Use GPU if you want to use GPU and it is not already selected. If Use GPU was selected during installation of HPE Ezmeral Unified Analytics Software, this option cannot be disabled and stays selected by default.
    3. Indicate the additional number of vGPU to allocate.
    4. For GPU configuration, if a size was selected during HPE Ezmeral Unified Analytics Software installation, you cannot change the size. However, if no vGPU size was selected during installation, you can select a size now. For additional information, see GPU Support.
    5. If HA was selected during HPE Ezmeral Unified Analytics Software installation, you cannot disable it. If it was not selected during installation, you can select it now. Currently HA is available for the workload cluster only. You cannot set HA for the management cluster.
    6. Click Expand.
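
The vCPU arithmetic and the capacity requirement described in this section can be sketched as two small shell functions. This is illustrative only: the function names are hypothetical, and the sketch assumes that you already know how many vCPUs are available in the host pool, because this section does not describe a command for querying that figure.

```shell
# expanded_total CURRENT ADDITIONAL
# Print the vCPU total after the expansion.
expanded_total() {
  echo $(( $1 + $2 ))
}

# can_expand REQUESTED AVAILABLE
# Succeed only if the requested amount fits in the available pool
# capacity. AVAILABLE is a value you supply (assumption).
can_expand() {
  [ "$1" -le "$2" ]
}
```

For example, expanded_total 96 4 prints 100, which matches the example in step 1. A call such as can_expand 16 8 fails, which corresponds to a request that exceeds the available capacity and would cause the expansion to fail.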

Configuring HPE MLDE for Added GPU Nodes

If you add GPU nodes to the cluster after installing HPE MLDE, you must perform additional steps to ensure HPE MLDE works on these nodes. For details, see Configuring HPE MLDE for Added GPU Nodes.