Upgrading and Patching Data Fabric Clusters on Kubernetes

NOTE
In this article, the term tenant refers to Data Fabric tenants (formerly "MapR tenants") and not to Kubernetes tenants unless explicitly noted otherwise on a case-by-case basis.

Many aspects of an HPE Ezmeral Data Fabric cluster on Kubernetes can be reconfigured while the cluster is running. For example:

  • You can change pod settings such as CPU, memory, and storage.
  • You can change the number of pods by changing the count value in the CR section that describes that pod. For example, to increase the number of MFS pods, increase the value of count in the mfs section of the CR.
  • You can upgrade a pod container image, typically to a new container image version, by changing the image value in the pod CR.

See Updatable Parameters for a list of the parameters that can be updated on a per-component basis.

Upgrading and Patching Procedure

Upgrades and patching changes are implemented as rolling updates, depending on the object workload type. For detailed information about the process for each workload type, see Online Update Behaviors.

CAUTION
Using the edf update feature to increase the number of ZooKeeper pods can cause the CLDB pods to become temporarily unavailable. Only upgrade the ZooKeeper pods when cluster downtime can be tolerated.
NOTE
If needed, see Scaling Up ZK, MFS, and CLDB.

To upgrade or patch any component:

  1. If needed, perform a bootstrap upgrade to ensure that the cluster has the latest operator components. See Running a Bootstrap Upgrade.
  2. Edit the CR for the component that you want to update. See Updatable Parameters.
    NOTE
    You can download the CR file using the following command:
    kubectl get dataplatform [name] -o yaml > [name]-cr.yaml
  3. Apply the changes using the kubectl apply command.
  4. Either:
    • CLDB or ZooKeeper only: Proceed to the next step.
    • All others: This completes the upgrade/patch process.
  5. Log in to the admincli pod, and then execute the following command:

    edf update cluster

    For example:

    kubectl exec -it admincli-0 -n mycluster -- /bin/bash
    edf update cluster
  6. Verify that the pods are ready by executing the edf report ready command. The command can take a couple of minutes to execute. You might notice a delay between the display of the second and the third lines of the output.
    edf report ready
    2021/06/14 23:28:01 [edf reports]: [INFO] Checking if pods are stabilized for upgrade. This may take a minute or two.
    2021/06/14 23:28:02 [edf reports]: [INFO] Valid MapR user ticket found, skipping ticket generation
    2021/06/14 23:29:52 [edf reports]: [INFO] Pods are ready
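
The steps above can be sketched as a small bash helper. This is only an illustration under stated assumptions: the function names are hypothetical, the pod name admincli-0 and the [name]-cr.yaml file name follow the examples in this article, and the non-interactive kubectl exec calls assume that the edf command can be resolved without an interactive shell. If that is not the case in your environment, run edf interactively as shown in step 5.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative and not part of the product.

# Step 4: only CLDB and ZooKeeper changes need the extra "edf update cluster" step.
needs_edf_update() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    cldb|zookeeper|zk) return 0 ;;
    *) return 1 ;;
  esac
}

# Step 2: save the current CR to <name>-cr.yaml so that it can be edited.
export_cr() {
  local name="$1"
  kubectl get dataplatform "$name" -o yaml > "${name}-cr.yaml"
}

# Steps 3 to 6: apply the edited CR, then finish CLDB/ZooKeeper updates.
apply_update() {
  local name="$1" namespace="$2" component="$3"
  kubectl apply -f "${name}-cr.yaml" || return 1
  if needs_edf_update "$component"; then
    # Assumption: edf is on the PATH of a non-interactive exec session.
    kubectl exec admincli-0 -n "$namespace" -- edf update cluster || return 1
  fi
  # Readiness check, as described in Verifying the Upgrade Changes.
  kubectl exec admincli-0 -n "$namespace" -- edf report ready
}
```

A typical sequence would be to call export_cr with the cluster name, edit the saved file by hand, and then call apply_update with the cluster name, the namespace, and the component that was changed (for example, cldb).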

Scaling Up ZK, MFS, and CLDB

NOTE
These objects cannot be scaled down.

To scale up ZK, MFS, and CLDB objects:

  1. In the Data Fabric CR, change the ZK, MFS, or CLDB failurecount parameter, and then execute the kubectl apply command to apply the changes. See Zookeeper Core Object Settings, MFS Core Object Settings, and CLDB Core Object Settings.
  2. Wait for the new pods to start up and be ready. You can verify readiness (1/1 READY) by executing the following command:

    kubectl get pods -n <cluster-name>
  3. Create a new admincli pod by deleting the current pod:

    kubectl delete pod admincli-0 -n <cluster-name>
  4. Wait for the new admincli pod to be 1/1 READY, then exec into that pod and execute the edf update cluster command. This step refreshes the existing pods and makes them aware of the new pods. For example:

    kubectl exec -it admincli-0 -n <cluster-name> -- /bin/bash
    edf update cluster
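
Steps 3 and 4 can be scripted roughly as follows. This is a sketch, not a supported tool: the function names, the retry count, and the 10-second polling interval are arbitrary choices made for illustration, and the non-interactive kubectl exec call assumes that edf can be run without an interactive shell. Polling is used because the replacement admincli-0 pod does not exist for a short time after the old pod is deleted.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative and not part of the product.

# Succeeds when the named pod appears in "kubectl get pods" output (stdin)
# with all of its containers ready, for example 1/1.
pod_is_ready() {
  awk -v pod="$1" '
    $1 == pod { split($2, r, "/"); ok = (r[1] > 0 && r[1] == r[2]) }
    END { exit (ok ? 0 : 1) }'
}

# Polls until the pod is ready or the retry budget is used up.
wait_for_pod() {
  local pod="$1" namespace="$2" tries="${3:-60}"
  while [ "$tries" -gt 0 ]; do
    if kubectl get pods -n "$namespace" 2>/dev/null | pod_is_ready "$pod"; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 10
  done
  return 1
}

# Step 3 and step 4: recreate admincli-0, wait for it, then refresh the cluster.
refresh_after_scale_up() {
  local namespace="$1"
  kubectl delete pod admincli-0 -n "$namespace" || return 1
  wait_for_pod admincli-0 "$namespace" || return 1
  # Assumption: edf is on the PATH of a non-interactive exec session.
  kubectl exec admincli-0 -n "$namespace" -- edf update cluster
}
```

The same wait_for_pod helper could also be used in step 2 to wait for each newly added pod by name before continuing.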

Verifying the Upgrade Changes

Check the status of upgraded pods and parameters after applying an upgrade, patch, or configuration change:

  • Execute the edf report ready command to ensure that the Data Fabric control plane pods are ready.
    edf report ready
  • Execute the get pods command to check the status of individual pods:

    kubectl get pods -n mycluster -w
  • Check parameter values by executing the describe pod command. For example, if you updated the image tag for pod mcs-0:

    kubectl describe pod mcs-0 -n mycluster | grep -i image:
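
When the image check needs to be scripted rather than read by eye, the describe output can be filtered and compared against the expected image reference. The following is a minimal sketch; the function names are hypothetical, and the expected image string is whatever value you set in the CR.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative and not part of the product.

# Prints the value of each "Image:" line from "kubectl describe pod" output
# read on stdin. "Image ID:" lines are skipped because their first field is
# "Image" rather than "Image:".
images_from_describe() {
  awk '$1 == "Image:" { print $2 }'
}

# Succeeds if any container in the pod uses exactly the expected image.
pod_uses_image() {
  local pod="$1" namespace="$2" expected="$3"
  kubectl describe pod "$pod" -n "$namespace" \
    | images_from_describe \
    | grep -qxF -- "$expected"
}
```

For example, calling pod_uses_image with mcs-0, mycluster, and the image reference from the CR returns a zero exit status only when the running pod already uses that image.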

Updatable Parameters

This section lists all of the parameters that can be updated on a per-component basis:

  • All components
    • baseimagetag
    • imageregistry
  • admincli
    • count
    • image
    • limitcpu
    • limitdisk
    • limitmemory
    • logLevel
    • requestcpu
    • requestdisk
    • requestmemory
  • cldb
    • failurecount
    • image
    • limitcpu
    • limitdisk
    • limitmemory
    • logLevel
    • requestcpu
    • requestdisk
    • requestmemory
  • mfs
    • image
    • groups: count
    • limitcpu
    • limitdisk
    • limitmemory
    • logLevel
    • requestcpu
    • requestdisk
    • requestmemory
  • webserver
    • count
    • image
    • limitcpu
    • limitdisk
    • limitmemory
    • logLevel
    • requestcpu
    • requestdisk
    • requestmemory
  • zookeeper
    • failurecount
    • image
    • limitcpu
    • limitdisk
    • limitmemory
    • loglevel
    • requestcpu
    • requestdisk
    • requestmemory
  • hivemetastore
    • count
    • image
    • limitcpu
    • limitdisk
    • limitmemory
    • loglevel
    • requestcpu
    • requestdisk
    • requestmemory
  • objectstore
    • count
    • image
    • hostports: limitcpu
    • hostports: limitdisk
    • hostports: limitmemory
    • hostports: loglevel
    • hostports: requestcpu
    • hostports: requestdisk
    • hostports: requestmemory
  • collectd
    • image
    • limitcpu
    • limitdisk
    • limitmemory
    • logLevel
    • requestcpu
    • requestdisk
    • requestmemory
  • grafana
    • count
    • image
    • limitcpu
    • limitdisk
    • limitmemory
    • logLevel
    • requestcpu
    • requestdisk
    • requestmemory
  • opentsdb
    • count
    • image
    • limitcpu
    • limitdisk
    • limitmemory
    • logLevel
    • requestcpu
    • requestdisk
    • requestmemory

Online Update Behaviors

Upgrades and patching changes are implemented as rolling updates, depending on the object workload type.

  • Deployment - Grafana, Kibana, Collectd: The DataPlatform operator sees the change and launches a new pod on the cluster (it can be on the same node or another). This new pod contains the changes specified in the updated CR. After the new pod is ready, a "pre-stop" script gracefully shuts down the processes, and then the existing pod is terminated. This process repeats until all pods of a deployment are updated.
  • StatefulSet - The DataPlatform operator brings down a pod of a specific StatefulSet. A “pre-stop” script ensures processes are gracefully shut down before the pod is terminated. A new pod that has the updated configuration changes is then brought up on the same physical node. The operator waits until this new pod is ready, and then repeats until all pods are updated.

    Because the core data pods, such as CLDB and ZK, perform critical functions, the operator does not update these pods automatically after CR changes are applied. You must execute a command (see Upgrading and Patching Procedure) to complete the process.

    Examples of StatefulSet pods include the following:

    • CLDB
    • ZooKeeper
    • MCS
    • MFS
    • admincli
    • Object Store
    • NFS Server
    • Elasticsearch
    • OpenTSDB
    • Hive Metastore
    • Gateway pods, such as MapR Gateway, HTTPS Gateway, Data Access Gateway, and Kafka REST Gateway
  • DaemonSet - Fluentd: All running pods are brought down simultaneously, and new pods are then started up in parallel. Parallel updates are not common to all DaemonSets, but are appropriate for Fluentd. A “pre-stop” script ensures processes are gracefully shut down before the pod is terminated.
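
To anticipate which update behavior applies to a component, the groupings above can be encoded as a simple lookup. This is an illustrative sketch only: the function names and the short component keys (for example, nfs and objectstore) are assumptions chosen for this example, and the gateway pods are omitted for brevity. The second function lists the workload objects in a namespace so that the actual types can be confirmed on a live cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names and component keys are illustrative.

# Maps a component name to the workload type described in this section.
workload_type() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    grafana|kibana|collectd)
      echo Deployment ;;
    fluentd)
      echo DaemonSet ;;
    cldb|zookeeper|mcs|mfs|admincli|objectstore|nfs|elasticsearch|opentsdb|hivemetastore)
      echo StatefulSet ;;
    *)
      echo unknown
      return 1 ;;
  esac
}

# Lists Deployments, StatefulSets, and DaemonSets in the given namespace.
list_workloads() {
  kubectl get deployments,statefulsets,daemonsets -n "$1"
}
```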