Upgrading and Patching Data Fabric Clusters on Kubernetes
Many aspects of an HPE Ezmeral Data Fabric cluster on Kubernetes can be reconfigured while the cluster is running. For example:
- You can change pod settings such as CPU, memory, and storage.
- You can change the number of pods by changing the `count` value in the CR section that describes that pod. For example, to increase the number of MFS pods, increase the value of `count` in the `mfs` section of the CR.
- You can upgrade a pod container image, typically to a new container image version, by changing the `image` value in the pod CR.
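For example, scaling MFS might involve a CR edit like the following sketch. This is an illustrative fragment only; the exact nesting in your DataPlatform CR may differ, and the image value is a placeholder (see Updatable Parameters below for the field names):

```yaml
# Illustrative CR fragment only; surrounding structure omitted.
mfs:
  groups:
    - count: 2               # increase to add MFS pods to this group
  image: example/mfs:6.2.0   # placeholder value; change to upgrade the container image
```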
See Updatable Parameters for a list of the parameters that can be updated on a per-component basis.
Upgrading and Patching Procedure
Upgrades and patching changes are implemented as rolling updates, depending on the object workload type. For detailed information about the process for each workload type, see Online Update Behaviors.

NOTE: Using the `edf update` feature to increase the number of ZooKeeper pods can cause the CLDB pods to become temporarily unavailable. Only upgrade the ZooKeeper pods when cluster downtime can be tolerated.

To upgrade or patch any component:
1. If needed, perform a bootstrap upgrade to ensure that the cluster has the latest operator components. See Running a Bootstrap Upgrade.
2. Edit the CR for the component that you want to update. See Updatable Parameters.
   NOTE: You can download the current CR to a file using the following command:
   ```
   kubectl get dataplatform [name] -o yaml > [name]-cr.yaml
   ```
3. Apply the changes using the `kubectl apply` command.
4. Either:
   - CLDB or ZooKeeper only: Proceed to the next step.
   - All others: This completes the upgrade/patch process.
5. Log in to the `admincli` pod, and then execute the `edf update cluster` command. For example:
   ```
   kubectl exec -it admincli-0 -n mycluster -- /bin/bash
   edf update cluster
   ```
6. Verify that the pods are ready by executing the `edf report ready` command. The command can take a couple of minutes to execute. You might notice a delay between the display of the second and third lines of the output.
   ```
   edf report ready
   2021/06/14 23:28:01 [edf reports]: [INFO] Checking if pods are stabilized for upgrade. This may take a minute or two.
   2021/06/14 23:28:02 [edf reports]: [INFO] Valid MapR user ticket found, skipping ticket generation
   2021/06/14 23:29:52 [edf reports]: [INFO] Pods are ready
   ```
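Putting the steps together, one end-to-end pass might look like the following sketch. It assumes a cluster and namespace both named `mycluster`, as in the example above; the `edf` commands run inside the `admincli` pod after the `exec`:

```sh
# 1. Download the current CR and edit the desired parameter(s)
kubectl get dataplatform mycluster -o yaml > mycluster-cr.yaml
# 2. Apply the edited CR
kubectl apply -f mycluster-cr.yaml
# 3. CLDB or ZooKeeper changes only: refresh the cluster from the admincli pod
kubectl exec -it admincli-0 -n mycluster -- /bin/bash
edf update cluster
edf report ready
```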
Scaling Up ZK, MFS, and CLDB
To scale up ZK, MFS, and CLDB objects:
1. In the Data Fabric CR, change the ZK, MFS, or CLDB `failurecount` parameter, and then execute the `kubectl apply` command to apply the changes. See Zookeeper Core Object Settings, MFS Core Object Settings, and CLDB Core Object Settings.
2. Wait for the new pods to start up and be ready. You can verify readiness (`1/1 READY`) by executing the following command:
   ```
   kubectl get pods -n <cluster-name>
   ```
3. Create a new `admincli` pod by deleting the current pod:
   ```
   kubectl delete pod admincli-0 -n <cluster-name>
   ```
4. Wait for the new `admincli` pod to be `1/1 READY`, then exec into that pod and execute the `edf update cluster` command. This step refreshes the existing pods and makes them aware of the new pods. For example:
   ```
   kubectl exec -it admincli-0 -n <cluster-name> -- /bin/bash
   edf update cluster
   ```
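For reference, the `failurecount` change in step 1 might look like this CR fragment. It is illustrative only; the exact nesting depends on your CR:

```yaml
# Illustrative fragment; surrounding CR structure omitted.
zookeeper:
  failurecount: 1   # increase this value to scale up the ZooKeeper pods
```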
Verifying the Upgrade Changes
Check the status of upgraded pods and parameters after applying an upgrade, patch, or configuration change:
1. Execute the `edf report ready` command to ensure that the Data Fabric control plane pods are ready:
   ```
   edf report ready
   ```
2. Execute the `get pods` command to check the status of individual pods:
   ```
   kubectl get pods -n mycluster -w
   ```
3. Check parameter values by executing the `describe pod` command. For example, if you updated the image tag for pod `mcs-0`:
   ```
   kubectl describe pod mcs-0 -n mycluster | grep -i image:
   ```
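To spot-check image values across all pods at once, a standard `kubectl` query such as the following can also be used (`mycluster` is the example namespace used above):

```sh
# Print each pod name with the image(s) its containers are running
kubectl get pods -n mycluster \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'
```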
Updatable Parameters
This section lists all of the parameters that can be updated on a per-component basis:

- All components: `baseimagetag`, `imageregistry`
- `admincli`: `count`, `image`, `limitcpu`, `limitdisk`, `limitmemory`, `logLevel`, `requestcpu`, `requestdisk`, `requestmemory`
- `cldb`: `failurecount`, `image`, `limitcpu`, `limitdisk`, `limitmemory`, `logLevel`, `requestcpu`, `requestdisk`, `requestmemory`
- `mfs`: `image`, `groups: count`, `limitcpu`, `limitdisk`, `limitmemory`, `logLevel`, `requestcpu`, `requestdisk`, `requestmemory`
- `webserver`: `count`, `image`, `limitcpu`, `limitdisk`, `limitmemory`, `logLevel`, `requestcpu`, `requestdisk`, `requestmemory`
- `zookeeper`: `failurecount`, `image`, `limitcpu`, `limitdisk`, `limitmemory`, `loglevel`, `requestcpu`, `requestdisk`, `requestmemory`
- `hivemetastore`: `count`, `image`, `limitcpu`, `limitdisk`, `limitmemory`, `loglevel`, `requestcpu`, `requestdisk`, `requestmemory`
- `objectstore`: `count`, `image`, `hostports: limitcpu`, `hostports: limitdisk`, `hostports: limitmemory`, `hostports: loglevel`, `hostports: requestcpu`, `hostports: requestdisk`, `hostports: requestmemory`
- `collectd`: `image`, `limitcpu`, `limitdisk`, `limitmemory`, `logLevel`, `requestcpu`, `requestdisk`, `requestmemory`
- `grafana`: `count`, `image`, `limitcpu`, `limitdisk`, `limitmemory`, `logLevel`, `requestcpu`, `requestdisk`, `requestmemory`
- `opentsdb`: `count`, `image`, `limitcpu`, `limitdisk`, `limitmemory`, `logLevel`, `requestcpu`, `requestdisk`, `requestmemory`
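For example, updating resource requests and limits for the `admincli` component might look like the following CR fragment. The nesting is abbreviated and all values are placeholders:

```yaml
# Illustrative fragment only; all values are placeholders.
admincli:
  count: 1
  image: example/admincli:6.2.0
  requestcpu: "1000m"
  limitcpu: "2000m"
  requestmemory: "2Gi"
  limitmemory: "4Gi"
```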
Online Update Behaviors
Upgrades and patching changes are implemented as rolling updates, depending on the object workload type.
- Deployment - Grafana, Kibana, Collectd: The DataPlatform operator sees the change and launches a new pod on the cluster (on the same node or a different one). This new pod contains the changes specified in the updated CR. After the new pod is ready, a "pre-stop" script gracefully shuts down the processes in the existing pod, and then the existing pod is terminated. This process repeats until all pods of the Deployment are updated. (A generic sketch of the pre-stop hook follows this list.)
- StatefulSet - The DataPlatform operator brings down a pod of a specific StatefulSet. A "pre-stop" script ensures that processes are gracefully shut down before the pod is terminated. A new pod that has the updated configuration changes is then brought up on the same physical node. The operator waits until this new pod is ready, and then repeats the process until all pods are updated.
  Because the core data pods, such as CLDB and ZK, perform critical functions, the operator does not update these pods after CR changes are applied. You must execute a command (see Upgrading and Patching Procedure) to complete the process.
Examples of StatefulSet pods include the following:
- CLDB
- ZooKeeper
- MCS
- MFS
- admincli
- Object Store
- NFS Server
- Elasticsearch
- OpenTSDB
- Hive Metastore
- Gateway pods, such as MapR Gateway, HTTPS Gateway, Data Access Gateway, and Kafka REST Gateway
- DaemonSet - Fluentd: All running pods are brought down simultaneously, and new pods are then started up in parallel. Parallel updates are not common to all DaemonSets, but are appropriate for Fluentd. A “pre-stop” script ensures processes are gracefully shut down before the pod is terminated.
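The "pre-stop" script referenced for each workload type corresponds to the standard Kubernetes `preStop` lifecycle hook. As a generic sketch (this is not the actual Data Fabric pod spec, and the script path is hypothetical):

```yaml
# Generic Kubernetes preStop hook; not the actual Data Fabric pod spec.
containers:
  - name: example
    lifecycle:
      preStop:
        exec:
          # Hypothetical path; Data Fabric pods ship their own shutdown logic
          command: ["/bin/sh", "-c", "/scripts/graceful-shutdown.sh"]
```

Regardless of workload type, update progress can be watched with `kubectl get pods -n <cluster-name> -w`, as described in Verifying the Upgrade Changes.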