Upgrading and Patching Data Fabric Clusters on Kubernetes
Many aspects of an HPE Ezmeral Data Fabric cluster on Kubernetes can be reconfigured while the cluster is running. For example:
- You can change pod settings such as CPU, memory, and storage.
- You can change the number of pods by changing the count value in the CR section that describes that pod. For example, to increase the number of MFS pods, increase the value of count in the mfs section of the CR.
- You can upgrade a pod container image, typically to a new container image version, by changing the image value in the pod CR.
See Updatable Parameters for a list of the parameters that can be updated for each component.
Upgrading and Patching Procedure
Upgrades and patches are applied as online updates whose behavior depends on the workload type of the object. For detailed information about the process for each workload type, see Online Update Behaviors.
Using the edf update feature to increase the number of ZooKeeper pods can cause the CLDB pods to become temporarily unavailable. Upgrade the ZooKeeper pods only when cluster downtime can be tolerated.
To upgrade or patch any component:
- If needed, perform a bootstrap upgrade to ensure that the cluster has the latest operator components. See Running a Bootstrap Upgrade.
- Edit the CR for the component that you want to update. See Updatable Parameters.
  NOTE: You can download the CR file by using the following command:
  kubectl get dataplatform [name] -o yaml > [name]-cr.yaml
- Apply the changes by using the kubectl apply command.
- Do one of the following:
  - CLDB or ZooKeeper only: Proceed to the next step.
  - All other components: This completes the upgrade or patch process.
- Log into the admincli pod, and then execute the following command:
  edf update cluster
  For example:
  kubectl exec -it admincli-0 -n mycluster -- /bin/bash
  edf update cluster
- Verify that the pods are ready by executing the edf report ready command. The command can take a couple of minutes to execute, and you might notice a delay between the display of the second and the third lines of the output:
  edf report ready
  2021/06/14 23:28:01 [edf reports]: [INFO] Checking if pods are stabilized for upgrade. This may take a minute or two.
  2021/06/14 23:28:02 [edf reports]: [INFO] Valid MapR user ticket found, skipping ticket generation
  2021/06/14 23:29:52 [edf reports]: [INFO] Pods are ready
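The procedure above can be sketched as a small set of shell functions. This is a minimal sketch under stated assumptions, not an HPE-provided tool: the function names export_cr and apply_update and the KUBECTL override variable are hypothetical, and the sketch assumes that the edf command is on the PATH inside the admincli-0 pod so that kubectl exec can run it without an interactive shell. The namespace is passed explicitly because the examples above use the cluster name (mycluster) as the namespace.

```shell
#!/usr/bin/env bash
# Minimal sketch of the upgrade/patch procedure (hypothetical helper names).
# Assumption: `edf` is on the PATH inside admincli-0, so it can be run
# through `kubectl exec` without opening an interactive shell.
# Set KUBECTL=echo to print the kubectl commands instead of running them.

# Exports the current DataPlatform CR to <name>-cr.yaml for editing.
export_cr() {
  local name="$1"
  "${KUBECTL:-kubectl}" get dataplatform "$name" -o yaml > "${name}-cr.yaml"
}

# Applies an edited CR. For CLDB or ZooKeeper changes, also runs
# `edf update cluster` and then `edf report ready` in admincli-0.
apply_update() {
  local cr_file="$1" component="$2" namespace="$3"
  "${KUBECTL:-kubectl}" apply -f "$cr_file" || return 1
  case "$component" in
    cldb|zookeeper)
      "${KUBECTL:-kubectl}" exec admincli-0 -n "$namespace" -- edf update cluster || return 1
      "${KUBECTL:-kubectl}" exec admincli-0 -n "$namespace" -- edf report ready
      ;;
    *)
      # All other components: applying the CR completes the process.
      ;;
  esac
}
```

For example, after editing mycluster-cr.yaml, running apply_update mycluster-cr.yaml cldb mycluster applies the CR and then runs both edf commands; with KUBECTL=echo set, the same call only prints the commands, which is a cheap way to review them first.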
Scaling Up ZK, MFS, and CLDB
To scale up ZK, MFS, and CLDB objects:
- In the Data Fabric CR, change the failurecount parameter (ZooKeeper or CLDB) or the count parameter (MFS), and then execute the kubectl apply command to apply the changes. See Zookeeper Core Object Settings, MFS Core Object Settings, and CLDB Core Object Settings.
- Wait for the new pods to start up and be ready. You can verify readiness (1/1 READY) by executing the following command:
  kubectl get pods -n <cluster-name>
- Create a new admincli pod by deleting the current pod:
  kubectl delete pod admincli-0 -n <cluster-name>
- Wait for the new admincli pod to be 1/1 READY, then exec into that pod and execute the edf update cluster command. This step refreshes the existing pods and makes them aware of the new pods. For example:
  kubectl exec -it admincli-0 -n <cluster-name> -- /bin/bash
  edf update cluster
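The waiting steps above can be automated by polling kubectl get pods. This is a minimal sketch with hypothetical helper names (all_pods_ready, wait_for_ready, refresh_after_scale_up); it assumes the default kubectl get pods columns (NAME READY STATUS RESTARTS AGE), a 10-second polling interval, and, as in the previous steps, that edf is on the PATH inside admincli-0. It has no timeout, so treat it as a starting point rather than a finished script.

```shell
#!/usr/bin/env bash
# Minimal sketch of the scale-up follow-up steps (hypothetical helper names).
# Assumes the default `kubectl get pods` columns: NAME READY STATUS RESTARTS AGE.
# Set KUBECTL to another command to dry-run.

# Reads `kubectl get pods` output on stdin. Succeeds only if at least one pod
# is listed and every pod that is not Completed shows n/n READY with n > 0.
all_pods_ready() {
  awk 'NR > 1 && $3 != "Completed" {
         n++
         split($2, r, "/")
         if (r[1] + 0 != r[2] + 0 || r[2] + 0 == 0) bad = 1
       }
       END { if (bad || n == 0) exit 1; exit 0 }'
}

# Polls every 10 seconds until the listed pods are ready. Extra arguments
# (for example, a pod name) are passed to `kubectl get pods`.
# No timeout: add one before relying on this.
wait_for_ready() {
  local namespace="$1"
  shift
  until "${KUBECTL:-kubectl}" get pods -n "$namespace" "$@" 2>/dev/null | all_pods_ready; do
    sleep 10
  done
}

# Runs the remaining steps after the CR change has been applied.
refresh_after_scale_up() {
  local namespace="$1"
  wait_for_ready "$namespace"
  "${KUBECTL:-kubectl}" delete pod admincli-0 -n "$namespace"
  # Wait for the recreated admincli-0 specifically; until it exists,
  # `kubectl get pods` prints nothing and all_pods_ready fails.
  wait_for_ready "$namespace" admincli-0
  "${KUBECTL:-kubectl}" exec admincli-0 -n "$namespace" -- edf update cluster
}
```

Waiting on admincli-0 by name after the delete matters: the other pods are already ready at that point, so a namespace-wide check could succeed before the replacement admincli pod has even been created.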
Verifying the Upgrade Changes
Check the status of upgraded pods and parameters after applying an upgrade, patch, or configuration change:
- Execute the edf report ready command to ensure that the Data Fabric control plane pods are ready:
  edf report ready
- Execute the kubectl get pods command to check the status of individual pods. The -w flag keeps watching for changes until you stop it with Ctrl+C:
  kubectl get pods -n mycluster -w
- Check parameter values by executing the kubectl describe pod command. For example, if you updated the image tag for pod mcs-0:
  kubectl describe pod mcs-0 -n mycluster | grep -i image:
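As an alternative to filtering kubectl describe output with grep, the image of each container can be read directly with kubectl's jsonpath output. This is a minimal sketch with hypothetical helper names (pod_images, pod_has_tag) and a hypothetical KUBECTL override; it reads the pod spec, so it shows the image the pod was created with, and the tag you pass in is whatever tag you set in the CR.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper names): check a pod's image tag with
# kubectl jsonpath output instead of grepping `kubectl describe`.

# Prints the image of each container in the pod spec, one per line.
pod_images() {
  local pod="$1" namespace="$2"
  "${KUBECTL:-kubectl}" get pod "$pod" -n "$namespace" \
    -o jsonpath='{range .spec.containers[*]}{.image}{"\n"}{end}'
}

# Succeeds if any container image of the pod ends with ":<tag>".
pod_has_tag() {
  local pod="$1" namespace="$2" tag="$3" image
  while IFS= read -r image; do
    [[ "$image" == *":$tag" ]] && return 0
  done < <(pod_images "$pod" "$namespace")
  return 1
}
```

For example, pod_has_tag mcs-0 mycluster <new-tag> succeeds only if an image in the mcs-0 spec ends with that exact tag, which avoids matching a tag that merely contains the expected string.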
Updatable Parameters
This section lists all of the parameters that can be updated on a per-component basis:
- All components: baseimagetag, imageregistry
- admincli: count, image, limitcpu, limitdisk, limitmemory, logLevel, requestcpu, requestdisk, requestmemory
- cldb: failurecount, image, limitcpu, limitdisk, limitmemory, logLevel, requestcpu, requestdisk, requestmemory
- mfs: image, groups: count, limitcpu, limitdisk, limitmemory, logLevel, requestcpu, requestdisk, requestmemory
- webserver: count, image, limitcpu, limitdisk, limitmemory, logLevel, requestcpu, requestdisk, requestmemory
- zookeeper: failurecount, image, limitcpu, limitdisk, limitmemory, loglevel, requestcpu, requestdisk, requestmemory
- hivemetastore: count, image, limitcpu, limitdisk, limitmemory, loglevel, requestcpu, requestdisk, requestmemory
- objectstore: count, image, hostports: limitcpu, hostports: limitdisk, hostports: limitmemory, hostports: loglevel, hostports: requestcpu, hostports: requestdisk, hostports: requestmemory
- collectd: image, limitcpu, limitdisk, limitmemory, logLevel, requestcpu, requestdisk, requestmemory
- grafana: count, image, limitcpu, limitdisk, limitmemory, logLevel, requestcpu, requestdisk, requestmemory
- opentsdb: count, image, limitcpu, limitdisk, limitmemory, logLevel, requestcpu, requestdisk, requestmemory
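Before editing the CR, a script could check a proposed change against the list above. This is a minimal sketch with hypothetical names (UPDATABLE_PARAMS, is_updatable) that simply encodes the list as a bash 4+ associative array. Entries listed with a groups: or hostports: prefix are stored with the prefix joined to the name (for example, groups:count), because the list does not spell out the nested CR structure.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical names): the updatable-parameter list as a
# lookup table. Requires bash 4+ for associative arrays. Prefixed entries
# (groups:, hostports:) are kept as listed, without the space after the colon.
RESOURCE_PARAMS="limitcpu limitdisk limitmemory requestcpu requestdisk requestmemory"

declare -A UPDATABLE_PARAMS=(
  [all]="baseimagetag imageregistry"
  [admincli]="count image logLevel $RESOURCE_PARAMS"
  [cldb]="failurecount image logLevel $RESOURCE_PARAMS"
  [mfs]="image groups:count logLevel $RESOURCE_PARAMS"
  [webserver]="count image logLevel $RESOURCE_PARAMS"
  [zookeeper]="failurecount image loglevel $RESOURCE_PARAMS"
  [hivemetastore]="count image loglevel $RESOURCE_PARAMS"
  [objectstore]="count image hostports:limitcpu hostports:limitdisk hostports:limitmemory hostports:loglevel hostports:requestcpu hostports:requestdisk hostports:requestmemory"
  [collectd]="image logLevel $RESOURCE_PARAMS"
  [grafana]="count image logLevel $RESOURCE_PARAMS"
  [opentsdb]="count image logLevel $RESOURCE_PARAMS"
)

# Succeeds if PARAM is updatable for COMPONENT. Parameters listed under
# "All components" apply to every component.
is_updatable() {
  local component="$1" param="$2" p
  [[ -n "$component" ]] || return 1
  for p in ${UPDATABLE_PARAMS[all]} ${UPDATABLE_PARAMS[$component]:-}; do
    [[ "$p" == "$param" ]] && return 0
  done
  return 1
}
```

The lookup is case-sensitive on purpose: the list uses logLevel for most components but loglevel for zookeeper, hivemetastore, and objectstore, and the sketch preserves that difference rather than guessing which spelling the CR accepts.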
Online Update Behaviors
Upgrades and patches are applied online, and the update behavior depends on the workload type of the object:
- Deployment - Grafana, Kibana, Collectd: The DataPlatform operator detects the change and launches a new pod on the cluster, either on the same node or on a different node. The new pod contains the changes specified in the updated CR. After the new pod is ready, a "pre-stop" script gracefully shuts down the processes in the existing pod, and then the existing pod is terminated. This process repeats until all pods of the Deployment are updated.
- StatefulSet: The DataPlatform operator brings down one pod of a specific StatefulSet. A "pre-stop" script ensures that processes are gracefully shut down before the pod is terminated. A new pod that has the updated configuration is then brought up on the same physical node. The operator waits until this new pod is ready, and then repeats the process until all pods are updated.
Because core data pods such as CLDB and ZooKeeper perform critical functions, the operator does not update these pods after CR changes are applied. You must execute the edf update cluster command (see Upgrading and Patching Procedure) to complete the process.
Examples of StatefulSet pods include the following:
- CLDB
- ZooKeeper
- MCS
- MFS
- admincli
- Object Store
- NFS Server
- Elasticsearch
- OpenTSDB
- Hive Metastore
- Gateway pods, such as MapR Gateway, HTTPS Gateway, Data Access Gateway, and Kafka REST Gateway
- DaemonSet - Fluentd: All running pods are brought down simultaneously, and new pods are then started in parallel. Parallel updates are not used for all DaemonSets, but they are appropriate for Fluentd. A "pre-stop" script ensures that processes are gracefully shut down before each pod is terminated.
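The behaviors above can be summarized as a lookup that a script might use to decide what to expect after kubectl apply. This is a minimal sketch: the function names and the lowercase component keys (for example, nfs and mcs) are hypothetical labels rather than official resource names, and gateway pods are matched with a *gateway* pattern as an assumption about how they are named.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical component keys): map a component to the
# workload type described above, and flag components that need
# `edf update cluster` after the CR is applied.

# Prints the workload type; returns 1 for components not covered above.
workload_type() {
  case "$1" in
    grafana|kibana|collectd)
      echo "Deployment" ;;    # new pod first, then the old pod is terminated
    cldb|zookeeper|mcs|mfs|admincli|objectstore|nfs|elasticsearch|opentsdb|hivemetastore|*gateway*)
      echo "StatefulSet" ;;   # one pod at a time, recreated on the same node
    fluentd)
      echo "DaemonSet" ;;     # all pods replaced in parallel
    *)
      return 1 ;;
  esac
}

# CLDB and ZooKeeper are not updated by the operator alone.
needs_edf_update() {
  case "$1" in
    cldb|zookeeper) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, a wrapper script could call needs_edf_update before deciding whether to exec into admincli-0, and use workload_type to choose between waiting for a rolling replacement and expecting all Fluentd pods to restart at once.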