Kubernetes Cluster Creation Issues

This article contains troubleshooting steps related to Kubernetes cluster creation.

Symptom Logs to collect/Diagnostic steps
Kubernetes cluster creation Set-up log (from the UI) On the Controller: /var/log/bluedata/bds-mgmt.log On the Kubernetes master:
kubectl get events
                        journalctl (or /var/log/messages)
                        /var/log/bluedata/install/k8scluster_set-up-<time-stamp>
                        /var/log/bluedata/install/k8scluster_set-up-<time-stamp>.xtrace
For additional information, click here (link opens an external website in a new browser tab/window).
Kubernetes node failed to fetch join Example: Controller: /var/log/bluedata/bds-mgmt.log:
Feb  5 09:53:50 dl380-002 BDS: MGMT  :[  error][     src/k8s/bd_mgmt_api_k8s.erl:01122] <0.32028.17> exception reason: {k8s_cluster_creation,["6","failed to fetch join command"]}
It is very likely that the Controller and Worker hosts do not have network connectivity. Possible root causes:
  • Firewall
  • Mis-configured proxy setting
  • Cloud (AWS) blocking traffic
  • Router issue
On the Controller: /var/log/bluedata/bds-mgmt.log
Failed to execute on building K8s operator Example: Controller: /var/log/bluedata/bds-mgmt.log
Failed to exec: kubectl -n hpecp create -f /opt/bluedata/bundles/bluedata-HPE Ezmeral Runtime Enterprise-entdoc-minimal-debug-5.0-3002/scripts/iucomponents/k8s_cluster/operator-templates/config-crs/cr-hpecp-config.yaml ERROR: Failed executing 06_operators.sh SKIPPING rollback
Collect the Kubernetes events On Kubernetes Master node: kubectl get events
Error message:
Post https://hpecp-validator.hpecp.svc:443/validate?timeout=30s: Service Unavailable

Example:

kubectl -n hpecp create -f /opt/bluedata/bundles/bluedata-epic-entdoc-minimal-debug-5.0-3002/scripts/iucomponents/k8s_cluster/operator-templates/config-crs/cr-hpecp-config.yaml
....Error from server (InternalError): error when creating "/opt/bluedata/bundles/bluedata-epic-entdoc-minimal-debug-5.0-3002/scripts/iucomponents/k8s_cluster/operator-templates/config-crs/cr-hpecp-config.yaml": Internal error occurred: failed calling webhook "hard-validate.hpecp.hpe.com": Post https://hpecp-validator.hpecp.svc:443/validate?timeout=30s: Service Unavailable

It is likely there is a network problem between the Controller host and the Kubernetes hosts.

Work with Network IT to verify that there are no connectivity issue between these hosts.