Kubernetes Web Interface Issues

This article contains troubleshooting steps related to the Kubernetes web interface.

Each symptom below is followed by the logs to collect or the diagnostic steps to perform.

Symptom: Web interface hangs.

The browser may present various errors. For example:

  • “Internal Server Error. The server encountered an internal error or misconfiguration and was unable to complete your request. Please contact the server administrator at root@localhost to inform them of the time this error occurred, and the actions you performed just before this error.”
  • The browser constantly displays an “attempting to connect” message but never makes a connection.
  • A “Service Unavailable” error message appears on the screen.

If platform High Availability is enabled, verify that you are attempting to access the correct Controller host via either the cluster IP address or the IP address of a Gateway host.

It is possible that there was a failover and that you are trying to connect to the wrong Controller host.

Validate that the same problem occurs from the command line. Verify that the API service is running.

On the Controller host, execute the following command:

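# curl -k https://localhost:8080

or

# curl -k https://<controller-IP>:8080

curl sends a GET request by default, so no method argument is needed. If you include a bare GET argument, curl treats it as an additional URL and prints a harmless "Could not resolve host" message before requesting the real URL:

# curl -k GET  https://localhost:8080
curl: (6) Could not resolve host: GET; Unknown error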
If this command works, then the management server is working properly, and the problem is in either the web browser or the connection to the Controller host.

If the command fails to connect, for example:
# curl https://localhost:8080/
curl:  (7) Failed to connect to ::1: No route to host
then it is possible that nothing is listening on port 8080. To verify:
# netstat -nlp | grep 8080
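
The command-line checks above can be wrapped in a small shell sketch. It is illustrative only: probe_endpoint and port_has_listener are hypothetical helper names, and the 10-second timeout is an assumed value. The exit codes it interprets (6, 7, and 28) are standard curl exit codes.

```shell
# Hypothetical helper: probe a URL with curl and explain common failures.
probe_endpoint() {
    url="$1"
    rc=0
    # -k skips certificate verification, -s hides progress output,
    # -o /dev/null discards the body, --max-time bounds the wait (assumed 10 s).
    curl -k -s -o /dev/null --max-time 10 "$url" || rc=$?
    case "$rc" in
        0)  echo "reachable: $url" ;;
        6)  echo "could not resolve host: $url" ;;
        7)  echo "failed to connect: $url (is anything listening?)" ;;
        28) echo "timed out: $url" ;;
        *)  echo "curl exit code $rc: $url" ;;
    esac
    return "$rc"
}

# Hypothetical helper: read a socket listing (for example, netstat -nlp output)
# on stdin and succeed if some socket is bound to the given port.
port_has_listener() {
    # Require ":" or "." right before the port and whitespace right after it,
    # so that 8080 does not match 18080 or 80801.
    grep -Eq "[:.]$1[[:space:]]"
}

# Example usage on the Controller host:
#   probe_endpoint https://localhost:8080
#   netstat -nlp | port_has_listener 8080 || echo "nothing is listening on 8080"
```

If probe_endpoint reports a connection failure and port_has_listener finds no listener, continue with the service checks that follow.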

Double-check that the HPE Ezmeral Runtime Enterprise Controller host is running.

On the Controller host, verify that the HPE Ezmeral Runtime Enterprise Controller service is up.

# systemctl status bds-controller
# systemctl status bds-worker
If a service is down, then you need to start it. See Manually Restarting Services. If the bds-controller service is enabled and active, then proceed to the next step.

Verify that the HPE Ezmeral Runtime Enterprise management service is responding. Run a basic CLI command to verify that the management service is active and responding:
# bdconfig --getallenv
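
As a convenience, the service checks above can be combined into one pass. The following is a minimal sketch, not part of the product: report_services is a hypothetical helper name. It relies on systemctl is-active, which prints the unit state and exits non-zero unless the unit is active.

```shell
# Hypothetical helper: print the state of each named systemd unit and
# return 1 if any of them is not active.
report_services() {
    failed=0
    for unit in "$@"; do
        # is-active prints a state such as "active", "inactive", or "failed".
        state=$(systemctl is-active "$unit" 2>/dev/null) || failed=1
        echo "$unit: ${state:-unknown}"
    done
    return "$failed"
}

# Example usage on the Controller host:
#   report_services bds-controller bds-worker httpd
```

A non-zero return value means at least one unit needs attention; systemctl status for that unit shows the details.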
Check if the Apache Server has encountered an issue. Look for obvious issues in the following files on the Controller host:
/var/log/httpd/error_log
/var/log/httpd/access_log
You may need to search online for solutions based on any significant errors.

If you suspect a transient Apache httpd problem, or you need to reproduce an Apache error, then check the httpd status:
# service httpd status
Consider restarting httpd:
# /bin/systemctl restart httpd.service
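
To narrow down the review of the Apache error log described above, a filter along the following lines may help. This is a sketch under assumptions: apache_errors is a hypothetical helper name, the default of 20 lines is arbitrary, and the pattern assumes the bracketed [module:level] or [level] tags that Apache httpd writes in its default error log format.

```shell
# Hypothetical helper: print the most recent entries logged at level
# "error" or above from an Apache error log.
#   $1 = log file, $2 = number of lines to show (default 20, an assumed value)
apache_errors() {
    # Matches tags such as [wsgi:error] or [core:crit], and the older [error] form.
    grep -E '\[([A-Za-z0-9_]+:)?(error|crit|alert|emerg)\]' "$1" | tail -n "${2:-20}"
}

# Example usage on the Controller host:
#   apache_errors /var/log/httpd/error_log
```

Warnings and notices are filtered out on purpose; drop the filter if the higher-severity entries do not explain the problem.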
If there is no obvious sign of an Apache httpd server error, then proceed to the next step.

If the web interface was configured to use SSL connections, then the problem may be due to one or more SSL errors. Look for errors in the following logs:
/var/log/httpd/ssl_access_log
/var/log/httpd/ssl_error_log
Look for a possible SSL error or RSA certificate ID mismatch error. For example:
[Sun Aug 05 07:25:52.488606 2018] [ssl:warn] [pid 3081] AH01909: RSA certificate configured for machine.bluedata.com:443 does NOT include an ID which matches the server name
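
When the log is long, it can help to list only the server names that triggered this warning. The sketch below assumes the exact message wording shown in the example above (other httpd versions may word the AH01909 message differently); mismatched_cert_hosts is a hypothetical helper name.

```shell
# Hypothetical helper: print each distinct host:port named in an AH01909
# certificate-mismatch warning, based on the message format shown above.
mismatched_cert_hosts() {
    sed -n 's/.*AH01909: .* certificate configured for \([^ ]*\) does NOT include.*/\1/p' "$1" | sort -u
}

# Example usage on the Controller host:
#   mismatched_cert_hosts /var/log/httpd/ssl_error_log
```

Each name printed is one that the configured certificate does not cover.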
Check if the certificate has changed and/or requires an additional credential (for example, a passphrase). You may need to search online for solutions based on the errors listed in the ssl_error_log and ssl_access_log files. If there is no obvious sign of an SSL-related problem, then proceed to the next step.

If SELinux is enabled, it is possible that SELinux is blocking Apache from loading content. It is also possible that the SELinux setting changed when HPE Ezmeral Runtime Enterprise was installed. To check if this is the case, temporarily set SELinux to permissive mode:
# getenforce
# setenforce 0
# systemctl restart httpd
Now try to access the web interface. If that works, then consult your IT department to configure the SELinux policy so that the web interface has proper access. After testing, return SELinux to enforcing mode (setenforce 1) if getenforce originally reported Enforcing. If this is not an SELinux policy issue, then proceed to the next step.

Check whether the problem is specific to the browser. Either clear the cache or try another browser. There is a known issue where a cookie value caused Django (the web framework) to hang.

Check to see whether a “Service Unavailable” or “WSGI” error is caused by an unreadable Apache runtime directory.


Make sure that the directory /run/httpd has the proper permissions and is owned by apache:

For example, if you see this:

drwx--x---. 2 root root 100 Aug 1 10:33 httpd
then you need to change the ownership to apache, so that the listing looks like this:
drwx--x---. 2 apache apache 100 Aug 1 10:33 httpd
Make sure that /etc/httpd/run is a symbolic link to /run/httpd with the following permissions:
lrwxrwxrwx. 1 root root 10 Apr 3 06:23 run -> /run/httpd
Try the following fix:
  1. Open the following file: /etc/httpd/conf.d/bdswebui.conf
  2. Look for line: WSGISocketPrefix run/wsgi
  3. Change to this: WSGISocketPrefix /var/run/wsgi
  4. Restart httpd: systemctl restart httpd
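
The manual edit in the steps above can also be scripted. The following is a minimal sketch, assuming the directive appears once on its own line; wsgi_socket_prefix and set_wsgi_socket_prefix are hypothetical helper names. The second helper keeps a .bak copy of the file before changing it.

```shell
# Hypothetical helper: print the current WSGISocketPrefix value from a config file.
wsgi_socket_prefix() {
    awk '$1 == "WSGISocketPrefix" { print $2 }' "$1"
}

# Hypothetical helper: set WSGISocketPrefix to /var/run/wsgi in place,
# saving the original file with a .bak suffix first.
set_wsgi_socket_prefix() {
    sed -i.bak 's|^\([[:space:]]*WSGISocketPrefix[[:space:]]\{1,\}\).*|\1/var/run/wsgi|' "$1"
}

# Example usage on the Controller host:
#   wsgi_socket_prefix /etc/httpd/conf.d/bdswebui.conf
#   set_wsgi_socket_prefix /etc/httpd/conf.d/bdswebui.conf
#   systemctl restart httpd
```

Printing the value before and after the change confirms that the edit took effect before httpd is restarted.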

HPE Ezmeral Runtime Enterprise invokes erlang mochiweb internally to handle web service requests. It is possible that this thread is hanging.

On the Controller host, see /opt/bluedata/common-install/bd_mgmt/log/erlang.log.*

Look for a mochiweb_socket_server error in the log. For example:

=ERROR REPORT==== 17-Oct-2018::22:46:02 ===
application: mochiweb
"Accept failed error"
"{error,{tls_alert,\"record overflow\"}}"

=ERROR REPORT==== 17-Oct-2018::22:46:02 ===
{mochiweb_socket_server,320,{acceptor_error,{error,accept_failed}}}
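
Because the log is split across several erlang.log.* files, a quick way to find the ones worth reading is to list the files that contain these markers. The sketch below is illustrative; logs_with_acceptor_errors is a hypothetical helper name, and the search strings are taken from the example above.

```shell
# Hypothetical helper: print the names of the given log files that contain
# a mochiweb acceptor error.
logs_with_acceptor_errors() {
    # -l prints only the names of files with a match. "|| true" keeps a
    # no-match result (grep exit status 1) from being treated as a failure.
    grep -l -E 'mochiweb_socket_server|acceptor_error|Accept failed error' "$@" || true
}

# Example usage on the Controller host:
#   logs_with_acceptor_errors /opt/bluedata/common-install/bd_mgmt/log/erlang.log.*
```

An empty result means none of the files contain the error.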
HPE Ezmeral Runtime Enterprise uses mochiweb acceptor processes. If an acceptor process is either blocked or has died, then the web interface will not respond. If this error exists, then restart the web service by executing the following commands on the Controller:
cd /opt/bluedata/bundles/bluedata-*/scripts
source common/constants.sh
source common/utils.sh
`get_rpc_cmd` bd_mgmt_web stop ""
Check if the erlang engine is running in the BDS management service. On the Controller, execute the following command:
# /opt/bluedata/common-install/bd_mgmt/bin/bd_mgmt ping
If you receive a pong reply, then the erlang service is running. If you do not receive a pong reply, then the erlang engine is down and needs to be restarted. To restart it, you must restart the bds-controller service.

In bds-mgmt.log, check to see whether the Controller is getting an RPC to fetch Network Params failed error:
Jan 30 19:53:43 yav-369 BDS: MGMT: [info3][ src/bd_hypervisor_agent_docker_:00254] <0.1076.0> RPC to fetch Network Params failed. Retrying after 15 seconds
Jan 30 19:53:58 yav-369 BDS: MGMT: [info3][ src/bd_hypervisor_agent_docker_:00254] <0.1076.0> RPC to fetch Network Params failed. Retrying after 15 seconds
Jan 30 19:54:13 yav-369 BDS: MGMT: [info3][ src/bd_hypervisor_agent_docker_:00254] <0.1076.0> RPC to fetch Network Params failed. Retrying after 15 seconds
Jan 30 19:54:28 yav-369 BDS: MGMT: [info3][ src/bd_hypervisor_agent_docker_:00254] <0.1076.0> RPC to fetch Network Params failed. Retrying after 15 seconds
If this is the case, then there is likely either a network problem, or the cluster IP address is not up. On the hypervisor, see /var/log/bluedata/pl_ha/log.0 for HA information (ha_info). Run systemctl status network to get the network status.
# systemctl status network
network.service - LSB: Bring up/down networking
   Loaded: loaded (/etc/rc.d/init.d/network; bad; vendor preset: disabled)
   Active: active (running) since Wed 2019-01-30 18:49:50 PST; 1h 10min ago
     Docs: man:systemd-sysv-generator(8)
  Process: 57141 ExecStart=/etc/rc.d/init.d/network start (code=exited, status=0/SUCCESS)
    Tasks: 1
   Memory: 3.9M
   CGroup: /system.slice/network.service
           └─57383 /sbin/dhclient -1 -q -lf /var/lib/dhclient/dhclient-d3899115-52fe-4d49-acc6-0ed2ed314ff5-eth0.lease -pf /var/run/dhclient-eth0.pid -H yav-369 eth0

Jan 30 18:49:50 yav-369.lab.bluedata.com network[57141]: RTNETLINK answers: File exists
Jan 30 18:49:50 yav-369.lab.bluedata.com network[57141]: RTNETLINK answers: File exists
Jan 30 18:49:50 yav-369.lab.bluedata.com network[57141]: RTNETLINK answers: File exists
Jan 30 18:49:50 yav-369.lab.bluedata.com network[57141]: RTNETLINK answers: File exists
Jan 30 18:49:50 yav-369.lab.bluedata.com network[57141]: RTNETLINK answers: File exists
Jan 30 18:49:50 yav-369.lab.bluedata.com network[57141]: RTNETLINK answers: File exists
Jan 30 18:49:50 yav-369.lab.bluedata.com network[57141]: RTNETLINK answers: File exists
Jan 30 18:49:50 yav-369.lab.bluedata.com network[57141]: RTNETLINK answers: File exists
Jan 30 18:49:50 yav-369.lab.bluedata.com network[57141]: RTNETLINK answers: File exists
Jan 30 18:49:50 yav-369.lab.bluedata.com systemd[1]: Started LSB: Bring up/down networking.

Restart the network and see if the problem goes away.
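
One way to judge whether the restart helped is to count the "RPC to fetch Network Params failed" entries in bds-mgmt.log before the restart and again a minute or so afterward. The sketch below is illustrative: count_rpc_failures is a hypothetical helper name, and the log path is passed as an argument because its location is not stated here.

```shell
# Hypothetical helper: print how many lines of the given log file report
# the "RPC to fetch Network Params failed" retry message.
count_rpc_failures() {
    # grep -c prints the number of matching lines. It exits with status 1
    # when that number is 0, which is not an error for this purpose.
    grep -c 'RPC to fetch Network Params failed' "$1" || true
}

# Example usage (replace the placeholder with the location of bds-mgmt.log):
#   before=$(count_rpc_failures <path-to>/bds-mgmt.log)
#   ... restart the network, wait about a minute ...
#   after=$(count_rpc_failures <path-to>/bds-mgmt.log)
#   [ "$after" -gt "$before" ] && echo "RPC failures are still being logged"
```

Because the message repeats every 15 seconds while the problem persists, a count that stops growing suggests that the Controller can fetch the network parameters again.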

If none of the above steps is able to resolve the problem, then contact HPE Technical Support.

Symptom: Unable to download the Kubectl plug-in from the Kubernetes Dashboard screens.

You may be using an unsupported browser. See Browser Requirements.

Symptom: General error or hang in the UI.

Collect Apache logs

On the Controller:

/var/log/httpd/error_log
/var/log/httpd/access_log
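
To hand the Apache logs to support in one piece, they can be bundled into a compressed archive. This is a minimal sketch; collect_logs is a hypothetical helper name and the archive name in the usage example is arbitrary. Files that do not exist or are not readable are skipped.

```shell
# Hypothetical helper: archive the readable files among those given.
#   $1 = archive to create, remaining arguments = log files to include
collect_logs() {
    archive="$1"
    shift
    # Rebuild the argument list so that it holds only the readable files.
    for f in "$@"; do
        shift
        if [ -r "$f" ]; then
            set -- "$@" "$f"
        fi
    done
    if [ "$#" -eq 0 ]; then
        echo "no readable log files found" >&2
        return 1
    fi
    tar -czf "$archive" "$@"
}

# Example usage on the Controller host:
#   collect_logs /tmp/apache-logs.tar.gz /var/log/httpd/error_log /var/log/httpd/access_log
```

tar may print a note about removing the leading "/" from member names; the archive is still created.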
Collect diagnostic data from the browser

Turn on Developer Mode. (In the Chrome browser, right-click, and then select Inspect.)
  • Select the Console tab, click the Settings icon (gear), and then check the Preserve log check box.

  • Repeat this for the Network tab.
  • Reproduce the UI problem, and then examine the debugging details.