Troubleshooting Services
The Services tabs of the Platform Administrator and Kubernetes Administrator Dashboard screens display the status of each service. The Platform Administrator Dashboard screen also includes general HPE Ezmeral Runtime Enterprise services, such as monitoring.
See Dashboard - Kubernetes Administrator and Dashboard - Platform Administrator.
A service that is in one of the following degraded states may require troubleshooting, corrective action, or both:
- Warning: Yellow
- Critical: Red.
Audit
This service audits all user access to the platform interface, and specifically CRUD operations on clusters, but does not audit requests sent to specific Kubernetes clusters. This service runs on the Controller only.
The log file for this service is /var/log/bluedata/bds-audit.log. This log file provides a comprehensive history of all interface-level user actions and is a subset of bds-mgmt.log. Contact Hewlett Packard Enterprise Support if you require assistance to resolve an issue with this service.
Caching Node
This service is a critical component for running Big Data jobs against the tenant storage, external DataTaps, or both. I/O pressure, memory issues, or incompatibility with a remote DataTap can cause issues.
The container for this service is named dtap. When troubleshooting the caching node, you can use the standard kubectl logs commands. For example, to output the caching node log of mypod in mynamespace, enter the following command:
kubectl logs -f -n mynamespace mypod -c dtap
If this service continues to restart, or if it remains in a critical state, then contact Hewlett Packard Enterprise Support.
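When the caching node keeps restarting, the log of the previous container instance and the restart count are usually the first things worth collecting. The following is a minimal sketch that uses only standard kubectl options; the function names and the KUBECTL override variable are illustrative assumptions, not part of the product.

```shell
# Hypothetical helpers. Set KUBECTL=echo to preview the commands without running them.

# Print the last 200 log lines of the previous (restarted) dtap container instance.
# Usage: dtap_previous_logs <namespace> <pod>
dtap_previous_logs() {
  "${KUBECTL:-kubectl}" logs -n "$1" "$2" -c dtap --previous --tail=200
}

# Print the name and restart count of each container in the pod.
# Usage: pod_restart_counts <namespace> <pod>
pod_restart_counts() {
  "${KUBECTL:-kubectl}" get pod "$2" -n "$1" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\n"}{end}'
}
```

For example, `dtap_previous_logs mynamespace mypod` shows what the container logged before its most recent restart, which can be attached to a support case.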
HA Engine
This service runs the HA process for the platform. If the status of this service is Critical, then contact Hewlett Packard Enterprise Support.
HA Engine logs are stored in /var/log/bluedata/pl_ha/ and /var/log/pacemaker.
HA Proxy
This service runs on the Gateway hosts in the platform and is managed by the platform. If this service becomes Critical (red dot), then collect /var/log/bluedata/bds-mgmt.log and /var/log/messages on the affected Gateway host, and then contact Hewlett Packard Enterprise Support.
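One way to collect the two log files is to bundle them into a single compressed archive. This is a sketch only: the function name and the archive location are assumptions, and reading these logs typically requires root privileges.

```shell
# Hypothetical helper: bundle log files into a compressed archive for a support case.
# Usage: collect_support_logs <archive.tar.gz> <file>...
collect_support_logs() {
  archive="$1"
  shift
  tar -czf "$archive" "$@"
}

# Example, run on the affected Gateway host (archive name is an arbitrary choice):
# collect_support_logs "/tmp/gateway-logs-$(hostname)-$(date +%Y%m%d).tar.gz" \
#   /var/log/bluedata/bds-mgmt.log /var/log/messages
```

Including the host name and date in the archive name makes it easier to tell bundles apart when more than one Gateway host is affected.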
Management
This service is a key component that manages the overall system, including:
- The physical hosts
- Submitting jobs
- The UI and RESTful APIs.
If this service is in a degraded state, then the web interface will not be accessible. You can access the Nagios interface directly by navigating to:
http://<controller-ip-address>:8085/nagios
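To confirm from a terminal that the Nagios interface is reachable, a check along the following lines may help. It is a sketch that assumes curl is installed; the function names are illustrative.

```shell
# Build the Nagios URL for a Controller IP address or host name.
nagios_url() {
  printf 'http://%s:8085/nagios\n' "$1"
}

# Print the HTTP status code returned by the Nagios interface.
# A value of 000 generally means that no HTTP response was received.
nagios_http_status() {
  curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$(nagios_url "$1")"
}
```

For example, `nagios_http_status <controller-ip-address>` prints the status code, and any value other than 000 indicates that the Controller answered on port 8085.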
The management service can fail for a variety of reasons, including:
- Low availability of resources on the Controller host.
- Disk failure on the root volume of the Controller node.
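Both causes listed above can be checked quickly on the Controller host with standard tools. The sketch below reports how full the root volume is; the function names and the 90% threshold are arbitrary example choices, not product defaults.

```shell
# Print the used-space percentage (digits only) of the filesystem that holds a path.
# Usage: fs_used_pct [path]   (defaults to /)
fs_used_pct() {
  df -P "${1:-/}" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Report whether the root volume is nearly full (example threshold: 90%).
check_root_volume() {
  pct=$(fs_used_pct /)
  if [ "$pct" -ge 90 ]; then
    echo "WARNING: root volume is ${pct}% full"
  else
    echo "OK: root volume is ${pct}% full"
  fi
}

# Memory and CPU load can be inspected with the standard free -m and uptime commands.
```

Running `check_root_volume` before restarting the service helps rule out a full root volume as the cause of the failure.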
To restart this service, execute the following commands:
stop bds-controller
start bds-controller
If this service still fails, then contact Hewlett Packard Enterprise Support.
The /var/log/bluedata/bds-mgmt.log file contains detailed records of interface-based operations, including:
- CRUD of various objects such as tenants, DataTaps, clusters, and flavors.
- Errors related to cluster creation failures and network connectivity issues between containers.
- Other related items.
Restarting Services
To restart the management service, see Management.
To restart gateway services, see Restarting Gateway Services.
After restarting the monitoring container, services might fail to start. If this occurs, restart the affected service inside the container. For example, to restart Metricbeat:
docker exec <id-of-container-running-the-monitoring-image> service metricbeat restart
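The container ID in the command above can be looked up with docker ps. The following sketch filters running containers by image and restarts Metricbeat in each match. The function names and the DOCKER override variable are illustrative assumptions, and the image name must be supplied by you because it is not given in this topic.

```shell
# Print the IDs of running containers that were started from the given image.
# Usage: monitoring_container_ids <monitoring-image>
monitoring_container_ids() {
  "${DOCKER:-docker}" ps -q --filter "ancestor=$1"
}

# Restart Metricbeat in every running container started from the given image.
# Usage: restart_metricbeat <monitoring-image>
restart_metricbeat() {
  for id in $(monitoring_container_ids "$1"); do
    "${DOCKER:-docker}" exec "$id" service metricbeat restart
  done
}
```

If you are unsure of the image name, the standard `docker ps` listing shows the image of each running container.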