Enabling Insight Gathering in Production Mode
Describes the steps to enable insight gathering in production mode.
Prerequisites
- The insight service is installed on all nodes, or on the desired nodes, of the cluster/fabric on which you wish to track user behavior.
- Hive Metastore must be installed with a production-grade RDBMS such as MySQL, Postgres, or MariaDB. HPE recommends that you run the Hive Metastore in high availability mode.
NOTE: Insight gathering stops when Hive Metastore is down. The insight service waits for the Hive Metastore service to be up and running before it commits records to the respective Apache Iceberg table. When Hive Metastore is running in high availability mode, Data Fabric communicates any switchover of the Hive Metastore master to the insight services running on the individual nodes.
About this task
Insight gathering can be enabled on only a few nodes, but this approach does not give a complete picture of the events taking place on the cluster/fabric.
HPE recommends that you enable insight gathering on all nodes when you wish to gather insights in production mode. In other words, enable insight gathering at the global level in production mode.
The insight service automatically runs in production mode when Hive Metastore is configured with a production-grade RDBMS. The insight service reads the audit log entries directly from the audit log files and adds them to the respective Iceberg tables. Audit streaming is not required for insight gathering in production mode.
By default, audit data is committed to Apache Iceberg every five minutes. Audit log entries from the audit files are written to data files in batches of 1024 records, and the data files are pushed to Apache Iceberg at each commit interval.
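The batching described above can be pictured with a minimal sketch. This is an illustration only, not the insight service's implementation: the function name batch_records, the constant names, and the sample entries are hypothetical, and flushing a final partial batch is an assumption. Only the defaults of 1024 records per data file and a five-minute commit interval come from this topic.

```python
from typing import Iterable, Iterator, List

BATCH_SIZE = 1024            # records per data file (documented default)
COMMIT_INTERVAL_S = 5 * 60   # commit interval in seconds (documented default)


def batch_records(records: Iterable[str],
                  batch_size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield lists of at most batch_size records, preserving input order."""
    batch: List[str] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Assumption: a partial batch is still written out as a data file.
        yield batch


# Hypothetical audit entries; real entries come from the audit log files.
data_files = list(batch_records(f"entry-{i}" for i in range(2500)))
print([len(f) for f in data_files])  # [1024, 1024, 452]
```

In this sketch, 2500 entries become three data files, which would then be pushed together at the next commit interval.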
Insight gathering is more efficient and more scalable in file, or production, mode because the files containing audit data are distributed at the global level.
Follow the steps given below to enable insight gathering in production mode.
Procedure
- Enable audit. See Enabling and Disabling Auditing of Cluster Administration to enable auditing for cluster administration.
- Configure a production-grade database for the Hive Metastore. See Using MySQL for the Hive Metastore to configure MySQL for Hive Metastore.
- Restart Hive Metastore after the database is changed.
Results
The insight data is gathered in the following Apache Iceberg tables.
- Data from the cldb audit file is pushed to the cldb_is table.
- Data from the auth audit file is pushed to the auth_is table.
- Data from the mfs audit file is pushed to the mfs_is table.
- Data from the s3 audit file is pushed to the s3_is table.
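For scripts that consume this data, the mapping above can be captured in a small lookup. The following is a sketch under stated assumptions: the names AUDIT_TABLES and table_for are hypothetical helpers, not part of any Data Fabric API. Only the four source names and table names come from the list above.

```python
# Audit source -> Apache Iceberg table, as listed in this topic.
AUDIT_TABLES = {
    "cldb": "cldb_is",
    "auth": "auth_is",
    "mfs": "mfs_is",
    "s3": "s3_is",
}


def table_for(source: str) -> str:
    """Return the Iceberg table name for an audit source (hypothetical helper)."""
    key = source.strip().lower()
    if key not in AUDIT_TABLES:
        raise ValueError(f"unknown audit source: {source!r}")
    return AUDIT_TABLES[key]


print(table_for("mfs"))  # mfs_is
```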