Enabling Insight Gathering in Production Mode
Describes the steps to enable insight gathering in production mode.
Prerequisites
- The insight service is installed on all nodes, or on the desired nodes, of the cluster/fabric on which you wish to track user behavior.
- Hive Metastore must be installed with a production-grade RDBMS such as MySQL, PostgreSQL, or MariaDB. HPE recommends running the Hive Metastore in high availability mode.
About this task
Insight gathering can be enabled on only a few nodes, but that approach does not give a complete picture of the events taking place on the cluster/fabric.
HPE recommends enabling insight gathering on all nodes when you gather insights in production mode. In other words, enable insight gathering at the global level in production mode.
The insight service automatically runs in production mode when Hive Metastore is configured with a production-grade RDBMS. The insight service reads the audit logs directly from the audit log files and adds them to the respective Iceberg tables. Audit streaming is not required for insight gathering in production mode.
Insight gathering is more efficient and more scalable in file mode, that is, production mode, because the files containing audit data are distributed at the global level.
Follow the steps given below to enable insight gathering in production mode.
Procedure
- Enable auditing. See Enabling and Disabling Auditing of Cluster Administration for the steps to enable auditing for cluster administration.
- Configure a production-grade database for the Hive Metastore. Restart Hive Metastore after the database is changed.
- Enable insight. See insight cluster to enable insight.
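As a sanity check for the database step above, the following minimal Python sketch inspects a hive-site.xml document and reports whether the metastore JDBC URL points at one of the databases named in the prerequisites. It relies on the standard Hive property javax.jdo.option.ConnectionURL. The prefix list, the function names, and the sample host name are assumptions made for illustration, and the check is only a heuristic: it does not reproduce how the insight service itself decides which mode to run in.

```python
import xml.etree.ElementTree as ET

# Assumption: JDBC URL prefixes for the databases named in the prerequisites.
PRODUCTION_PREFIXES = ("jdbc:mysql:", "jdbc:postgresql:", "jdbc:mariadb:")

JDBC_URL_PROPERTY = "javax.jdo.option.ConnectionURL"


def metastore_jdbc_url(hive_site_xml):
    """Return the metastore JDBC URL from hive-site.xml content, or None if absent."""
    root = ET.fromstring(hive_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == JDBC_URL_PROPERTY:
            return prop.findtext("value", "").strip()
    return None


def looks_production_grade(hive_site_xml):
    """Heuristic: True if the JDBC URL starts with one of the assumed prefixes."""
    url = metastore_jdbc_url(hive_site_xml)
    return url is not None and url.lower().startswith(PRODUCTION_PREFIXES)


# Hypothetical sample configuration; the host and database names are placeholders.
SAMPLE_HIVE_SITE = """<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://metastore-db.example.com:3306/hive</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(metastore_jdbc_url(SAMPLE_HIVE_SITE))
    print(looks_production_grade(SAMPLE_HIVE_SITE))
```

To check a real installation, read the contents of the hive-site.xml file used by your Hive Metastore and pass them to looks_production_grade. The location of that file depends on the installation and is not specified here.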
Results
The insight data is gathered on the following Apache Iceberg tables.
- Data from the cldb audit file is pushed to the cldb_is table.
- Data from the auth audit file is pushed to the auth_is table.
- Data from the mfs audit file is pushed to the mfs_is table.
- Data from the s3 audit file is pushed to the s3_is table.
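The mapping between audit sources and tables listed above can be captured in a small lookup, sketched here in Python. The table names come from this topic. The helper names and the generated SELECT statement are illustrative assumptions: the query engine, the catalog or namespace that qualifies the table names, and the table schemas are not specified here and may differ in your environment.

```python
# Audit source -> Apache Iceberg table, as listed in the Results section.
AUDIT_SOURCE_TO_TABLE = {
    "cldb": "cldb_is",
    "auth": "auth_is",
    "mfs": "mfs_is",
    "s3": "s3_is",
}


def table_for(source):
    """Return the Iceberg table that receives data from the given audit source."""
    key = source.strip().lower()
    if key not in AUDIT_SOURCE_TO_TABLE:
        raise ValueError(f"unknown audit source: {source!r}")
    return AUDIT_SOURCE_TO_TABLE[key]


def preview_query(source, limit=10):
    """Build a simple preview query (hypothetical; table name is unqualified)."""
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return f"SELECT * FROM {table_for(source)} LIMIT {limit}"


if __name__ == "__main__":
    for name in AUDIT_SOURCE_TO_TABLE:
        print(preview_query(name))
```

The generated statement is only a starting point for inspecting gathered data. Run it with whichever SQL engine is configured to read the Iceberg tables, and qualify the table name as that engine requires.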