Enabling Insight Gathering in Production Mode

Describes the steps to enable insight gathering in production mode.

Prerequisites

The following prerequisites must be met before you can start insight gathering in production mode:
  • The insight service is installed on all nodes, or on the desired nodes, of the cluster/fabric on which you want to track user behavior.

  • Hive Metastore is installed with a production-grade RDBMS such as MySQL, Postgres, or MariaDB. HPE recommends running Hive Metastore in high availability mode.
    NOTE
    Insight gathering stops when Hive Metastore is down. The insight service waits until Hive Metastore is up and running again before it commits records to the respective Apache Iceberg table. When Hive Metastore runs in high availability mode, Data Fabric communicates any switchover of the Hive Metastore master to the insight services running on the individual nodes.

About this task

Insight gathering can be enabled on only a few nodes, but this approach does not give a complete picture of the events taking place on the cluster/fabric.

When you gather insights in production mode, HPE recommends enabling insight gathering on all nodes. In other words, enable insight gathering at the global level in production mode.

The insight service automatically runs in production mode when Hive Metastore is configured with a production-grade RDBMS. The insight service reads audit log entries directly from the audit log files and adds them to the respective Iceberg tables. Audit streaming is not required for insight gathering in production mode.

By default, audit data is committed to Apache Iceberg every five minutes. The insight service writes the audit log entries from the audit files into data files in batches of 1024 records, and then pushes these data files to Apache Iceberg at each five-minute commit interval.
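The batching and commit behavior described above can be modeled with a short Python sketch. It is not the insight service's implementation: the class `BatchingWriter` and its methods are hypothetical, and the sketch assumes that only full data files are committed while a partially filled buffer carries over to the next interval (the source does not say how partial batches are handled). The defaults mirror the documented values of 1024 records per data file and a 300-second (five-minute) commit interval.

```python
class BatchingWriter:
    """Hypothetical model of the documented defaults: audit entries are grouped
    into data files of `batch_size` records, and closed data files are committed
    once every `commit_interval_s` seconds. Assumption: a partially filled
    buffer is not committed; it carries over to the next interval."""

    def __init__(self, batch_size=1024, commit_interval_s=300, start_s=0.0):
        self.batch_size = batch_size
        self.commit_interval_s = commit_interval_s
        self.buffer = []          # entries for the data file currently being filled
        self.pending_files = []   # closed data files waiting for the next commit
        self.commits = []         # history: each commit is a list of data files
        self.last_commit_s = start_s

    def add(self, entry, now_s):
        """Buffer one audit entry; close a data file when it reaches batch_size."""
        self.buffer.append(entry)
        if len(self.buffer) == self.batch_size:
            self.pending_files.append(self.buffer)
            self.buffer = []
        self.maybe_commit(now_s)

    def maybe_commit(self, now_s):
        """Commit all closed data files if the commit interval has elapsed."""
        if now_s - self.last_commit_s < self.commit_interval_s:
            return
        if self.pending_files:
            self.commits.append(self.pending_files)
            self.pending_files = []
        self.last_commit_s = now_s


writer = BatchingWriter()
for i in range(2048):                   # 2048 entries arrive at t=0 s: two full data files
    writer.add({"seq": i}, now_s=0)
writer.add({"seq": 2048}, now_s=300)    # five minutes later the interval has elapsed
print(len(writer.commits), [len(f) for f in writer.commits[0]], len(writer.buffer))
# prints: 1 [1024, 1024] 1
```

In this model, the batch size controls how many records go into each data file, while the interval controls how often those files become visible in the table; the two settings are independent, which matches the separate interval and buffer size settings mentioned in the NOTE.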

NOTE
See insight to change the default interval and data file buffer size settings.

Insight gathering is more efficient and more scalable in file (production) mode, because the files containing audit data are distributed at the global level.

Follow the steps given below to enable insight gathering in production mode.

Procedure

  1. Enable auditing. To enable auditing for cluster administration, see Enabling and Disabling Auditing of Cluster Administration.
  2. Configure a production-grade database for Hive Metastore. To configure MySQL for Hive Metastore, see Using MySQL for the Hive Metastore.
  3. After changing the database, restart Hive Metastore.

Results

Insight gathering automatically begins in production mode after Hive Metastore is successfully configured with a production-grade RDBMS.

The insight data is gathered in the following Apache Iceberg tables:

  • Data from the cldb audit file is pushed to the cldb_is table.
  • Data from the auth audit file is pushed to the auth_is table.
  • Data from the mfs audit file is pushed to the mfs_is table.
  • Data from the s3 audit file is pushed to the s3_is table.
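The source-to-table mapping listed above can be expressed as a lookup table. The following Python sketch groups a mixed stream of audit records by destination table; it is illustrative only. The function `group_by_table`, the `source` key, and the `op` values are hypothetical and do not describe the actual audit record format.

```python
from collections import defaultdict

# Audit source -> Apache Iceberg table, as listed in the documentation.
AUDIT_TABLES = {"cldb": "cldb_is", "auth": "auth_is", "mfs": "mfs_is", "s3": "s3_is"}


def group_by_table(records):
    """Group audit records by destination table. Each record is assumed to be
    a dict with a hypothetical 'source' key naming its audit file type."""
    grouped = defaultdict(list)
    for record in records:
        grouped[AUDIT_TABLES[record["source"]]].append(record)
    return dict(grouped)


records = [
    {"source": "mfs", "op": "read"},            # hypothetical record contents
    {"source": "cldb", "op": "volume create"},
    {"source": "mfs", "op": "write"},
]
print({table: len(rows) for table, rows in group_by_table(records).items()})
# prints: {'mfs_is': 2, 'cldb_is': 1}
```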
TIP
One insight node periodically purges records from the Apache Iceberg tables. By default, the purge runs once every hour and removes records older than 5 days. See insight to change the default purge frequency. The retention period for the Iceberg tables can be configured using the insight cluster command.
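The default purge rule can be illustrated with a small Python sketch of the age check: a record is purged once its timestamp falls before the current time minus the 5-day retention period. This is a model of the documented defaults, not the insight service's code; the `purge` function and the `id` and `ts` record fields are hypothetical.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=5)         # default: purge records older than 5 days
PURGE_INTERVAL = timedelta(hours=1)   # default: purge runs once every hour


def purge(records, now, retention=RETENTION):
    """Return (kept, purged): records whose timestamp is older than
    now - retention are purged. Each record is a dict with a hypothetical
    timezone-aware 'ts' key."""
    cutoff = now - retention
    kept = [r for r in records if r["ts"] >= cutoff]
    purged = [r for r in records if r["ts"] < cutoff]
    return kept, purged


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
records = [
    {"id": "a", "ts": now - timedelta(days=1)},
    {"id": "b", "ts": now - timedelta(days=6)},
    {"id": "c", "ts": now - RETENTION + timedelta(minutes=30)},  # 4 days 23.5 h old
]

kept, purged = purge(records, now)
print([r["id"] for r in kept], [r["id"] for r in purged])
# prints: ['a', 'c'] ['b']

# At the next hourly run, record "c" has crossed the 5-day threshold.
kept, purged = purge(kept, now + PURGE_INTERVAL)
print([r["id"] for r in kept], [r["id"] for r in purged])
# prints: ['a'] ['c']
```

Because the purge runs periodically rather than continuously, this model implies that a record can remain in a table for up to roughly one purge interval beyond the retention period.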