Configuring Data Fabric to Track User Behavior

Describes how to configure Data Fabric to track user behavior.

When auditing is enabled in Data Fabric, files, streams, and tables can be audited for cluster administration operations, data access operations, or both.

Data Fabric audit logs provide insights into the activity that has taken place in relation to the cluster.

Auditing records user behavior and helps you track anomalies or potential data security threats in Data Fabric.

Data Fabric stores audit logs in files, and the audit logs can also be directed to streams. However, earlier versions of Data Fabric did not support running queries on streams.

Data Fabric provides a utility named update_insights.sh that copies audit logs to Apache Iceberg (Iceberg) tables, so that the copied data can be queried.

TIP HPE recommends that you run the expandaudit utility before updating Iceberg, because different FIDs can belong to the same file. Running expandaudit ensures that the file name is the same across audit log entries that refer to different FIDs of a given file. The expandaudit utility also makes the audit log contents more user-friendly by replacing IDs with names.
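The effect described in the tip can be illustrated with a minimal Python sketch. It does not show how expandaudit works internally. The field names (operation, uid, srcFid, srcName), the FID values, and the fid_to_name lookup are all hypothetical and exist only to show why per-file counts fragment when entries are keyed by FID.

```python
from collections import Counter

# Hypothetical audit entries; field names and FID values are illustrative only.
raw_entries = [
    {"operation": "READ", "uid": 5001, "srcFid": "2049.51.131248"},
    {"operation": "WRITE", "uid": 5001, "srcFid": "2049.52.131250"},
    {"operation": "READ", "uid": 5002, "srcFid": "2049.60.131300"},
]

# Hypothetical lookup in which two different FIDs refer to the same file.
fid_to_name = {
    "2049.51.131248": "/data/report.csv",
    "2049.52.131250": "/data/report.csv",
    "2049.60.131300": "/data/notes.txt",
}


def expand(entry, names):
    """Return a copy of the entry with a srcName field added.

    If the FID is not in the lookup, the FID itself is kept as the name.
    """
    expanded = dict(entry)
    expanded["srcName"] = names.get(entry["srcFid"], entry["srcFid"])
    return expanded


# Counting by FID treats the two FIDs of report.csv as separate files.
by_fid = Counter(e["srcFid"] for e in raw_entries)

# Counting by resolved name groups both entries under one file.
by_name = Counter(expand(e, fid_to_name)["srcName"] for e in raw_entries)
```

In this sketch, by_fid has three separate keys while by_name attributes two operations to /data/report.csv, which is the kind of consolidation that makes later reports easier to read.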

You can use tools such as Spark and Zeppelin to query the Iceberg tables and generate the reports and charts you need to detect anomalies in user behavior related to data access operations and cluster administration.
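As a rough illustration of the kind of report such a query might produce, the following sketch counts failed operations per user and lists users at or above a threshold. It uses Python's built-in sqlite3 module as a stand-in for an Iceberg table queried through Spark, so that the example runs anywhere. The table name (audit), the columns (uid, operation, status), the sample rows, the threshold of 3, and the assumption that a status of 0 means success are all hypothetical and are not taken from the actual Data Fabric audit schema.

```python
import sqlite3

# Hypothetical rows: (uid, operation, status).
# Assumption: status 0 means success, and any other value means failure.
rows = [
    (5001, "READ", 0),
    (5002, "READ", 13),
    (5002, "WRITE", 13),
    (5002, "DELETE", 13),
    (5003, "READ", 13),
]

# In-memory SQLite table standing in for an Iceberg audit table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit (uid INTEGER, operation TEXT, status INTEGER)")
conn.executemany("INSERT INTO audit VALUES (?, ?, ?)", rows)

# Users with three or more failed operations, most failures first.
query = """
    SELECT uid, COUNT(*) AS failures
    FROM audit
    WHERE status <> 0
    GROUP BY uid
    HAVING COUNT(*) >= 3
    ORDER BY failures DESC
"""
suspicious = conn.execute(query).fetchall()
conn.close()
```

The query uses only standard SQL aggregation (GROUP BY and HAVING), so a similar statement could be adapted for a Spark or Zeppelin session once the real table and column names are known.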

Iceberg requires the Hive metastore to store and manage the Iceberg catalog. Hive must be accessible to Iceberg for Iceberg to work properly.

The Hive metastore requires a relational database management system such as MySQL in production setups. To use MySQL with the Hive metastore, see Using MySQL for the Hive Metastore.

To set up MySQL to work with the Hive metastore and Data Fabric, see Configuring a Remote MySQL Database for Hive Metastore.

You can use the expandaudit utility to make the audit log contents more user-friendly. The expandaudit utility replaces IDs with names, which can make it easier to identify anomalies or patterns when you generate reports or charts using Spark or Zeppelin.