Common Features of Audit Logs for File System, Table, and Stream Operations

Audit log entries are initially held in memory until 128 operations have been logged or 10 seconds have elapsed, whichever happens first. At that point, the new log entries are flushed to disk. Which operations produce an entry in the first place depends on the coalesce interval.
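The buffering rule above can be illustrated with a minimal sketch. This is not the product's implementation: the class name AuditBuffer, its methods, and the choice to measure the 10 seconds from the first pending entry are all assumptions made for illustration.

```python
import time


class AuditBuffer:
    """Hypothetical in-memory buffer that flushes on entry count or age."""

    def __init__(self, max_entries=128, max_age_seconds=10.0, clock=time.monotonic):
        self.max_entries = max_entries          # 128 operations, per the text
        self.max_age_seconds = max_age_seconds  # 10 seconds, per the text
        self.clock = clock                      # injectable so the sketch can be tested
        self.pending = []
        self.first_entry_time = None            # assumption: age counts from first pending entry
        self.flushed_batches = []               # stands in for writes to disk

    def add(self, entry):
        now = self.clock()
        if not self.pending:
            self.first_entry_time = now
        self.pending.append(entry)
        self._flush_if_due(now)

    def tick(self):
        """Call periodically so that an idle buffer still flushes on age."""
        self._flush_if_due(self.clock())

    def _flush_if_due(self, now):
        if not self.pending:
            return
        full = len(self.pending) >= self.max_entries
        aged = now - self.first_entry_time >= self.max_age_seconds
        if full or aged:  # whichever happens first
            self.flushed_batches.append(self.pending)
            self.pending = []
            self.first_entry_time = None
```

Injecting the clock keeps the sketch deterministic: a caller can advance a fake clock instead of sleeping for 10 seconds.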

The coalesce interval is the period of time during which repeated READ, WRITE, or GETATTR operations on one file from one client IP address and UID/GID are logged only once per operation type, provided that auditing is enabled.

For example, suppose that a client application reads a single file three times in 6 minutes, so that there is one read at 0 minutes, another at 3 minutes, and a final read at 6 minutes. If the coalesce interval is at least 6 minutes, then only the first read operation is logged. However, if the interval is 4 minutes, then only the first and third read operations are logged. If the interval is 2 minutes, all three read operations are logged.

If the client in this example also writes to the file, the write operation is logged regardless of how the coalesce interval applies to the reads, because writing is a different operation from reading.
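The read and write examples can be reproduced with a small simulation. This is a sketch under stated assumptions, not the actual logging code: the Event type and coalesce function are hypothetical, and the rule that an operation is logged only when more than the interval has passed since the last logged entry for the same file, client IP address, UID/GID, and operation is inferred from the three cases described above.

```python
from collections import namedtuple

# Hypothetical event record; field names are illustrative only.
Event = namedtuple("Event", "minute path client_ip uid gid operation")


def coalesce(events, interval_minutes):
    """Return the events that would be logged under the assumed rule."""
    last_logged = {}  # (path, ip, uid, gid, operation) -> minute of last logged entry
    logged = []
    for ev in sorted(events, key=lambda e: e.minute):
        key = (ev.path, ev.client_ip, ev.uid, ev.gid, ev.operation)
        previous = last_logged.get(key)
        # Assumption: log only when strictly more than the interval has elapsed.
        if previous is None or ev.minute - previous > interval_minutes:
            logged.append(ev)
            last_logged[key] = ev.minute
    return logged


reads = [Event(m, "/data/f", "10.0.0.5", 1000, 1000, "READ") for m in (0, 3, 6)]
write = Event(1, "/data/f", "10.0.0.5", 1000, 1000, "WRITE")

for interval in (6, 4, 2):
    minutes = [e.minute for e in coalesce(reads, interval)]
    print(f"interval={interval} min -> reads logged at minutes {minutes}")
```

Because the operation type is part of the key, the write at minute 1 is logged even when the reads around it are coalesced.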

The default coalesce interval is 60 minutes. Setting it to a larger value helps prevent audit logs from growing quickly. To change the coalesce interval, see volume audit.

Audit logs are in JSON format, so they can be queried by Drill or processed by other third-party tools or your own scripts.

Audit logs are readable only by the mapr and root users on the cluster where the logs are located. These users can also copy and delete audit logs.

The status field in every log entry shows the status of the attempted operation. The status codes are taken from the Linux errno.h file. For a list of these codes, see Status Codes That Can Appear in Audit Logs.
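Because the status codes follow errno.h, a script can translate a numeric status into a symbolic name with the Python standard library. The sketch below assumes the script runs on a Linux host, so that the local errno table matches the one used by the cluster; describe_status is a hypothetical helper name, and the documented list of status codes remains the authoritative reference.

```python
import errno
import os


def describe_status(status):
    """Map an audit-log status value to an errno name and message."""
    if status == 0:
        return "OK"
    # errno.errorcode maps numeric codes to names such as 'ENOENT'.
    name = errno.errorcode.get(status, "UNKNOWN")
    return f"{name}: {os.strerror(status)}"


print(describe_status(0))
print(describe_status(2))
```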

Audit logs use Coordinated Universal Time (UTC) in the records of audited operations.

When operations are performed on directories, files, or tables that are being audited, the full names of those objects, as well as the name of the current volume and the name of the user performing the operation, are not immediately available to the auditing feature. What is immediately available are the IDs of those objects and users. Converting IDs to names at run time would be costly for performance. Therefore, audit logs contain file identifiers (FIDs) for directories, files, and tables; volume identifiers for volumes; and user identifiers (UIDs) for users.

You can resolve identifiers into names by using the expandaudit utility. This utility creates a copy of the log files for a specified volume. In that copy, the identifiers in the audit log records are replaced with the names of the file system objects, users, and volumes. You can then query or process the copy.

A sample of the logs is as follows:

{"timestamp":{"$date":"2021-07-14T13:05:01.506Z"},"resource":"test-audit-logs","operation":"volumeMirrorPermCheck","username":"root","uid":0,"clientip":"10.163.167.214","status":0}
{"timestamp":{"$date":"2021-07-14T08:44:01.553Z"},"resource":"255","operation":"volumeLookup","username":"root","uid":0,"clientip":"10.163.167.214","status":2}
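Since each entry is one JSON object per line with a UTC timestamp, the sample above can be processed with a short script. The following is a minimal sketch: the function name parse_audit_lines and the added time key are illustrative, and the field layout is assumed to match the two sample entries only.

```python
import json
from datetime import datetime, timezone

# The two sample entries shown above, one JSON object per line.
SAMPLE = '''
{"timestamp":{"$date":"2021-07-14T13:05:01.506Z"},"resource":"test-audit-logs","operation":"volumeMirrorPermCheck","username":"root","uid":0,"clientip":"10.163.167.214","status":0}
{"timestamp":{"$date":"2021-07-14T08:44:01.553Z"},"resource":"255","operation":"volumeLookup","username":"root","uid":0,"clientip":"10.163.167.214","status":2}
'''


def parse_audit_lines(text):
    """Parse line-delimited JSON audit entries and attach a UTC datetime."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        raw = rec["timestamp"]["$date"]
        # The trailing Z marks UTC, which is what the audit logs use.
        rec["time"] = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S.%fZ").replace(
            tzinfo=timezone.utc
        )
        records.append(rec)
    return records


records = parse_audit_lines(SAMPLE)
failed = [r for r in records if r["status"] != 0]
for r in failed:
    print(r["time"].isoformat(), r["operation"], r["resource"], r["status"])
```

Filtering on a nonzero status is one simple way to surface failed operations; here it selects the volumeLookup entry with status 2.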
NOTE
There will be an entry in the audit log for each IP address on a node. For example, suppose there is a node with multiple IP addresses. The audit log on this node may show multiple entries of the same operation, each associated with a different IP address.
NOTE
The number of bytes read or written is not recorded.