Configuring the HPE Ezmeral Data Fabric FUSE-Based POSIX Client

Lists FUSE configuration parameters.

FUSE Parameters

Set the POSIX client configuration values in the /opt/mapr/conf/fuse.conf file. After installing the FUSE-based POSIX client, edit this file to define values for the following parameters, and then save the file.

To retrieve the list of configuration parameters, run the following command:
/opt/mapr/bin/posix-client-* --help
Here * refers to the basic or platinum client package installed on the system. If necessary, set the shared LD_LIBRARY_PATH environment variable to run the help option with the command. For example:
export LD_LIBRARY_PATH=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.79.x86_64/jre/lib/amd64/server/:/opt/mapr/lib
NOTE
The HPE Ezmeral Data Fabric FUSE-based POSIX clients support only the configuration parameters in the fuse.conf file. All other FUSE configuration parameters are not supported. For more information on the non-mapr configuration parameters, refer to FUSE documentation.
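As an illustration of editing the file, the following minimal sketch sets a parameter in a key=value configuration file. It assumes that fuse.conf holds one key=value pair per line, as in the fuse.attr.timeout=2.8 example on this page. The set_fuse_param helper is hypothetical and is not part of the product. The sketch runs against a scratch file rather than /opt/mapr/conf/fuse.conf.

```shell
# Hypothetical helper: replace the value of an existing key, or append the key
# if it is not present. Commented-out lines are left untouched.
set_fuse_param() {
  conf_file=$1
  key=$2
  value=$3
  tmp_file=$(mktemp)
  awk -F= -v key="$key" -v value="$value" '
    $1 == key { print key "=" value; found = 1; next }
    { print }
    END { if (!found) print key "=" value }
  ' "$conf_file" > "$tmp_file"
  # Write back through redirection so the original file keeps its permissions.
  cat "$tmp_file" > "$conf_file"
  rm -f "$tmp_file"
}

# Demonstration on a scratch copy (assumed file contents).
sample_conf=$(mktemp)
printf '%s\n' 'fuse.mount.point=/mapr' 'fuse.attr.timeout=3.0' > "$sample_conf"
set_fuse_param "$sample_conf" fuse.attr.timeout 2.8
set_fuse_param "$sample_conf" fuse.access.type ro
cat "$sample_conf"
```

After changing the real file, restart the FUSE-based POSIX client for the new values to take effect.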
fuse.access.type
Default Value: rw
Sets the type of access on the mount point. Value can be:
  • ro — Read only
  • rw — Read and write
fuse.affinity
Default Value: 0 (Disabled)
Specifies whether to enable (1) or disable (0) NUMA affinity. If enabled, sets the NUMA affinity for the POSIX client.
fuse.allow.other
Default Value: 1
Allow other users to access the mount point. Value can be one of:
  • 0 - do not allow other users
  • 1 - allow other users

Set this to 1 if the root user starts the FUSE service. Set it to 0, or comment out this parameter, if a non-root user starts the FUSE service. If set to 1, also add the user_allow_other parameter to the /etc/fuse.conf file.
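The user_allow_other requirement can be checked before enabling fuse.allow.other. The following sketch assumes that user_allow_other appears on a line of its own and that lines starting with # are comments. The has_user_allow_other helper is hypothetical, and the demonstration uses a scratch file instead of /etc/fuse.conf.

```shell
# Hypothetical helper: succeeds if the file has an active user_allow_other line.
has_user_allow_other() {
  grep -q -E '^[[:space:]]*user_allow_other[[:space:]]*$' "$1"
}

# Demonstration on a scratch file where the option is still commented out.
sample_fuse_conf=$(mktemp)
printf '%s\n' '# mount_max = 1000' '#user_allow_other' > "$sample_fuse_conf"

if has_user_allow_other "$sample_fuse_conf"; then
  echo "user_allow_other is set"
else
  echo "add user_allow_other before setting fuse.allow.other=1"
fi
```

On a real system, pass /etc/fuse.conf as the argument.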

fuse.asyncdirect.io
Default Value: 1
Specifies whether to enable asynchronous direct IO. Value can be one of:
  • 0 - disable
  • 1 - enable
fuse.attr.timeout
Default Value: 3.0
The timeout value, in seconds, for the cache of regular file and directory attributes (such as file size, UID, and GID, which are normally stored inside the inode). Cached attribute information is used only within the specified timeout window; after that, the attributes are fetched again. The default is 3.0 seconds, which means that cached attribute information is considered stale and is refreshed after 3.0 seconds. You can also assign fractions of a second (for example, fuse.attr.timeout=2.8).

Set the value for this parameter to 0 to compare POSIX (pjd) compliance with the ext3/4 file system. A value of 0 disables caching. For better performance, avoid disabling caching.

fuse.auto.inval.data
Default Value: 1
Specifies whether (1) or not (0) to automatically invalidate the kernel FUSE cache for a file when a data change causes the file's mtime to change. If set to 1, a read returns the current file data. If set to 0, a read returns data from the kernel cache, which might not include the most recent change.
fuse.auto.unmount
Default Value: 1
Specifies whether to automatically unmount the file system when the process is terminated. Value can be one of:
  • 0 - disable
  • 1 - enable
fuse.big.writes
Default Value: 1
Specifies whether to enable writes larger than 4KB. Value can be one of:
  • 0 - disable
  • 1 - enable
Sets the size of the data/buffer that can be transferred from the kernel to the FUSE library, per request. If enabled, FUSE allows writes of 128KB from the kernel. If disabled, FUSE allows writes of 4KB from the kernel.
fuse.client.lib.path
Default Value: /tmp
Specifies the path to store the client libraries.
NOTE
To install and use the FUSE-based POSIX client and NFS v4 on the same node, ensure that the client library path for both the FUSE-based POSIX client and NFS v4 is not /tmp, which is the default. Specify a different location for the client libraries, for example, /tmp/fuselib.
fuse.cluster.conf.location
Default Value: /opt/mapr/conf/mapr-clusters.conf
Specifies the path to the cluster configuration file to use.
fuse.congestion.threshold
Default Value: 10
Specifies the kernel’s congestion threshold.
fuse.disable.shardcache
Default Value: 0 (false)
Specifies whether to disable shard cache, which is a cache of lookups. Value can be:
  • 0 - false
  • 1 - true
If true, more lookup calls are made. The FUSE client uses the shard cache to ensure that requests for data related to the same file are served by the same library. Requests are assigned to libraries using a hash, which improves performance. In very rare circumstances, it might make sense to disable this cache in conjunction with HPE Ezmeral Data Fabric support.
fuse.disable.writeback
Default Value: 0
Specifies whether (1) or not (0) to disable the writeback cache. This parameter is applicable only in kernel versions >= 3.15. By default, in kernel versions >= 3.15, writeback is enabled. To disable writeback cache, set the value for this parameter to 1. If enabled, the writes are buffered in the kernel. However, when multiple FUSE clients work on the same file, writes to a file by one FUSE client might never be seen by other FUSE clients performing a read because the kernel does not update the attributes of the file unless the file is modified locally. You can disable the writeback cache to allow the kernel to perform a write through.
fuse.enforce.core.pattern
Default Value: false
Specifies whether (true) or not (false) to write to the /proc/sys/kernel/core_pattern file when the FUSE-based POSIX client starts. If true, an /opt/cores/%e.core.%p.%h entry is written to the core_pattern file. If false (the default), the file is not touched.
fuse.entry.timeout
Default Value: 3
The timeout value, in seconds, for the name lookup cache. A cached entry is used for a name lookup only within the specified timeout window; after that, the name is looked up again. The default is 3 seconds, which means that cached name lookup information is considered stale and is refreshed after 3 seconds. You can also assign fractions of a second (for example, fuse.entry.timeout=2.8).

Set the value for this parameter to 0 to compare POSIX (pjd) compliance with the ext3/4 file system. Avoid retaining this value as 0 as it disables the cache, and impacts performance.

fuse.evenly.spread.data
Default Value: 0
Specifies whether (1) or not (0) to evenly spread writes across the nodes on the cluster. If set to 0, writes are always sent to the local primary node, from where data is replicated on all the other nodes. If set to 1, writes are distributed across different nodes. Set the value to 1 in case of reduced performance resulting from a large number of writes on the local primary node.
fuse.export
Default Value: /mapr
Denotes the fully-qualified cluster path to the volume or directory under the mount point.

When you do not specify a value, all clusters found in mapr-clusters.conf are mounted under the entity specified by the fuse.mount.point property (/mapr by default). If mapr-clusters.conf contains two clusters A and B, there are directories pointing to the root directories of those clusters, for example /mapr/A and /mapr/B.

When you specify a value, it overrides the default behavior, and causes exactly one path from one cluster to be exposed at the entity specified by the fuse.mount.point property. You can either fully expose a single cluster, or expose only a subset of a single cluster.

If you set fuse.export to the name of a cluster, enclosed within /, then that cluster is mounted at /mapr. For example if fuse.export=/A/, then the path /mapr shows the root directory of cluster A.

If you set fuse.export to a path within a cluster, then /mapr points to that path. For example, if fuse.export=/A/var/, then /mapr displays the directory contents of /var from the HPE Ezmeral Data Fabric cluster A.

NOTE
If the value is not a valid path to the name of a volume or directory, the FUSE service does not start. The value cannot be the path to a file.
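The mapping from a fuse.export value to what appears at the mount point can be sketched as follows. This is only an illustration of the two examples above (/A/ and /A/var/), covering the case where a value is specified. The describe_export helper is hypothetical and does not validate that the cluster or path exists.

```shell
# Hypothetical helper: split a fuse.export value into cluster name and path.
# The second argument is the fuse.mount.point value (defaults to /mapr).
describe_export() {
  export_value=$1
  mount_point=${2:-/mapr}
  trimmed=${export_value#/}     # drop the leading slash
  trimmed=${trimmed%/}          # drop the trailing slash
  cluster=${trimmed%%/*}        # first path component is the cluster name
  case $trimmed in
    */*) path=/${trimmed#*/} ;; # remainder is the path inside the cluster
    *)   path=/ ;;              # cluster name only: expose the cluster root
  esac
  echo "$mount_point -> cluster $cluster, path $path"
}

describe_export /A/
describe_export /A/var/
```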
fuse.fast.local.directio
Default Value: 0
Specifies whether to optimize (1) or disable (0) FUSE client for local direct IO. Value can be one of:
  • 0 - disable
  • 1 - optimize
fuse.flush.inline
Default Value: 0
Specifies whether (1) or not (0) to flush all writes inline. Value can be one of:
  • 0 - disable inline flushing
  • 1 - flush all writes inline
If disabled, the buffer for each open file is flushed automatically every 3 seconds or when it reaches 64KB. If enabled, writes are sent directly to the server.
fuse.fsname
Default Value: FUSE mount point
Specifies the file system source, which is the first field in the /etc/mtab file. The default value is the FUSE mount point that is denoted by the parameter fuse.mount.point.
fuse.hb.interval
Default Value: 5
Specifies the heartbeat interval (in seconds) for the FUSE-based POSIX client.
fuse.log.debug_level
Default Value: error
The FUSE-based POSIX client log level. The value can be one of:
  • fatal
  • error
  • warn
  • info
  • debug
fuse.log.path
Default Value: /opt/mapr/logs
Specifies the path to store the log files.
fuse.max.background
Default Value: 64
Specifies the maximum number of asynchronous requests that can be submitted. IO requests beyond the maximum limit are blocked.
fuse.max.cache.pages
Default Value: 1048576 (1 Million pages)

Specifies the maximum number of pages (each page is 8KB) in the page cache that each HPE Ezmeral Data Fabric client library in the FUSE process can use when working with a large number of open files. This setting limits the amount of memory consumed by FUSE.
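To get a feel for the memory ceiling this setting implies, the following arithmetic sketch multiplies the default page count by the 8KB page size. It assumes that 8KB means 8 x 1024 bytes and uses the Platinum default of 5 libraries (fuse.num.libs). The result is an upper bound on page cache use, not a measurement.

```shell
pages=1048576   # default fuse.max.cache.pages
page_kb=8       # page size stated above (assumed to be 8 x 1024 bytes)
num_libs=5      # default fuse.num.libs for the Platinum client

per_lib_gib=$(( pages * page_kb / 1024 / 1024 ))
total_gib=$(( per_lib_gib * num_libs ))
echo "per library: ${per_lib_gib} GiB, all libraries: ${total_gib} GiB"
# prints "per library: 8 GiB, all libraries: 40 GiB"
```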

fuse.max.read
Default Value: 131072
Specifies the maximum size (in bytes) of read requests.
fuse.max.readahead
Default Value: 131072
Specifies the maximum number of bytes to read ahead.
fuse.max.write
Default Value: 131072
Specifies the maximum number of bytes that is allowed in a single write request.
fuse.mount.point
Default Value: /mapr
This parameter is mandatory. Specifies the mount point where the HPE Ezmeral Data Fabric file system must be mounted. Ensure that the specified mount point is empty before starting the service. Once mounted, the POSIX client has access to all the clusters specified in /opt/mapr/conf/mapr-clusters.conf file. The value should not be /mapr if you wish to mask HPE Ezmeral Data Fabric branding.
NOTE
If an NFS server is also running on this node, ensure that the FUSE mount point is different from the NFS server mount point.
fuse.mount.setuid
Default Value: 0

By default, FUSE mounts with the nosuid option. This prevents users other than root from running executable files that have the SUID bit set on FUSE. Enable this parameter (set it to 1) to allow users other than root to run executable files with the SUID bit set on the HPE Ezmeral Data Fabric FUSE file system.

This parameter works in conjunction with the allowreadforexecute parameter in volume create and volume modify commands.

The following table describes how both parameters work together to permit running SUID binaries:

Table 1. SUID Execution
fuse.mount.setuid | allowreadforexecute | Result
Disabled | Does not matter | SUID binaries cannot be executed by users other than root.
Enabled | Disabled | Users other than root can run the SUID binaries only when the binary has both read and execute permissions.
Enabled | Enabled | Users other than root can execute the SUID binaries either when the binary has both read and execute permissions OR execute permission alone.
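The rows of the table can be expressed as a small decision function. This sketch only encodes the table for users other than root. The can_run_suid helper and its 0/1 argument convention are hypothetical and exist purely for illustration.

```shell
# Arguments (each 0 or 1): fuse.mount.setuid, allowreadforexecute,
# binary has read permission, binary has execute permission.
can_run_suid() {
  mount_setuid=$1
  allow_read_for_execute=$2
  has_read=$3
  has_execute=$4
  if [ "$mount_setuid" != 1 ] || [ "$has_execute" != 1 ]; then
    echo no    # row 1, or the binary is not executable at all
  elif [ "$has_read" = 1 ] || [ "$allow_read_for_execute" = 1 ]; then
    echo yes   # read + execute, or execute alone with allowreadforexecute
  else
    echo no    # row 2: execute alone is not enough
  fi
}

can_run_suid 0 1 1 1   # prints "no"
can_run_suid 1 0 1 1   # prints "yes"
can_run_suid 1 0 0 1   # prints "no"
can_run_suid 1 1 0 1   # prints "yes"
```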
fuse.negative.timeout
Default Value: 3
Applicable for the Container, Basic, and Platinum POSIX clients.

Indicates the duration in seconds to cache negative lookup results.

Negative lookup results, which are returned when a file does not exist (the lookup returned ENOENT), are cached for the specified number of seconds. The lookup is performed again only after this period elapses. Until then, the file is deemed to be non-existent.

The default value of 3 indicates that negative lookup results are cached for 3 seconds.

Set this value to 0 to disable the negative lookup cache.

When you patch or upgrade the client from an older release, this parameter is applied automatically. However, new parameters are not automatically written to fuse.conf. Copy this parameter from fuse.conf.new to fuse.conf only if you want to change the default value or disable this cache.
fuse.nonempty
Default Value: 0
Specifies whether FUSE can be mounted on a non-empty mount point (1) or on an empty mount point (0). Value can be:
  • 0 - indicates that mount point should be empty
  • 1 - indicates that mount point need not be empty
fuse.num.libs
Default Value:
  • Container - 1
  • Basic - 1
  • Platinum - 5
Specifies the number of client libraries to run with. For:
  • Container client, value must be 1.
  • Basic client, value must be 1.
  • Platinum client, default value is 5 and can be set to a value greater than 5.
More than one library allows for more than 1GB/sec throughput on remote operations as each additional library increases the throughput by sharding operations across libraries (for parallelism).
NOTE
Each additional library will consume additional memory and CPU.
fuse.num.threads
Default Value: 64
Specifies the number of FUSE threads in userspace per mount point. A higher number allows more operations to be processed in parallel. The recommended value is 64 or lower.
fuse.ra.sessions
Default Value:
  • Container - 1
  • Basic - 1
  • Platinum - 5
Specifies the number of parallel read ahead sessions per library. Each open file acts as one read ahead session. For example, with the Platinum default value of 5, up to 5 files can have read ahead sessions per library. If the value is set to 0, read ahead is disabled.
NOTE
A greater value allows a larger number of parallel read ahead sessions, which is useful if many files need to be open simultaneously. However, each additional read ahead session consumes additional memory (512K per read ahead session) and threads.
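The memory cost of read ahead can be estimated from the figures above. This sketch assumes the Platinum defaults (5 sessions per library, 5 libraries) and counts only the 512K per session mentioned in the note. Thread overhead is not included.

```shell
ra_sessions=5    # default fuse.ra.sessions for the Platinum client
num_libs=5       # default fuse.num.libs for the Platinum client
session_kb=512   # memory per read ahead session, from the note above

total_sessions=$(( ra_sessions * num_libs ))
total_kb=$(( total_sessions * session_kb ))
echo "${total_sessions} sessions, about ${total_kb} KB of read ahead memory"
# prints "25 sessions, about 12800 KB of read ahead memory"
```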
fuse.readdirplus
Default Value: 1
Enables (1) or disables (0) readdirplus functionality for high latency networks. The readdirplus attribute returns the file handle and attribute information such as the name and the file ID, along with the directory entries, unlike the readdir attribute that requires the client to query the server separately for each directory entry. For the best performance, do not disable this parameter.
fuse.sync.read
Default Value: 0
Specifies whether to enable or disable synchronized reads. Value can be:
  • 0 - disable
  • 1 - enable
fuse.ticketfile.location
Default Value: /opt/mapr/conf/maprfuseticket
Specifies the ticket to use to start the service in secure mode. Generate the required ticket and place it in /opt/mapr/conf/<maprfuseticket>.
NOTE
To support impersonation, provide the location of the mapr user ticket file or of the user's servicewithimpersonation ticket file. You can use the mapr user ticket on the server node and the servicewithimpersonation ticket on the client node. The FUSE service must be started by the root user if a servicewithimpersonation ticket is specified. With a non-impersonated ticket, the ticket credentials become the identity for all requests, no matter which user is accessing the FUSE mount point.
See also: Setting up a Ticket for the POSIX Client.
fuse.track.memory
Default Value: false
Specifies whether to enable (true) or disable (false) memory tracking for FUSE.
fuse.use.compressed.inode.format
Default Value: 0
Specifies whether or not to use compressed inode format. When enabled, a 16-bit unique identifier is used to avoid inode cache collisions when multiple clients are modifying (creating, deleting, and similar operations) the same directories/files. The value can be one of:
  • 0 — (default) do not use compressed inode format
  • 1 — use compressed inode format including unique identifier
NOTE
Even when set to 1, EBUSY errors are returned if the client accesses more than 32k volumes at the same time.

Enabling this flag may not completely avoid inode cache collisions when too many modifications, such as creation and deletion, are performed on the same directories or files. Give the kernel sufficient time to purge inode cache entries between modifications.

fuse.xattr.enable
Default Value: 0 (false)
Specifies whether (true) or not (false) to enable extended attributes through the FUSE client. Value can be one of:
  • 0 - false
  • 1 - true
The default value is 0 (false). This is disabled by default because if enabled, during operations, the kernel might make a lot of extended attribute calls for security checks resulting in performance degradation even when there are no extended attributes on the inode. When disabled, extended attributes can still be added using the hadoop fs command; however, this must be enabled to perform any operations on extended attributes using the FUSE-based POSIX client.
NOTE
Of the five types of extended attribute namespaces in Linux (system, trusted, user, raw, and security), only the user namespace is supported. For all other namespaces, EINVAL is returned.

You must start/restart the FUSE-based POSIX client for the changes to take effect. See Starting and Stopping the POSIX Client for more information.

Configuration Backup When Installing/Upgrading POSIX Clients

When you install a patch, the /opt/mapr/conf/fuse.conf.new file contains the new settings. You can copy the new parameters (with default values) to your existing fuse.conf file and restart FUSE for the settings to take effect.

When you upgrade from a prior release, on all supported OS other than Ubuntu, the old fuse.conf file is backed up as fuse.conf.backup, before being overwritten with the new settings. This backup is available in the /opt/mapr/conf directory.

On Ubuntu, the upgrade process does not create a backup copy of the file. You need to manually back up the fuse.conf file before upgrading, because this file is overwritten with the new settings during the upgrade.

To continue using FUSE with your custom settings, and take advantage of the new settings, manually copy your custom settings in the fuse.conf.backup file to the fuse.conf file, set custom values for the new parameters in the fuse.conf file where necessary, and restart FUSE for the settings to take effect.
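The manual copy step can be sketched as a merge of two key=value files. The following is a minimal illustration, assuming one key=value pair per line and # for comments. The merge_custom_settings helper is hypothetical, carries over only keys that still exist in the new file, and writes to a separate output file so that the result can be reviewed before it replaces fuse.conf. The demonstration uses scratch files with made-up custom values.

```shell
# Hypothetical helper: take every line of the new file, but where a key also
# appears (uncommented) in the backup file, keep the backup's line instead.
# Assumes the backup file is not empty.
merge_custom_settings() {
  backup_file=$1
  new_file=$2
  merged_file=$3
  awk -F= '
    NR == FNR {
      if ($0 !~ /^[ \t]*#/ && NF >= 2) custom[$1] = $0
      next
    }
    ($1 in custom) { print custom[$1]; next }
    { print }
  ' "$backup_file" "$new_file" > "$merged_file"
}

# Demonstration with scratch files.
backup=$(mktemp); new=$(mktemp); merged=$(mktemp)
printf '%s\n' '# custom settings' 'fuse.mount.point=/data' 'fuse.attr.timeout=2.8' > "$backup"
printf '%s\n' 'fuse.mount.point=/mapr' 'fuse.attr.timeout=3.0' 'fuse.negative.timeout=3' > "$new"
merge_custom_settings "$backup" "$new" "$merged"
cat "$merged"
```

Review the merged file, set custom values for any new parameters where necessary, and then restart FUSE.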

To restart FUSE, use one of the following commands depending on the POSIX client of your choice:

  • For POSIX container: service mapr-posix-client-container restart
  • For POSIX basic: service mapr-posix-client-basic restart
  • For POSIX platinum: service mapr-posix-client-platinum restart

Optimizing FUSE performance when running the Flexible I/O tester (fio tool)

Performance Tips

  • With Linux kernels prior to version 4.8, size-extending writes are serialized by the kernel, which degrades write performance. For optimized write performance, ensure that the Linux kernel in use is at least version 4.8.
  • With kernel 4.8 and above, fio performance improves when using larger block sizes and larger number of jobs (numjobs). Keep numjobs constant and use larger blocksizes (>128k) for enhanced performance.

For example, for optimized performance, the fio command could be as follows:

fio --ioengine=libaio --direct=1 --gtod_reduce=1 --name=perftest --filename=perfile \
    --bs=16m --iodepth=64 --size=4G --rw=write --numjobs=4
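The kernel version requirement in the first tip can be checked with a sketch like the following. The kernel_at_least helper is hypothetical. It compares only the major and minor numbers and assumes a version string of the form produced by uname -r, such as 5.15.0-91-generic.

```shell
# Hypothetical helper: succeeds if the actual version is at least the
# required major.minor version.
kernel_at_least() {
  required=$1
  actual=$2
  req_major=${required%%.*}
  req_rest=${required#*.}
  req_minor=${req_rest%%[!0-9]*}   # keep leading digits only
  act_major=${actual%%.*}
  act_rest=${actual#*.}
  act_minor=${act_rest%%[!0-9]*}
  [ "$act_major" -gt "$req_major" ] ||
    { [ "$act_major" -eq "$req_major" ] && [ "$act_minor" -ge "$req_minor" ]; }
}

if kernel_at_least 4.8 "$(uname -r)"; then
  echo "kernel is 4.8 or later"
else
  echo "kernel is older than 4.8: size-extending writes are serialized"
fi
```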

Configuring Timeout for Inactive Connections

In cases where the file client connects infrequently to a remote CLDB node that is firewalled, the firewall can silently drop TCP segments on the connection because of the long idle time. However, the client keeps waiting for the response until the RPC times out. To mitigate this scenario, you can configure the timeout for inactive connections. Use the fs.mapr.binding.inactive.threshold parameter in the core-site.xml file to set this threshold in seconds. For example:
<property>
<name>fs.mapr.binding.inactive.threshold</name>
<value>600</value>
</property>

In this example, when the client tries to send data to the CLDB after an idle period, the system checks whether the specified time (here 600 seconds, that is, 10 minutes) has elapsed since the previous request was sent. If so, the system tears down the existing TCP connection and creates a new TCP connection for the file client and CLDB to use for communication.
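The check described above can be illustrated with a few lines of shell. This is only a sketch of the decision, not the client's implementation. The needs_new_connection helper is hypothetical, timestamps are plain seconds, and treating "crossed" as strictly greater than the threshold is an assumption.

```shell
# Hypothetical helper: succeeds if the connection has been idle for longer
# than the threshold and should be replaced before the next request.
needs_new_connection() {
  last_request=$1
  now=$2
  threshold=$3
  [ $(( now - last_request )) -gt "$threshold" ]
}

threshold=600   # fs.mapr.binding.inactive.threshold from the example
if needs_new_connection 1000 1700 "$threshold"; then
  echo "idle for 700s: tear down and reconnect"
fi
if ! needs_new_connection 1000 1300 "$threshold"; then
  echo "idle for 300s: reuse the existing connection"
fi
```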