Remote Direct Memory Access

This page introduces Remote Direct Memory Access (RDMA), describes the advantages of RDMA over TCP/IP, documents RDMA system requirements, and lists commands you can use to enable RDMA.

What is RDMA?

TCP/IP communication uses copy operations that involve user-kernel context switching, user-kernel memory copies, Linux kernel interrupt processing, and kernel packet processing. TCP/IP suffers from two major problems:

  • TCP/IP consumes significant CPU cycles and memory resources
  • TCP/IP has large end-to-end latency

Remote Direct Memory Access (RDMA) mitigates these major problems by copying data directly between virtual memory buffers on two different machines, resulting in lower latency, higher throughput, and smaller CPU footprint.

RDMA transfers do not involve the CPU, and there are no context switches. Transfers occur in parallel with other system operations.

The following diagram compares TCP/IP and RDMA operations:

Figure 1. TCP/IP vs RDMA Communication


Supported RDMA Type

Two RDMA protocols are in common use: RDMA over Converged Ethernet (RoCE) and iWARP. HPE Data Fabric supports only RoCE.

When HPE Data Fabric Uses RDMA

HPE Data Fabric uses RDMA when it needs to transfer data between:

  • File clients or FUSE clients and MFS
  • NFS clients and NFS gateway
  • MFS instances

RDMA System Requirements

To benefit from RDMA, your system must have a Network Interface Card (NIC) that supports RDMA. HPE Data Fabric is tested with Mellanox cards, but any NIC that supports RDMA should work. Ensure that Infiniband support is installed. To install Infiniband support, run:

On CentOS:
yum -y groupinstall "Infiniband Support"

To determine whether your NIC supports RDMA, run:
ibv_devinfo | grep "PORT_ACTIVE"

If the command returns one or more active ports, your NICs support RDMA.

For example:
# ibv_devinfo | grep "PORT_ACTIVE"
                        state:                  PORT_ACTIVE (4)
                        state:                  PORT_ACTIVE (4)
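
If you want to script this check, the following is a minimal sketch that assumes ibv_devinfo (installed with the Infiniband Support group) is on the PATH:

# Report whether at least one RDMA-capable port is active.
if ibv_devinfo 2>/dev/null | grep -q "PORT_ACTIVE"; then
    echo "RDMA-capable NIC detected"
else
    echo "No active RDMA ports found; HPE Data Fabric falls back to TCP/IP"
fi
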
Optionally, to determine the interfaces with RDMA support:
  1. Run:
    ibv_devices
    The output returns the Infiniband devices. For example:
    device                 node GUID
    ------                 ----------------
    mlx4_0                 040973ffffd661f0
    mlx4_1                 b88303ffff9e5440
  2. To determine the RDMA NIC, run:
    ls /sys/class/infiniband/<Infiniband_Device_Name>/device/net/
    For example:
    ls /sys/class/infiniband/mlx4_0/device/net/
      eno5d1  ib0
    Here, the NIC is eno5d1.
  3. To confirm that this NIC exists, run:
    ip a | grep <NIC>
    For example:
    ip a | grep eno5d1
      6: eno5d1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
        inet 10.163.160.63/21 brd 10.163.167.255 scope global noprefixroute eno5d1
        inet 10.163.160.47/24 scope global eno5d1:~m0
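
The preceding steps can be combined into a short shell loop that prints each Infiniband device together with the network interfaces behind it (a sketch that assumes the same /sys layout shown above):

# List every Infiniband device and its associated network interfaces.
for dev in $(ibv_devices | awk 'NR > 2 {print $1}'); do
    echo "Infiniband device: $dev"
    ls /sys/class/infiniband/"$dev"/device/net/
done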

HPE Data Fabric uses RDMA only when the NICs and nodes are RDMA capable, and automatically falls back to TCP/IP when the system does not support RDMA.

Enabling RDMA

By default, RDMA is disabled on all nodes. To enable RDMA, use the following options:

To enable RDMA on any module, you must first enable it on the cluster and the node.
  1. For the cluster, run: /opt/mapr/bin/maprcli config save -values '{"support.rdma.transport":"1"}'
  2. For the node, in the /opt/mapr/conf/env_override.sh file on that node, set: export MAPR_RDMA_SUPPORT=true
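
As a sketch, these two switches can be applied from a shell as follows, assuming the default installation paths and that env_override.sh does not already set MAPR_RDMA_SUPPORT:

# Cluster-wide switch (run once, from any node where maprcli is configured):
/opt/mapr/bin/maprcli config save -values '{"support.rdma.transport":"1"}'

# Node-level switch (run on each node that should use RDMA):
echo 'export MAPR_RDMA_SUPPORT=true' >> /opt/mapr/conf/env_override.sh
# Services on the node pick up the new setting only after they restart (assumption).
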
To enable RDMA on MFS:
  1. Enable RDMA on the cluster and node.
  2. In the /opt/mapr/conf/mfs.conf file on that node, set mfs.listen.on.rdma=1.
To enable RDMA between NFS and MFS:
  1. Enable RDMA on the cluster and node.
  2. Enable RDMA on all MFS nodes in the cluster.
  3. In the /opt/mapr/conf/nfsserver.conf file on that NFS node, set NfsRdmaToMfs=1.
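
For step 3, a minimal sketch on the NFS gateway node, assuming the default nfsserver.conf location and no existing NfsRdmaToMfs entry (edit the existing entry instead if one is present):

echo 'NfsRdmaToMfs=1' >> /opt/mapr/conf/nfsserver.conf
# Restart the NFS gateway service afterward so that the setting takes effect (assumption).
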
To enable RDMA on FUSE, Moss, and Hadoop:
  1. Enable RDMA on the cluster and node.
  2. Set the property fs.mapr.disable.rdma.transport to false in the /opt/mapr/hadoop/hadoop-<version>/etc/hadoop/core-site.xml file.
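
In core-site.xml, this is a standard Hadoop property element; for example:

<property>
  <name>fs.mapr.disable.rdma.transport</name>
  <value>false</value>
</property>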

NFS Port for RDMA Communication

By default, NFS servers use port 20049 to communicate with NFS clients using RDMA. To change this port, set the NfsRdmaPort parameter in /opt/mapr/conf/nfsserver.conf to the desired port. For example:
NfsRdmaPort=20050
NOTE: Setting this port to 0 causes NFS servers to use TCP to communicate with NFS clients.

NFS Mount With RDMA

To mount an NFS server with RDMA support on an NFS client, use the following command:
mount -o vers=3,proto=rdma,port=20049 <NFSserver IP>:<directory> <mount point>
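
For example, assuming an NFS gateway at 10.163.160.63 that exports /mapr and a local mount point of /mapr (all placeholder values):

mkdir -p /mapr
mount -o vers=3,proto=rdma,port=20049 10.163.160.63:/mapr /mapr

If you changed NfsRdmaPort on the NFS server, pass that same value in the port option.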

RDMA Specific Commands

You can use the following mrconfig commands to display RDMA information: