Overview of MAST Gateway
Describes the role of the MAST Gateway for operations on tiered storage.
The MAST Gateway can be installed on specific hosts on the data fabric cluster with access to the tier. The MAST Gateway acts as the centralized entry point for all the operations that need to be performed on the tiered storage including the following:
For the warm tier:
- Identifies files in the volume that are ready to be offloaded, fetches the data corresponding to these files from the file system, and packs this data for offload. It:
- Identifies and fetches the data to offload.
It handles both compressed and uncompressed data. Compressed data from the file server is transferred and stored as-is on the warm tier.
- Creates stripes based on the erasure coding scheme.
For example, for an erasure coding scheme of 4+2, the stripe depth would be 6x4MB=24MB.
- Manages statistics on the amount of data offloaded.
- Prepares corresponding metadata on the data fabric cluster for the data.
The MAST Gateway stores the metadata in HPE Ezmeral Data Fabric Database tables in a separate volume associated with the tier.
- Tracks invalid data and deletes stripelets that are completely invalid.
- Fetches data from the tier.
- Recalls the whole volume from the tier to the data fabric cluster.
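The stripe-depth arithmetic in the erasure coding example above can be checked with a few lines of code. This is only a sketch: the function name `stripe_depth_bytes` is hypothetical, and the 4 MB stripelet size is inferred from the 6x4MB example rather than taken from a documented setting.

```python
MIB = 1024 * 1024


def stripe_depth_bytes(data_fragments: int, parity_fragments: int,
                       stripelet_bytes: int = 4 * MIB) -> int:
    """Return the size of one full stripe in bytes.

    A stripe holds one stripelet per data fragment and one per parity
    fragment. The 4 MiB default is an assumption taken from the 4+2 example.
    """
    if data_fragments <= 0 or parity_fragments < 0 or stripelet_bytes <= 0:
        raise ValueError("fragment counts and stripelet size must be positive")
    return (data_fragments + parity_fragments) * stripelet_bytes


# The 4+2 scheme from the text: (4 + 2) stripelets of 4 MB each.
depth_mb = stripe_depth_bytes(4, 2) // MIB
print(f"4+2 stripe depth: {depth_mb} MB")
```

Under the same assumption, any other data+parity scheme scales the same way: the stripe depth is the total fragment count times the stripelet size.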
For the cold tier:
- Identifies files in the volume that are ready to be offloaded, fetches the data corresponding to these files from the file system, and packs this data for offload. It:
- Identifies and fetches the data to offload and creates objects (including creating new buckets) in the storage tier for the data.
- Manages statistics on the amount of data offloaded.
- Updates metadata references for remote access.
- Tracks invalid data and deletes objects that are completely invalid.
- Fetches data from the tier. It:
- Handles both compressed and uncompressed data. If the data on the file server is compressed, it is neither uncompressed nor re-compressed during offload or recall. Compressed data from the file server is transferred and stored as-is on the cold tier.
- Ensures that encrypted data is decrypted before forwarding it to the file system.
- Recalls the whole volume from the tier to the data fabric cluster.
The MAST Gateway uses curl to transfer data to and from S3 cloud storage.
The MAST Gateway uses an exponential backoff retry mechanism. If curl fails to connect to the S3 destination even after a minute of trying, or if curl fails to fetch data from the S3 destination even after 5 minutes of being connected, the MAST Gateway declares a failure and reports it to the CLDB. The CLDB then reschedules the volume tasks after 30 minutes.
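The retry behavior described above can be illustrated with a minimal sketch of exponential backoff under a deadline. The 60-second, 5-minute, and 30-minute values come from the text; the function name `retry_with_backoff`, the 1-second starting delay, and the doubling factor are assumptions made for illustration and do not describe the gateway's actual implementation.

```python
import time
from typing import Callable

CONNECT_DEADLINE_S = 60        # from the text: one minute of trying to connect
FETCH_DEADLINE_S = 5 * 60      # from the text: five minutes to fetch data
RESCHEDULE_AFTER_S = 30 * 60   # from the text: CLDB reschedules after 30 minutes


def retry_with_backoff(attempt: Callable[[], bool],
                       deadline_s: float,
                       base_delay_s: float = 1.0,   # assumed starting delay
                       factor: float = 2.0,         # assumed growth factor
                       clock: Callable[[], float] = time.monotonic,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call attempt() until it returns True or deadline_s has elapsed.

    The wait between attempts grows by `factor` each time and is capped so
    that the final attempt happens no later than the deadline.
    Returns True on success, False if the deadline passes without success.
    """
    start = clock()
    delay = base_delay_s
    while True:
        if attempt():
            return True
        remaining = deadline_s - (clock() - start)
        if remaining <= 0:
            return False  # the caller would report the failure to the CLDB
        sleep(min(delay, remaining))
        delay *= factor
```

A caller would pass a function that tries the connection once, for example `retry_with_backoff(try_connect, CONNECT_DEADLINE_S)`, where `try_connect` is a hypothetical callable. The `clock` and `sleep` parameters exist only so the logic can be exercised without real waiting.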
The MAST Gateway sends heartbeat messages to CLDB every 5 seconds. CLDB manages the discovery and a minimal global state of the MAST Gateway service. CLDB also manages the volumes and any policy configurations on the volumes. When a volume is assigned to a gateway, the volume remains assigned to the gateway across CLDB, Gateway, and cluster restarts. Volumes are assigned evenly to gateways and CLDB balances the gateway load. For more information, see Balancing Gateway Load.
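To make the "assigned evenly" and "remains assigned" properties concrete, here is a small sketch of a sticky, least-loaded assignment. The CLDB's real assignment and balancing logic is not described in this topic, so the `assign_volumes` function and its least-loaded rule are purely illustrative assumptions.

```python
from typing import Dict, Iterable, List, Optional


def assign_volumes(volumes: Iterable[str],
                   gateways: List[str],
                   existing: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Assign each unassigned volume to the gateway with the fewest volumes.

    Volumes already present in `existing` keep their gateway (sticky
    assignment). Ties are broken by the order of `gateways`.
    """
    if not gateways:
        raise ValueError("at least one gateway is required")
    assignment = dict(existing or {})
    load = {gw: 0 for gw in gateways}
    for gw in assignment.values():
        if gw in load:
            load[gw] += 1
    for vol in volumes:
        if vol in assignment:
            continue  # already assigned: the assignment persists
        target = min(gateways, key=lambda gw: load[gw])
        assignment[vol] = target
        load[target] += 1
    return assignment


result = assign_volumes(["v1", "v2", "v3", "v4"], ["gw1", "gw2"])
print(result)
```

With four volumes and two gateways, each gateway ends up with two volumes; passing a previous result back in as `existing` leaves those assignments unchanged.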
By default, the MAST Gateway uses 16 threads for volume and file offload and recall operations and another 16 threads for internal operations and other operations such as reads (which trigger automatic recall requests) and writes. Each thread uses the curl library to offload or recall a container (associated with a volume). Each MAST Gateway can process one or more volumes (and associated containers) simultaneously depending on the number of threads available for processing the containers associated with the volumes. Each volume is assigned to a MAST Gateway for a tiering operation irrespective of the number of containers associated with the volume.
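The per-container threading model can be pictured with a fixed-size worker pool. In this sketch, only the default of 16 threads comes from the text; `offload_container` is a hypothetical stand-in for the curl-based transfer of one container, and `offload_volume` is an illustrative name, not a gateway API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

OFFLOAD_RECALL_THREADS = 16  # default from the text


def offload_container(container_id: str) -> str:
    """Hypothetical placeholder for transferring one container's data."""
    return f"{container_id}: offloaded"


def offload_volume(container_ids: List[str],
                   workers: int = OFFLOAD_RECALL_THREADS) -> List[str]:
    """Process a volume's containers with at most `workers` threads.

    Each worker handles one container at a time; containers beyond the pool
    size wait until a thread becomes free. Results keep the input order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(offload_container, container_ids))


statuses = offload_volume([f"container-{i}" for i in range(40)])
print(len(statuses), statuses[0])
```

Because the pool is shared, a volume with few containers leaves threads free for containers of other volumes, which is why one gateway can work on several volumes at once.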
When a MAST Gateway goes down during a volume-level offload, CLDB does not immediately reassign all the volumes assigned to that MAST Gateway to other gateways. CLDB waits for some time to allow the MAST Gateway to come back up and send heartbeats again. If the MAST Gateway does not come back up, CLDB reassigns the volumes with pending tasks to other gateways; all other volumes are redistributed when the gateway balancer runs again. If, on the other hand, the MAST Gateway comes back up, the volumes remain assigned to it, and the load on the MAST Gateways is rebalanced when the balancer runs again. See Balancing Gateway Load for more information. MAST Gateways use transactions to ensure that all the updates are consistent and that any new gateway can pick up exactly where the old gateway left off.
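The failover rule above can be summarized in a short sketch. The text does not say how long CLDB waits, so `grace_s` is a hypothetical parameter, and the least-loaded choice of the new gateway is likewise an assumption rather than documented behavior.

```python
from typing import Dict, List, Set


def reassign_on_failure(assignment: Dict[str, str],
                        pending: Set[str],
                        failed_gw: str,
                        live_gateways: List[str],
                        down_for_s: float,
                        grace_s: float) -> Dict[str, str]:
    """Return the volume-to-gateway map after `failed_gw` has been down.

    Within the grace period nothing changes. After it, only volumes with
    pending tasks move to a live gateway; the rest stay put until the
    gateway balancer runs (not modeled here).
    """
    new_assignment = dict(assignment)
    if down_for_s < grace_s or not live_gateways:
        return new_assignment
    load = {gw: 0 for gw in live_gateways}
    for gw in assignment.values():
        if gw in load:
            load[gw] += 1
    for vol, gw in assignment.items():
        if gw == failed_gw and vol in pending:
            target = min(live_gateways, key=lambda g: load[g])
            new_assignment[vol] = target
            load[target] += 1
    return new_assignment


before = {"v1": "gw1", "v2": "gw1", "v3": "gw2"}
after = reassign_on_failure(before, {"v1"}, "gw1", ["gw2", "gw3"],
                            down_for_s=60, grace_s=30)
print(after)
```

In this example only `v1`, which has a pending task, moves to a live gateway; `v2` keeps its original assignment until a later balancing pass.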
If a MAST Gateway goes down during a file-level offload and if the offload was triggered using:
- The hadoop command, CLDB reassigns the volume to another MAST Gateway.
- The MapR CLI, REST API, or dot interface, CLDB does not reassign the volume to another MAST Gateway.
See also: Managing the MAST Gateway