Data Compaction and Recall Criteria
This topic describes the criteria that the MAST gateway uses to decide whether to compact a container (data container or namespace container) and whether to purge recalled data from it.
Containers are of two types:
- Namespace containers
- Data containers
Containers of either type are further classified by the number of inodes they hold:
- Large containers: Containers in which the number of inodes is greater than the value of the configuration variable, mastgateway.offload.opt.largenuminodes.
- Non-large containers: Containers in which the number of inodes is less than the value of the configuration variable, mastgateway.offload.opt.largenuminodes.
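The classification above can be sketched as a single comparison. This is a minimal illustration rather than the gateway's implementation: the function name and the example limit are hypothetical, and because the text does not say how a container whose inode count equals the limit is classified, the sketch treats it as non-large.

```python
def is_large_container(num_inodes: int, large_num_inodes: int) -> bool:
    """Classify a container by its inode count.

    large_num_inodes stands in for the value of
    mastgateway.offload.opt.largenuminodes.
    Assumption: a count exactly equal to the limit is treated as non-large.
    """
    return num_inodes > large_num_inodes


# Hypothetical limit, chosen only for illustration (not a documented default).
EXAMPLE_LIMIT = 1_000_000

print(is_large_container(2_500_000, EXAMPLE_LIMIT))  # True
print(is_large_container(40_000, EXAMPLE_LIMIT))     # False
```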
Compaction Criteria for Large Containers
Compaction is carried out for large containers (namespace containers or data containers) in which the size of the garbage is greater than the garbage threshold. The garbage threshold is the value set for the configuration variable, mastgateway.ctc.opt.largenuminodes.threshmb (default value is 2 GB).
Compaction is skipped for large containers in which the size of the garbage is less than the garbage threshold.
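The threshold check for large containers might look like the following sketch. The function and constant names are hypothetical, the threshold is assumed to be expressed in MB (suggested by the "threshmb" suffix), and 2 GB is assumed to mean 2048 MB.

```python
# Documented default of 2 GB, assumed here to equal 2048 MB.
DEFAULT_GARBAGE_THRESH_MB = 2 * 1024


def should_compact_large(garbage_mb: float,
                         thresh_mb: float = DEFAULT_GARBAGE_THRESH_MB) -> bool:
    """Return True when a large container's garbage exceeds the threshold.

    thresh_mb stands in for mastgateway.ctc.opt.largenuminodes.threshmb.
    """
    return garbage_mb > thresh_mb


print(should_compact_large(3000))  # True: 3000 MB is above 2048 MB
print(should_compact_large(500))   # False: compaction is skipped
```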
Recall Expiry Criteria for Large Containers
If data has been recalled from a tier into a Data Fabric cluster, and the size of the recalled data is greater than the configured value for mastgateway.recallexp.opt.largenuminodes.minpurgemb, the compactor purges the qualified recalled data from the container.
If data has been recalled, and the size of the recalled data is less than the configured value for mastgateway.recallexp.opt.largenuminodes.minpurgemb, recall expiry is skipped and the recalled data is retained on the container of the tiered volume.
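As a rough sketch of the recall expiry rule for large containers, the check below returns whether recalled data would be purged or retained. The function name and return strings are hypothetical; the 2 GB default for mastgateway.recallexp.opt.largenuminodes.minpurgemb is the value this topic states, assumed here to equal 2048 MB.

```python
# Default stated in this topic (2 GB), assumed to equal 2048 MB.
DEFAULT_LARGE_MIN_PURGE_MB = 2 * 1024


def recall_expiry_large(recalled_mb: float,
                        min_purge_mb: float = DEFAULT_LARGE_MIN_PURGE_MB) -> str:
    """Decide what happens to recalled data in a large container.

    min_purge_mb stands in for
    mastgateway.recallexp.opt.largenuminodes.minpurgemb.
    """
    if recalled_mb > min_purge_mb:
        return "purge"   # the compactor purges the qualified recalled data
    return "retain"      # recall expiry is skipped; data stays on the container


print(recall_expiry_large(4096))  # purge
print(recall_expiry_large(100))   # retain
```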
Skip Compaction for Large Containers with Garbage Size Greater than Garbage Threshold
You might want to skip the scheduled compaction for a very large container and run the compaction manually at a convenient time.
For this purpose, set the configuration variable, mastgateway.ctc.opt.largenuminodes.skipqualifiedctrs.enabled (default value is 0), to 1. For details on this configuration variable, refer to config.
When mastgateway.ctc.opt.largenuminodes.skipqualifiedctrs.enabled is set to 1, compaction is skipped for large containers that meet the garbage threshold. CLDB raises the alarm, VOLUME_ALARM_COMPACTION_SKIPPED_LARGE_CONTAINER, when compaction is skipped for a large namespace container that meets the threshold. In such a case, you can force compaction to run on the qualified containers by running compaction manually with the maprcli volume compact command. Refer to Compaction Skipped Large Container Volume Alarm for the alarm details.
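The interaction between the garbage threshold, the skip setting, and a manual run can be summarized in a small decision function. This is a sketch under stated assumptions, not the gateway's logic: the Decision type, the function, and its parameters are hypothetical, and because the text only describes manual runs for containers that already meet the threshold, containers below the threshold are simply left uncompacted here.

```python
from dataclasses import dataclass
from typing import Optional

ALARM = "VOLUME_ALARM_COMPACTION_SKIPPED_LARGE_CONTAINER"


@dataclass
class Decision:
    compact: bool
    alarm: Optional[str] = None


def decide_large(garbage_mb: float, thresh_mb: float, skip_enabled: bool,
                 is_namespace: bool, manual: bool = False) -> Decision:
    """Hypothetical summary of the rules for a large container.

    skip_enabled mirrors
    mastgateway.ctc.opt.largenuminodes.skipqualifiedctrs.enabled (1 = True).
    """
    if garbage_mb <= thresh_mb:
        # Assumption: below-threshold containers are not compacted at all.
        return Decision(compact=False)
    if skip_enabled and not manual:
        # The text mentions the alarm for large namespace containers.
        return Decision(compact=False, alarm=ALARM if is_namespace else None)
    return Decision(compact=True)


print(decide_large(5000, 2048, skip_enabled=True, is_namespace=True))
print(decide_large(5000, 2048, skip_enabled=True, is_namespace=True, manual=True))
```

The first call models a scheduled run that is skipped and raises the alarm; the second models the same container compacted through a manual run.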
Compaction Criteria for Non-large Containers
Non-large containers are compacted by default.
Recall Expiry Criteria for Non-large Containers
If the size of the recalled data in a container is greater than the configured minimum recall expiry threshold (mastgateway.recallexp.opt.minpurgemb, default value is 8 MB), recall expiry occurs on the recalled data. The compactor purges the qualified recalled data from the tiered volume. For comparison, the corresponding threshold for large containers, mastgateway.recallexp.opt.largenuminodes.minpurgemb, has a default value of 2 GB.
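The non-large case follows the same pattern as the large case, only with a much smaller default threshold. The sketch below is illustrative: the function name is hypothetical, and the behavior when recalled data is at or below the threshold is assumed to be "no recall expiry", by analogy with large containers.

```python
# Documented default for mastgateway.recallexp.opt.minpurgemb, in MB.
DEFAULT_MIN_PURGE_MB = 8


def recall_expiry_non_large(recalled_mb: float,
                            min_purge_mb: float = DEFAULT_MIN_PURGE_MB) -> bool:
    """Return True when recall expiry would run on a non-large container.

    Assumption: at or below the threshold, recalled data is left in place.
    """
    return recalled_mb > min_purge_mb


print(recall_expiry_non_large(64))  # True: 64 MB exceeds the 8 MB default
print(recall_expiry_non_large(2))   # False
```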
Refer to config for information about the configuration variables, mastgateway.recallexp.opt.largenuminodes.minpurgemb and mastgateway.recallexp.opt.minpurgemb.