Definitions for commonly used terms in MapR Converged Data Platform environments.
A special directory in the top level of each volume that contains all the snapshots created or preserved for the volume.
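As an illustrative sketch (the cluster, volume, and snapshot names are hypothetical), a snapshot created with `maprcli` becomes visible under the volume's `.snapshot` directory:

```shell
# Create a snapshot of a volume (names are hypothetical)
maprcli volume snapshot create -volume myvol -snapshotname nightly-2024-01-01

# Browse the snapshot contents through the special .snapshot directory
ls /mapr/my.cluster.com/myvol/.snapshot/nightly-2024-01-01
```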
A Boolean expression that defines a combination of users, groups, or roles that have access to an object stored natively such as a directory, file, or HPE Ezmeral Data Fabric Database table.
A list of permissions attached to an object. An ACL specifies users or system processes that can perform specific actions on an object.
An ACL or policy in JSON format that describes user access. Grants accounts and IAM users permissions to perform resource operations, such as putting objects in a bucket. You associate access policies with accounts, users, buckets, and objects.
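As a hedged sketch (the bucket name is hypothetical), an access policy granting permission to put and get objects in a bucket follows the familiar S3-style JSON layout:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject"],
      "Resource": "arn:aws:s3:::mybucket/*"
    }
  ]
}
```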
A user or users with special privileges to administer the cluster or cluster resources. Administrative functions can include managing hardware resources, users, data, services, security, and availability.
An advisory disk capacity limit that can be set for a volume, user, or group. When disk usage exceeds the advisory quota, an alert is sent.
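For example (volume name and path are hypothetical), both an advisory quota and a hard quota can be set when creating a volume with `maprcli`:

```shell
# Create a volume with a 10 GB hard quota and an 8 GB advisory quota;
# exceeding 8 GB raises an alert, exceeding 10 GB blocks further writes
maprcli volume create -name projvol -path /projvol -quota 10G -advisoryquota 8G
```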
Physical isolation between a computer system and unsecured networks. To enhance security, air-gapped computer systems are disconnected from other systems and networks.
Files in the file system are split into chunks (similar to Hadoop blocks) that are 256 MB by default. Any multiple of 65,536 bytes is a valid chunk size, but tuning the size correctly is important. Files inherit the chunk-size setting of the directory that contains them, as do subdirectories on which a chunk size has not been explicitly set. Any file written by a Hadoop application, whether through the file APIs or over NFS, uses the chunk size specified by the settings of the directory where the file is written.
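As a sketch (directory path is hypothetical), the chunk size of a directory can be changed with the `hadoop mfs` command; files created in the directory afterward inherit the new setting:

```shell
# Set a 512 MB chunk size (536870912 bytes, a multiple of 65,536)
hadoop mfs -setchunksize 536870912 /mapr/my.cluster.com/myvol/data

# Confirm the directory's chunk size and other file-system attributes
hadoop mfs -ls /mapr/my.cluster.com/myvol/data
```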
A node that runs the mapr-client that can access every cluster node and is used to access the cluster. Also referred to as an "edge node." Client nodes and edge nodes are NOT part of a Data Fabric cluster.
The Data Fabric user.
A compute node is used to process data using a compute engine (for example, YARN, Hive, Spark, or Drill). A compute node is by definition a Data Fabric cluster node.
The unit of shared storage in a Data Fabric cluster. Every container is either a name container or a data container.
A service, running on one or more Data Fabric nodes, that maintains the locations of services, containers, and other cluster information.
The minimum complement of software packages required to construct a Data Fabric cluster. These packages include mapr-core, mapr-core-internal, mapr-cldb, mapr-apiserver, mapr-fileserver, mapr-zookeeper, and others. Note that ecosystem components are not part of core.
A service that acts as a proxy and gateway for translating requests between lightweight client applications and the Data Fabric cluster.
A process that enables users to remove empty or deleted space in the database and to compact the database to occupy contiguous space.
One of the two types of containers in a Data Fabric cluster. Data containers typically have a cascaded configuration (master replicates to replica1, replica1 replicates to replica2, and so on). Every data container is either a master container, an intermediate container, or a tail container depending on its replication role.
A collection of nodes that work together under a unified architecture, along with the services or technologies running on that architecture. A fabric is similar to a Linux cluster. Fabrics help you manage your data, making it possible to access, integrate, model, analyze, and provision your data seamlessly.
The "Data Fabric user." The user that cluster services run as (typically named mapr or hadoop) on each node.
A gateway that supports table and stream replication. The Data Fabric gateway mediates one-way communication between a source Data Fabric cluster and a destination cluster. The Data Fabric gateway also applies updates from JSON tables to their secondary indexes and propagates Change Data Capture (CDC) logs.
The user that cluster services run as (typically named mapr or hadoop) on each node. The Data Fabric user, also known as the "Data Fabric admin," has full privileges to administer the cluster. The administrative privilege, with varying levels of control, can be assigned to other users as well.
A data node has the function of storing data and always runs FileServer. A data node is by definition a Data Fabric cluster node.
The number of copies of a volume that should be maintained by the Data Fabric cluster for normal operation.
A label for a feature or collection of features that have usage restrictions. Developer previews are not tested for production environments, and should be used with caution.
The application containers used by Docker software. Docker is a leading proponent of OS virtualization using application containers ("containerization").
Relates to Object Store. A domain is a management entity for accounts and users. A domain tracks the number of users, the amount of disk space, the number of buckets in each account, the total number of accounts, and the number of disabled accounts. Currently, Object Store supports only the primary domain; you cannot create additional domains. Administrators can create multiple accounts in the primary domain.
Relates to Object Store. A domain user is a cluster security principal authenticated through AD/LDAP. Domain users exist only in the default account. Domain users can log in to the Object Store UI with their domain username and password.
A selected set of stable, interoperable, and widely used components from the Hadoop ecosystem that are fully supported on the Data Fabric platform.
A small-footprint edition of the HPE Ezmeral Data Fabric designed to capture, process, and analyze IoT data close to the source of the data.
A node that runs the mapr-client that can access every cluster node and is used to access the cluster. Also referred to as a "client node." Client nodes and edge nodes are NOT part of a Data Fabric cluster.
A filelet, also called an fid, is a 256 MB shard of a file. A 1 GB file, for instance, comprises the following filelets: 64 KB (the primary fid) + (256 MB − 64 KB) + 256 MB + 256 MB + 256 MB, which together total exactly 1 GB.
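The filelet arithmetic above can be checked directly with shell arithmetic (all sizes in bytes):

```shell
# 64 KB + (256 MB - 64 KB) + 3 x 256 MB should equal 1 GB (1073741824 bytes)
echo $(( 65536 + (268435456 - 65536) + 3 * 268435456 ))
```

The segments sum to 4 × 256 MB, that is, exactly 1 GB.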
The NFS-mountable, distributed, high-performance HPE Ezmeral Data Fabric data-storage system.
A node on which a mapr-gateway is installed. A gateway node is by definition a Data Fabric cluster node.
The data plane that connects HPE Ezmeral Data Fabric deployments. The global namespace is a mechanism that aggregates disparate and remote data sources and provides a namespace that encompasses all of your infrastructure and deployments. Global namespace technology lets you manage globally deployed data as a single resource. Because of the global namespace, you can view and run multiple fabrics as a single, logical, and local fabric. The global namespace is designed to span multiple edge nodes, on-prem data centers, and clouds. See overview/global_namespace.html.
A signal sent by each FileServer and NFS node every second to provide information to the CLDB about the node's health and resource usage.
Relates to Object Store. An IAM (Identity and Access Management) user represents an actual user or an application. An administrator creates IAM users in an Object Store account and assigns access policies to them to control user and application access to resources in the account.
A program that simplifies installation of the HPE Ezmeral Data Fabric. The Installer guides you through the process of installing a cluster with Data Fabric services and ecosystem components. You can also use the Installer to update a previously installed cluster with additional nodes, services, and ecosystem components, or to upgrade a cluster to a newer core version if the cluster was installed using the Installer or an Installer Stanza.
A process that purges messages previously published to a topic partition, retaining the latest version.
A gateway that serves as a centralized entry point for all the operations that need to be performed on tiered storage.
The minimum number of copies of a volume that should be maintained by the Data Fabric cluster for normal operation. When the replication factor falls below this minimum, re-replication occurs as aggressively as possible to restore the replication level. If any containers in the CLDB volume fall below the minimum replication factor, writes are disabled until aggressive re-replication restores the minimum level of replication.
A replica of a volume.
MOSS is the acronym for Multithreaded Object Store Server.
A container in a Data Fabric cluster that holds a volume's namespace information and file chunk locations, and the first 64 KB of each file in the volume.
A protocol that allows a user on a client computer to access files over a network as though they were stored locally.
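As an illustrative sketch (the NFS node name is hypothetical), a Data Fabric cluster exported over NFS can be mounted like any other network file system; `hard,nolock` are the mount options commonly recommended for this kind of mount:

```shell
# Create a mount point and mount the cluster file system over NFS
sudo mkdir -p /mapr
sudo mount -o hard,nolock nfsnode1:/mapr /mapr
```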
An individual physical or virtual machine in a fabric.
A data service that works with the ResourceManager to host the YARN resource containers that run on each data node.
File and metadata that describes the file. You upload an object into a bucket. You can then download, open, move, or delete the object.
Object and metadata storage solution built into the HPE Ezmeral Data Fabric. Object Store efficiently stores data for fast access and leverages the capabilities of the patented HPE Ezmeral Data Fabric file system for performance, reliability, and scalability.
The service that manages security policies and composite IDs.
A disk capacity limit that can be set for a volume, user, or group. When disk usage exceeds the quota, no more data can be written.
The number of copies of a volume.
The replication role of a container determines how that container is replicated to other storage pools in the cluster.
The replication role balancer is a tool that switches the replication roles of containers to ensure that every node has an equal share of master and replica containers (for name containers) and an equal share of master, intermediate, and tail containers (for data containers).
Re-replication occurs whenever the number of available replica containers drops below the number prescribed by that volume's replication factor. Re-replication may occur for a variety of reasons including replica container corruption, node unavailability, hard disk failure, or an increase in replication factor.
A YARN service that manages cluster resources and schedules applications.
The service or services that a node runs in a cluster. You can use a node for one or a combination of the following roles: CLDB, JobTracker, WebServer, ResourceManager, ZooKeeper, FileServer, TaskTracker, NFS, and HBase.
A Kubernetes object that holds sensitive information, such as passwords, tokens, and keys. Pods that require this sensitive information reference the secret in their pod definition. Secrets are the method Kubernetes uses to move sensitive data into pods.
The HPE Ezmeral Data Fabric platform and supported ecosystem components are secure by default: security is enabled unless the user takes specific steps to turn off security options.
A group of rules that specify recurring points in time at which certain actions occur.
A read-only logical image of a volume at a specific point in time.
A unit of storage made up of one or more disks. By default, Data Fabric storage pools contain two or three disks. For high-volume reads and writes, you can create larger storage pools when initially formatting storage during cluster creation.
The number of disks in a storage pool.
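As a hedged example (the disk-list file path is hypothetical), the stripe width is set when formatting disks into storage pools with the `disksetup` utility:

```shell
# Format the disks listed in the file into storage pools
# of 3 disks each (-W sets the stripe width)
/opt/mapr/server/disksetup -F -W 3 /tmp/disks.txt
```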
The group that has administrative access to the Data Fabric cluster.
The user that has administrative access to the Data Fabric cluster.
Operation of applying a security policy to a resource.
In the Data Fabric platform, a file that contains keys used to authenticate users and cluster servers. Tickets are created using the maprlogin or configure.sh utilities and are encrypted to protect their contents. Different types of tickets are provided for users and services. For example, every user who wants to access a cluster must have a user ticket, and every node in a cluster must have a server ticket.
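As a sketch (the cluster name is hypothetical), a user ticket is typically obtained by authenticating with `maprlogin`:

```shell
# Authenticate with a password and obtain a user ticket
maprlogin password -cluster my.cluster.com

# Inspect the current ticket
maprlogin print
```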
A tree of files and directories grouped for the purpose of applying a policy or set of policies to all of them at once.
A Data Fabric process that coordinates the starting and stopping of configured services on a node.
A unit of memory allocated for use by YARN to process each map or reduce task.
A coordination service for distributed applications. It provides a shared hierarchical namespace that is organized like a standard file system.
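As an illustrative sketch (the znode name and data are hypothetical; Data Fabric ZooKeeper commonly listens on port 5181), the shared hierarchical namespace can be explored with the standard ZooKeeper CLI:

```shell
# Create a znode in the hierarchical namespace, then list the root
zkCli.sh -server localhost:5181 create /app1 "demo"
zkCli.sh -server localhost:5181 ls /
```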
For more information, see Data Fabric user.