Configuring Multiple Drill Clusters and Designating One Cluster as an OJAI Distributed Query Service

As of Core 6.0 and Drill 1.11, you can run operational queries through the OJAI Distributed Query Service and analytical queries through Drill. To run both operational and analytical workloads in the same cluster, you must configure multiple Drill clusters within the cluster and then designate one of those Drill clusters as the OJAI Distributed Query Service. Restricting each workload to its own Drill cluster improves query performance.

NOTE

Installing Drill and the OJAI Distributed Query Service together through the Installer is not currently supported. Running only one of these services in the cluster is supported, unless you manually install and configure multiple Drill clusters as instructed here.

Data Distribution

If you install both Drill and the OJAI Distributed Query Service through the Installer, both workloads get processed across the entire cluster. When both services run together in the cluster, the system replicates data across the entire cluster, causing remote reads and impairing performance, which can lead to missed SLAs and memory issues.

Memory Allocation

The amounts of memory allocated to Drill and to the OJAI Distributed Query Service differ. By default, when you install Drill, 13 GB of memory is allocated to the Drillbit service running on a node:

8 GB direct
4 GB heap
1 GB core cache

The OJAI Distributed Query Service requires less memory than Drill. By default, the OJAI Distributed Query Service is allocated approximately 5 GB of memory:

1 GB direct
3 GB heap
512 MB core cache

If you use the Installer and select both Drill and the OJAI Distributed Query Service, memory is configured for Drill. If you only run operational queries, which do not use as much memory as analytical queries, you unnecessarily lose an additional 8 GB of memory.
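
The arithmetic behind these figures can be checked with a short sketch. The values are the defaults listed above; the variable names are illustrative only and do not correspond to any Drill or Data Fabric setting.

```python
# Default per-node memory allocations in GB, copied from the lists above.
DRILL_GB = {"direct": 8, "heap": 4, "core cache": 1}
OJAI_GB = {"direct": 1, "heap": 3, "core cache": 0.5}  # 512 MB expressed as 0.5 GB

drill_total = sum(DRILL_GB.values())
ojai_total = sum(OJAI_GB.values())

# Memory reserved for Drill beyond what the OJAI Distributed Query Service needs.
overhead = drill_total - ojai_total

print(f"Drill default: {drill_total} GB")
print(f"OJAI Distributed Query Service default: {ojai_total} GB")
print(f"Extra memory reserved per node: {overhead} GB")
```

The exact sums are 13 GB and 4.5 GB, a difference of 8.5 GB. This is consistent with the rounded figures of approximately 5 GB and 8 GB used in the text.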

How to Run Drill and the OJAI Distributed Query Service Together in a Cluster

You can manually install Drill on several nodes and divide the nodes into multiple topologies (Drill clusters). For each of the topologies, create and mount a volume. Then, create directories within each volume to store your data. Configure these directories as workspaces in the Drill dfs storage plugin. Finally, configure a Drill cluster to run as an OJAI Distributed Query Service.
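
As a rough illustration of the workspace step, the following sketch builds workspace definitions of the kind accepted by the Drill file system storage plugin. The volume mount points (/drill-analytics, /drill-ojai), the data directory name, and the workspace names are hypothetical placeholders. The location, writable, and defaultInputFormat attributes are assumed to match the workspace attributes of your Drill version, so verify them against your installation before use.

```python
import json


def workspace_entry(volume_mount: str, data_dir: str, writable: bool = True) -> dict:
    """Build one workspace definition located in a directory under a volume mount point."""
    location = f"{volume_mount.rstrip('/')}/{data_dir.strip('/')}"
    return {"location": location, "writable": writable, "defaultInputFormat": None}


# Hypothetical volume mount points, one per Drill cluster (topology).
workspaces = {
    "analytics": workspace_entry("/drill-analytics", "data"),
    "operational": workspace_entry("/drill-ojai", "data"),
}

# JSON fragment that could be merged into the storage plugin configuration.
workspaces_json = json.dumps({"workspaces": workspaces}, indent=2)
print(workspaces_json)
```

The printed JSON is only a fragment. It would need to be merged into the existing storage plugin configuration on one Drill node in each Drill cluster.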

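The following topics provide instructions for each of the required steps:

Step 1: Plan the Clusters
Decide which nodes in the cluster you want to run Drill and which nodes you want to run the OJAI Distributed Query Service.
Step 2: Manually Install Drill on All Nodes
Manually install Drill on all nodes, including the nodes designated to run the OJAI Distributed Query Service.
Step 3: Define Node Topologies
Node topologies restrict data to a designated set of nodes.
Step 4: Create Volumes
Volumes organize data and manage cluster performance. Create and mount a volume to each of the topologies (Drill clusters) you created.
Step 5: Configure Multiple Drill Clusters
Update the /opt/mapr/drill/drill-<version>/conf/drill-override.conf file on each Drill node that is part of a cluster with the cluster ID and a ZooKeeper entry to define the Drill cluster. Each Drill cluster should have a unique cluster ID and ZooKeeper entry to separate the clusters.
Step 6: Configure Workspaces
You must configure a workspace on one Drill node in each Drill cluster that points to the volume directory where data is stored. When you create a workspace, you must include the volume mount point.
Step 7: Register a Drill Cluster as an OJAI Distributed Query Service
You can select any of the configured Drill clusters to act as the OJAI Distributed Query Service provider for operational queries by running the queryservice setconfig command.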
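
Each Drill cluster is defined by a cluster ID and a ZooKeeper entry in its drill-override.conf file, and each cluster needs its own unique values. The following sketch renders a minimal drill-override.conf body for two clusters. The cluster IDs, ZooKeeper host and port, and znode roots are hypothetical placeholders. The cluster-id, zk.connect, and zk.root property names are assumed to be the Drill start-up options that apply, so check them against the configuration file shipped with your Drill version.

```python
def render_drill_override(cluster_id: str, zk_connect: str, zk_root: str) -> str:
    """Render a minimal drill-override.conf body for one Drill cluster."""
    return (
        "drill.exec: {\n"
        f'  cluster-id: "{cluster_id}",\n'
        f'  zk.connect: "{zk_connect}",\n'
        f'  zk.root: "{zk_root}"\n'
        "}\n"
    )


# Hypothetical cluster definitions: every value here is a placeholder.
clusters = [
    {"cluster_id": "drill-analytics", "zk_connect": "zk1.example.com:5181", "zk_root": "drill-analytics"},
    {"cluster_id": "drill-ojai", "zk_connect": "zk1.example.com:5181", "zk_root": "drill-ojai"},
]

# Each Drill cluster needs a unique cluster ID so the clusters stay separate.
ids = [c["cluster_id"] for c in clusters]
if len(set(ids)) != len(ids):
    raise ValueError("cluster IDs must be unique")

configs = {c["cluster_id"]: render_drill_override(**c) for c in clusters}
for cluster_id, text in configs.items():
    print(f"# drill-override.conf for {cluster_id}")
    print(text)
```

Generating the files from one list of cluster definitions makes it easier to spot a duplicated cluster ID before the configuration reaches the nodes.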

HPE Ezmeral Data Fabric – Customer-Managed 7.9.0 Documentation
Abstract	This site contains documentation for the customer-managed platform of the HPE Ezmeral Data Fabric version 7.9.0 including installation, configuration, administration, and reference content, as well as content for the associated bundled ecosystem components and drivers.
Published	April 2025
Edition	7.9.0
Topic last updated	2020-07-09