Using ADLS for Data Input or Output
You can use Azure Data Lake Store (ADLS) as a source or destination for your application data.
Prerequisites
For general information about the features of ADLS, refer to the Azure Data Lake Store documentation.
For information about configuring ADLS as storage for a Hadoop cluster, refer to the official Apache documentation.
The ADLS access path syntax is:

```
adl://<Account Name>.azuredatalakestore.net/
```
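Before `adl://` paths resolve, the cluster must be able to authenticate to ADLS. A minimal `core-site.xml` sketch using service-to-service OAuth2 is shown below; the tenant ID, application ID, and client secret are placeholders you obtain from Azure Active Directory, and the exact setup steps are covered in the Apache documentation referenced above.

```xml
<!-- core-site.xml: service-to-service OAuth2 authentication for ADLS -->
<property>
  <name>fs.adl.oauth2.access.token.provider.type</name>
  <value>ClientCredential</value>
</property>
<property>
  <name>fs.adl.oauth2.refresh.url</name>
  <value>https://login.microsoftonline.com/&lt;tenant-id&gt;/oauth2/token</value>
</property>
<property>
  <name>fs.adl.oauth2.client.id</name>
  <value>&lt;application-id&gt;</value>
</property>
<property>
  <name>fs.adl.oauth2.credential</name>
  <value>&lt;client-secret&gt;</value>
</property>
```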
You can use ADLS the same way as you use the file system, substituting the `adl` scheme for `maprfs`, `hdfs`, `webhdfs`, and so on.

Procedure
- Create a directory and list it:

  ```
  [mapr@node4 ~]$ hadoop fs -mkdir adl://<username>.azuredatalakestore.net/testdir
  [mapr@node4 ~]$ hadoop fs -ls adl://<username>.azuredatalakestore.net/
  Found 1 items
  drwxr-xr-x   - 9d3f4f74-8337-4dae-ad77-f63459438553 331c9f66-6875-4e13-a74f-458dd23e4bde          0 2018-04-16 09:09 adl://<username>.azuredatalakestore.net/testdir
  ```
- Put data into ADLS from your local file system:

  ```
  [mapr@node4 ~]$ hadoop fs -put testfile adl://<username>.azuredatalakestore.net/testdir
  [mapr@node4 ~]$ hadoop fs -ls adl://<username>.azuredatalakestore.net/testdir
  Found 1 items
  -rw-r--r--   1 9d3f4f74-8337-4dae-ad77-f63459438553 331c9f66-6875-4e13-a74f-458dd23e4bde          0 2018-04-16 09:10 adl://<username>.azuredatalakestore.net/testdir/testfile
  ```
- Delete data from ADLS:

  ```
  [mapr@node4 ~]$ hadoop fs -rm -r adl://<username>.azuredatalakestore.net/testdir
  [mapr@node4 ~]$ hadoop fs -ls adl://<username>.azuredatalakestore.net/
  ```
- Run YARN jobs with your input and output stored in ADLS:

  ```
  yarn jar /opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0-mapr-1710-SNAPSHOT.jar wordcount \
      adl://<username>.azuredatalakestore.net/testdir/testfile \
      adl://<username>.azuredatalakestore.net/wordcountout
  ```
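Because every step above uses the same `adl://<account>.azuredatalakestore.net<path>` pattern, it can help to build the URI once and reuse it. The sketch below defines a hypothetical `adl_uri` helper (the function name and the `myaccount` account name are illustrative, not from the original examples):

```shell
#!/bin/sh
# Hypothetical helper: build an adl:// URI from an ADLS account name and a
# path, mirroring the access path syntax used in the steps above.
adl_uri() {
    account="$1"
    path="$2"
    printf 'adl://%s.azuredatalakestore.net%s\n' "$account" "$path"
}

adl_uri myaccount /wordcountout
# → adl://myaccount.azuredatalakestore.net/wordcountout
```

The resulting URI can then be passed to any `hadoop fs` subcommand, for example to inspect the wordcount job's output: `hadoop fs -cat "$(adl_uri myaccount /wordcountout/part-r-00000)"`.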