Using Alternate Write Modes for HPE Data Fabric Database OJAI Connector

You can use the alternate write modes supported by the HPE Data Fabric Database OJAI Connector for Apache Spark to save an Apache Spark DataFrame to an HPE Data Fabric Database JSON table.

Normally, the Apache Spark DataFrameWriter class supports the following write modes:

Append
Overwrite
ErrorIfExists
Ignore

The HPE Data Fabric Database OJAI Connector for Apache Spark throws an OperationNotSupported exception if you attempt to set one of these modes through SaveMode. The following example triggers the exception:

Scala

import org.apache.spark.sql.SaveMode
import com.mapr.db.spark.sql._

// Fails with OperationNotSupported: the connector rejects standard SaveMode values.
df.write.mode(SaveMode.Append).saveToMapRDB("/tmp/userInfo")

The HPE Data Fabric Database OJAI Connector for Apache Spark provides the following alternative modes:

Insert: Inserts the data into the HPE Data Fabric Database table. Throws a DBException if a row with the same _id value already exists in the table.
Overwrite: Overwrites the data in the table with the current DataFrame data. This operation drops the table and creates a new table with the data.
ErrorIfExists: Throws a TableExistsException if the table already exists. Otherwise, creates the table and inserts the data.
Ignore: Does not write the DataFrame data if the table already exists, leaving the table unchanged. Otherwise, creates the table and inserts the data.
InsertOrReplace: If a row with the same _id value already exists in the table, replaces it with the row from the DataFrame. Otherwise, inserts the new row.
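
The differences between these modes can be easier to see in a small model. The following is a minimal sketch in plain Scala that mimics the documented behavior with an in-memory map keyed by _id. It is not connector code: the WriteModeModel object, its write function, and the standard Java exceptions standing in for DBException and TableExistsException are all hypothetical, and creating a missing table for Insert and InsertOrReplace is an assumption that this topic does not confirm.

```scala
import scala.collection.mutable

// Toy stand-in for a JSON table: _id -> document body.
// Models the documented semantics only; this is not the connector's implementation.
object WriteModeModel {
  type Table = mutable.Map[String, String]

  sealed trait Mode
  case object Insert extends Mode
  case object Overwrite extends Mode
  case object ErrorIfExists extends Mode
  case object Ignore extends Mode
  case object InsertOrReplace extends Mode

  // `existing` is None when the table does not exist yet.
  // Returns the table as it would look after the write.
  def write(existing: Option[Table], rows: Seq[(String, String)], mode: Mode): Table = {
    def fresh(): Table = mutable.Map(rows: _*)
    (mode, existing) match {
      case (Overwrite, _) =>
        fresh() // drop the old table and recreate it from the new rows
      case (ErrorIfExists, Some(_)) =>
        // Stand-in for TableExistsException.
        throw new IllegalStateException("table already exists")
      case (ErrorIfExists, None) =>
        fresh()
      case (Ignore, Some(table)) =>
        table // table exists: skip the write entirely
      case (Ignore, None) =>
        fresh()
      case (Insert, t) =>
        // Assumption: a missing table is created empty before inserting.
        val table = t.getOrElse(mutable.Map.empty[String, String])
        rows.foreach { case (id, doc) =>
          // Stand-in for DBException on a duplicate _id.
          if (table.contains(id)) throw new IllegalArgumentException(s"duplicate _id: $id")
          table(id) = doc
        }
        table
      case (InsertOrReplace, t) =>
        // Assumption: a missing table is created empty before inserting.
        val table = t.getOrElse(mutable.Map.empty[String, String])
        rows.foreach { case (id, doc) => table(id) = doc } // upsert by _id
        table
    }
  }
}
```

In this model, writing a row whose _id is already present fails under Insert but succeeds under InsertOrReplace, which is the main practical difference between the two row-level modes. Overwrite, ErrorIfExists, and Ignore decide at the table level instead.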

You cannot specify these modes by passing an Apache Spark SaveMode value to the mode method. Doing so results in the same OperationNotSupported exception noted earlier. To use these modes, call the option method on a DataFrameWriter object. The following example sets the Insert mode:

Scala

import com.mapr.db.spark.sql._
df.write.option("Operation", "Insert").saveToMapRDB("/tmp/usersInfo")

NOTE

The HPE Data Fabric Database OJAI Connector for Apache Spark does not support the UPDATE mode. Attempting to use it results in an OperationNotSupported exception.
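
Because unsupported mode names fail only when the write runs, it can help to validate the name before calling the connector. The following is a minimal sketch in plain Scala, assuming the option values are spelled exactly like the mode names listed in this topic. Only "Operation" and "Insert" appear in the example above; the other spellings, the OjaiWriteOperation object, and its option helper are hypothetical.

```scala
// Hypothetical helper: checks a write-mode name before it reaches the connector.
object OjaiWriteOperation {
  // Option key used in the Insert example in this topic.
  val OptionKey = "Operation"

  // Mode names from this topic. Apart from "Insert", their spelling
  // as option values is an assumption.
  val Supported: Set[String] =
    Set("Insert", "Overwrite", "ErrorIfExists", "Ignore", "InsertOrReplace")

  // Returns the (key, value) pair to pass to DataFrameWriter.option,
  // or a Left with a message for names such as "Update" or "Append".
  def option(mode: String): Either[String, (String, String)] =
    if (Supported.contains(mode)) Right(OptionKey -> mode)
    else Left(s"Unsupported OJAI write operation: $mode")
}

// Possible use with a DataFrame named df (requires Spark and the connector):
// OjaiWriteOperation.option("InsertOrReplace") match {
//   case Right((key, value)) => df.write.option(key, value).saveToMapRDB("/tmp/usersInfo")
//   case Left(message)       => sys.error(message)
// }
```

Returning an Either keeps the check side-effect free, so the caller decides whether an unsupported name should abort the job or fall back to another mode.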


HPE Data Fabric 8.0.0 Software Documentation
Abstract	This site contains documentation for HPE Data Fabric Software version 8.0.0 including installation, configuration, administration, and reference content, as well as content for the associated bundled ecosystem components and drivers.
Published	November 2025
Edition	8.0.0
Topic last updated	2024-06-20