Loading Data from HPE Data Fabric Database as an Apache Spark DataFrame
The HPE Data Fabric Database OJAI Connector for Apache Spark provides the loadFromMapRDB method for loading data from an HPE Data Fabric Database table into a Spark DataFrame. The method is available in Scala, Java, and Python.

Scala (SparkSession object):

def loadFromMapRDB[T](tableName: String, schema: StructType): DataFrame

import com.mapr.db.spark.sql._

val df = sparkSession.loadFromMapRDB[T]("/tmp/user_profiles")

Java (MapRDBJavaSession object):

def loadFromMapRDB(tableName: String, schema: StructType, sampleSize: Double): DataFrame

import com.mapr.db.spark.sql.api.java.MapRDBJavaSession;
import org.apache.spark.sql.SparkSession;

MapRDBJavaSession maprSession = new MapRDBJavaSession(spark);
maprSession.loadFromMapRDB("/tmp/user_profiles");

Python (SparkSession object):
loadFromMapRDB(table_name, schema, sample_size)
from pyspark.sql import SparkSession
df = spark.loadFromMapRDB("/tmp/user_profiles")

The only required parameter in each of these calls is the tableName parameter. Both DataFrames and HPE Data Fabric Database tables work with structured data. DataFrames need a fixed schema,
whereas HPE Data Fabric Database allows for a flexible schema. When loading data into a DataFrame, you can map
your data to a schema by specifying the schema parameter in the
loadFromMapRDB call. You can also provide an application class as the type
[T] parameter in the call. These two approaches are the preferred methods
for loading data into DataFrames.
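For example, the following Scala sketch shows both approaches. The table path, the field names, and the UserProfile case class are hypothetical and used only for illustration.

import com.mapr.db.spark.sql._
import org.apache.spark.sql.types._

// Hypothetical application class describing the documents in the table.
case class UserProfile(_id: String, first_name: String, last_name: String)

// Approach 1: map the data to an explicit schema.
val schema = StructType(Seq(
  StructField("_id", StringType, nullable = false),
  StructField("first_name", StringType),
  StructField("last_name", StringType)))
val dfWithSchema = sparkSession.loadFromMapRDB("/tmp/user_profiles", schema)

// Approach 2: derive the schema from the application class.
val dfFromClass = sparkSession.loadFromMapRDB[UserProfile]("/tmp/user_profiles")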
For data exploration use cases, you might not know the schema of your HPE Data Fabric Database table. For those situations, the HPE Data Fabric Database OJAI connector for Apache Spark can infer the schema by sampling data from the table.
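As a sketch of that exploratory flow (the table path is again hypothetical), omitting both the schema and the type parameter causes the connector to sample the table and infer a schema, which you can then inspect:

import com.mapr.db.spark.sql._

// No schema or application class supplied, so the connector infers the schema by sampling.
val exploreDf = sparkSession.loadFromMapRDB("/tmp/user_profiles")
exploreDf.printSchema()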
Whenever possible, the HPE Data Fabric Database OJAI Connector for Apache Spark pushes projections and filters for better performance. This allows HPE Data Fabric Database to project and filter data before returning it to your client application.
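For instance, in the following Scala sketch (the column names are assumptions), the projection from select and the predicate from filter are candidates for pushdown to HPE Data Fabric Database; calling explain() lets you inspect the physical plan to see what was pushed to the source:

import com.mapr.db.spark.sql._
import org.apache.spark.sql.functions.col

val profiles = sparkSession.loadFromMapRDB("/tmp/user_profiles")
val smiths = profiles
  .select("first_name", "last_name")    // projection candidate for pushdown
  .filter(col("last_name") === "Smith") // filter candidate for pushdown
smiths.explain()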
The following subtopics describe these techniques.