Saving an Apache Spark DataFrame to an HPE Ezmeral Data Fabric Database JSON Table
To save an Apache Spark DataFrame to an HPE Ezmeral Data Fabric Database JSON table in Scala, first invoke the write method on the DataFrame object. This returns a DataFrameWriter object, from which you can invoke the saveToMapRDB method. For Java and Python, invoke the saveToMapRDB method on the MapRDBJavaSession object or SparkSession object, respectively.
If a row with the same ID already exists, the saveToMapRDB method updates or overwrites that row. If you want an exception to be thrown instead, use the insertToMapRDB method.
import com.mapr.db.spark.sql._
df.write.saveToMapRDB("/tmp/userInfo")
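The difference between saveToMapRDB and insertToMapRDB when a row ID already exists can be pictured with a small stand-alone model. This is only a sketch of the semantics described above, written in plain Python with a dictionary standing in for the table. It does not call the connector, and the names save_rows, insert_rows, and DuplicateIdError are hypothetical. The model also assumes a full-row replacement on save, which may differ from how the connector merges fields.

```python
class DuplicateIdError(Exception):
    """Hypothetical error raised when an insert hits an existing row ID."""


def save_rows(table, rows, id_field="_id"):
    """Model of save semantics: an existing row with the same ID is replaced."""
    for row in rows:
        table[row[id_field]] = dict(row)


def insert_rows(table, rows, id_field="_id"):
    """Model of insert semantics: an existing row with the same ID is an error."""
    for row in rows:
        key = row[id_field]
        if key in table:
            raise DuplicateIdError(f"row with {id_field}={key!r} already exists")
        table[key] = dict(row)


# A dict keyed by row ID stands in for the JSON table.
table = {}
save_rows(table, [{"_id": "u1", "city": "Austin"}])
save_rows(table, [{"_id": "u1", "city": "Boston"}])  # same ID: silently replaced

try:
    insert_rows(table, [{"_id": "u1", "city": "Chicago"}])
    insert_failed = False
except DuplicateIdError:
    insert_failed = True  # same ID: rejected, stored row is left untouched
```

After this runs, the stored row for u1 still holds the value from the second save, and the insert attempt has raised instead of overwriting it. Choose the insert-style call when a duplicate ID indicates a data problem that should stop the job.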
For EEP 4.1.0 and later, you can invoke the saveToMapRDB method directly on the DataFrame object:
def saveToMapRDB(tableName: String, idFieldPath: String = "_id", createTable: Boolean = false, bulkInsert: Boolean = false): Unit
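import org.apache.spark.sql.SparkSession
import com.mapr.db.spark.sql._
// Reuse the active session (or create one), then load the source table
val spark = SparkSession.builder().getOrCreate()
val df = spark.loadFromMapRDB("/tmp/user_profiles")
// Write to the target table, creating it if it does not exist
val tableName = "/tmp/userInfo"
df.saveToMapRDB(tableName, createTable = true)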
In Java, to save a DataFrame (Dataset<Row>), apply the following method on a MapRDBJavaSession object:
def saveToMapRDB[T](df: DataFrame[T], tableName: String, idFieldPath: String, createTable: Boolean, bulkInsert: Boolean): Unit
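import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import com.mapr.db.spark.sql.api.java.MapRDBJavaSession;
SparkSession sparkSession = SparkSession.builder().getOrCreate();
MapRDBJavaSession maprSession = new MapRDBJavaSession(sparkSession);
Dataset<Row> ds = maprSession.loadFromMapRDB("/tmp/user_profiles");
maprSession.saveToMapRDB(ds, "/tmp/userInfo");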
In Python, to save a DataFrame, apply the following method on a SparkSession object, passing the DataFrame as the first argument:
def saveToMapRDB(dataframe, table_name, id_field_path = default_id_field, create_table = False, bulk_insert = False)
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
df = spark.loadFromMapRDB("/tmp/user_profiles")
table_name = "/tmp/userInfo"
spark.saveToMapRDB(df, table_name, create_table=True)