Saving an Apache Spark DStream to a HPE Ezmeral Data Fabric Database JSON Table
The HPE Ezmeral Data Fabric Database OJAI Connector for Apache Spark enables you to use HPE Ezmeral Data Fabric Database as a sink for Apache Spark DStreams.
NOTE
Saving an Apache Spark DStream to a HPE Ezmeral Data Fabric Database JSON table is currently supported only in Scala.

The following API saves a DStream[OJAIDocument] object to a HPE Ezmeral Data Fabric Database table:
def saveToMapRDB(tableName: String, createTable: Boolean,
bulkInsert: Boolean, idFieldPath: String): Unit
The parameters are as follows:
Parameter | Default | Description |
tableName | Not applicable | The name of the HPE Ezmeral Data Fabric Database table to which you are saving the DStream. |
createTable | false | Creates the table before saving the DStream. If the table already exists and createTable is set to true, the API throws an exception. |
bulkInsert | false | Loads a group of streams simultaneously; similar to a bulk load in MapReduce. |
idFieldPath | _id | Specifies the field to use as the key for the DStream documents. |
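As a sketch of how the four parameters fit together, the following Scala snippet maps a DStream of Strings to OJAI documents and saves it with every parameter set explicitly. The table path, the key field name, and the import paths shown are illustrative assumptions, not taken from the examples below; the code assumes the OJAI connector is on the classpath and a Data Fabric cluster is reachable.

```scala
// Sketch only: assumes the HPE Ezmeral Data Fabric Database OJAI Connector
// for Apache Spark is on the classpath and a cluster is reachable.
// The import paths, table path, and "click_id" field are assumptions.
import com.mapr.db.spark._                // connector entry point (MapRDBSpark)
import com.mapr.db.spark.streaming._      // implicit that adds saveToMapRDB to DStreams
import org.apache.spark.streaming.dstream.DStream

def saveClicks(clicks: DStream[String]): Unit = {
  clicks
    .map(MapRDBSpark.newDocument(_))      // DStream[String] -> DStream[OJAIDocument]
    .saveToMapRDB("/apps/clickstream",    // target table path (assumed)
      createTable = true,                 // create the table; throws if it already exists
      bulkInsert  = true,                 // group inserts, like a MapReduce bulk load
      idFieldPath = "click_id")           // use the "click_id" field as the row key
}
```

Only tableName is required, so in practice most calls pass just the table path plus whichever optional parameters differ from their defaults.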
NOTE
The only required parameter for this function is tableName; all other parameters are optional.

The following example creates a DStream object, converts it to a DStream[OJAIDocument] object, and then stores it in HPE Ezmeral Data Fabric Database:
val clicksStream: DStream[String] = createKafkaStream(…)
clicksStream.map(MapRDBSpark.newDocument(_)).saveToMapRDB("/clicks", createTable = true)
NOTE
You must use the map(MapRDBSpark.newDocument(_)) API to convert the DStream object to a DStream[OJAIDocument] object.

If clicksStream is a DStream of Strings, it can be saved to HPE Ezmeral Data Fabric Database using the saveToMapRDB API:
clicksStream.map(MapRDBSpark.newDocument(_)).saveToMapRDB("/clicks", createTable = true)
NOTE
To use the saveToMapRDB API, you must first transform the DStream object to a DStream[OJAIDocument] by using the Apache Spark map API.