Loading Data into a DataFrame Using a Type Parameter
If the structure of your data maps to a class in your application, you can specify a type parameter when loading into a DataFrame.
Specify the application class as the type parameter in the load call; the connector then infers the DataFrame schema from the fields of that class.
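The practical consequence of schema inference is that the class's declared field names, in declaration order, become the DataFrame's top-level column names. Spark derives the schema from the type itself through reflection. The pure-Scala sketch below only makes the name mapping visible: it reads names from a sample instance with `Product.productElementNames` (Scala 2.13 or later). The `Person` and `Address` classes here are Person-like stand-ins declared for illustration, and `FieldNames` is a hypothetical helper that is not part of the connector.

```scala
// Person-like case classes, redeclared so this sketch compiles on its own.
case class Address(Pin: Integer, street: String, city: String)
case class Person(_id: String, first_name: String, last_name: String,
                  address: Address, interests: Seq[String])

// Hypothetical helper: lists the field names of a case class instance in
// declaration order. These are the names that would become column names.
object FieldNames {
  def of[T <: Product](value: T): List[String] =
    value.productElementNames.toList
}
```

For example, for a `Person` instance `FieldNames.of` yields `_id`, `first_name`, `last_name`, `address`, `interests`, and for its nested `Address` it yields `Pin`, `street`, `city`. This is why a misspelled or differently cased field name in the class changes the resulting column name.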
The following example creates a DataFrame with a Person schema by passing the Person class as the type parameter in the load call:
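import org.apache.spark.sql.SparkSession
import com.mapr.db.spark.sql._  // adds loadFromMapRDB to SparkSession
// Field names become column names, so they must match the table's field names.
case class Address(Pin: Integer, street: String, city: String)
case class Person(_id: String,
                  first_name: String,
                  last_name: String,
                  address: Address,
                  interests: Seq[String])
// Obtain (or reuse) the application's SparkSession.
val sparkSession = SparkSession.builder().getOrCreate()
val df = sparkSession.loadFromMapRDB[Person]("/tmp/user_profiles")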
import com.mapr.db.spark.sql.api.java.MapRDBJavaSession;
import java.io.Serializable;
import java.util.List;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
// Declare the bean classes as static nested classes of your application class.
// Bean property names (derived from the getters) must match the table's field names.
public static class Address implements Serializable {
  private Integer pin;
  private String street;
  private String city;
  public Integer getPin() { return pin; }
  public void setPin(Integer pin) { this.pin = pin; }
  public String getStreet() { return street; }
  public void setStreet(String street) { this.street = street; }
  public String getCity() { return city; }
  public void setCity(String city) { this.city = city; }
}
public static class Person implements Serializable {
  private String _id;
  private String first_name;
  private String last_name;
  private Address address;
  // Use java.util.List (not scala.collection.Seq) so Spark infers an array column.
  private List<String> interests;
  public String get_id() { return _id; }
  public void set_id(String _id) { this._id = _id; }
  public String getFirst_name() { return first_name; }
  public void setFirst_name(String first_name) { this.first_name = first_name; }
  public String getLast_name() { return last_name; }
  public void setLast_name(String last_name) { this.last_name = last_name; }
  public Address getAddress() { return address; }
  public void setAddress(Address address) { this.address = address; }
  public List<String> getInterests() { return interests; }
  public void setInterests(List<String> interests) { this.interests = interests; }
}
// Inside a method of the application class:
SparkSession sparkSession = SparkSession.builder().getOrCreate();
String tableName = "/tmp/user_profiles";
MapRDBJavaSession maprSession = new MapRDBJavaSession(sparkSession);
Dataset<Row> df = maprSession.loadFromMapRDB(tableName, Person.class);
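You must invoke the loadFromMapRDB method on a SparkSession object (in Scala, after importing com.mapr.db.spark.sql._) or on a MapRDBJavaSession object that wraps the SparkSession (in Java).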
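SparkSession has no loadFromMapRDB method of its own. In Scala the call compiles only because `import com.mapr.db.spark.sql._` brings an implicit conversion or extension into scope. Java code cannot use Scala implicits, which is presumably why the Java example wraps the session in a MapRDBJavaSession instead. The pure-Scala sketch below shows the general "enrich my library" pattern with hypothetical names (`Session`, `SessionSyntax`, `loadFrom`). It does not reproduce the connector's actual implicit definitions.

```scala
// Hypothetical stand-in for a class you cannot modify (the role SparkSession plays).
final class Session(val appName: String)

object SessionSyntax {
  // Enrich-my-library pattern: this implicit class adds loadFrom to Session,
  // but only in scopes that import SessionSyntax._.
  implicit class RichSession(private val session: Session) extends AnyVal {
    def loadFrom(table: String): String =
      s"${session.appName} would load $table"
  }
}

object Demo {
  import SessionSyntax._

  // Without the import above, this call would not compile.
  def run(): String = new Session("demo").loadFrom("/tmp/user_profiles")
}
```

Forgetting the import produces a "value ... is not a member of" compile error rather than a runtime failure, which is the usual symptom when the connector's package import is missing.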
All fields in an application bean class are nullable by default. The load throws an InvalidSchema exception only if the HPE Ezmeral Data Fabric Database table contains fields that are not included in the bean class.
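The rule above is one-directional: every table field must have a counterpart in the class, but class fields that the table lacks are acceptable because they are nullable. The sketch below models that rule on sets of top-level field names. `SchemaCheck`, `Loadable`, and `Rejected` are hypothetical names for illustration. The connector's actual check may also consider nested fields and case sensitivity, which this sketch ignores by comparing names exactly.

```scala
object SchemaCheck {
  // Outcome of comparing a table's top-level field names with a class's fields.
  sealed trait Outcome
  case object Loadable extends Outcome
  final case class Rejected(extraTableFields: Set[String]) extends Outcome

  // Hypothetical model of the documented rule: loading fails only when the
  // table has fields the class lacks. Class fields missing from the table
  // are allowed, because bean fields are nullable.
  def check(tableFields: Set[String], classFields: Set[String]): Outcome = {
    val extra = tableFields.diff(classFields)
    if (extra.isEmpty) Loadable else Rejected(extra)
  }
}
```

For example, `SchemaCheck.check(Set("_id", "dob"), Set("_id", "first_name"))` returns `Rejected(Set("dob"))`, whereas swapping the two arguments returns `Loadable`.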
The resulting DataFrame has the following schema:
df.printSchema()
root
 |-- _id: string (nullable = true)
 |-- first_name: string (nullable = true)
 |-- last_name: string (nullable = true)
 |-- address: struct (nullable = true)
 |    |-- Pin: integer (nullable = true)
 |    |-- street: string (nullable = true)
 |    |-- city: string (nullable = true)
 |-- interests: array (nullable = true)
 |    |-- element: string (containsNull = true)