Spark 2.1.0 API Changes
This topic describes the public API changes that occurred between Apache Spark 2.0.1 and Spark 2.1.0.
For more information about Spark 2.1.0, see the Spark Release Notes and the Spark 2.1.0 API Documentation.
New API
- The DataType API is now mostly stable. See the InterfaceStability annotations for the classes you need.
- The from_json and to_json functions have been added to Spark SQL.
- StructType now accepts Python dictionaries.
- New ML algorithms have been added for SparkR.
- SparkContext.addFile is now supported for SparkR.
- SparkR now supports multinomial logistic regression.
- MLlib now supports multinomial logistic regression (MLR) in its DataFrame-based API, as well as locality-sensitive hashing (LSH).
- MLlib model loading is now backward-compatible with Spark 1.6.
Changed API
- Parquet-MR has been upgraded to version 1.8.1.
- spark.sql.warehouse.dir must now be set before the SparkSession is created, and its value is shared across multiple SparkSessions.
- Values generated by non-deterministic functions no longer change after a coalesce or union.
- The default Locale for DateFormat and NumberFormat is now Locale.US.
- The SIZE function now returns -1 when its input is null.