Spark 2.1.0 API Changes
This topic describes the public API changes that occurred between Apache Spark 2.0.1 and Spark 2.1.0.
For more information about Spark 2.1.0, see the Spark Release Notes and the Spark 2.1.0 API Documentation.
New API
- The DataType API is now mostly stable. Please see the InterfaceStability annotations for the classes you need.
- The from_json and to_json functions have been added to SQL.
- StructType now accepts Python dictionaries.
- New ML algorithms have been added for SparkR.
- SparkContext.addFile is now supported for SparkR.
- SparkR now supports multinomial logistic regression.
- MLlib supports multinomial logistic regression (MLR) in the DataFrame-based API, as well as locality-sensitive hashing (LSH).
- MLlib model loading is now backward-compatible with Spark 1.6.
Changed API
- Parquet-MR is bumped to 1.8.1.
- spark.sql.warehouse.dir now needs to be set before SparkSession creation and is shared between multiple SparkSessions.
- Values generated by non-deterministic functions will not change after coalesce or union.
- The default Locale for DateFormat/NumberFormat is now Locale.US.
- Function SIZE returns -1 when its input parameter is null.
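
To make the Locale.US item above more concrete, the following is a minimal sketch of what pinning a formatter to Locale.US means in practice. It uses only the standard java.text classes and does not show Spark's internal code; the class and method names (LocaleUsFormatting, formatNumber, formatDate) are hypothetical and exist only for illustration.

```java
import java.text.NumberFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper class for illustration only; it is not part of Spark.
final class LocaleUsFormatting {

    // Format a number with an explicitly chosen locale instead of the JVM default.
    static String formatNumber(double value, Locale locale) {
        return NumberFormat.getInstance(locale).format(value);
    }

    // Format a date with an explicit locale. UTC is pinned so the result
    // does not depend on the machine's time zone.
    static String formatDate(Date date, Locale locale) {
        SimpleDateFormat fmt = new SimpleDateFormat("EEE, d MMM yyyy", locale);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(date);
    }

    public static void main(String[] args) {
        double value = 1234567.891;
        Date epoch = new Date(0L);

        // With Locale.US the separators and names are fixed, whatever the JVM default is.
        System.out.println(formatNumber(value, Locale.US)); // 1,234,567.891
        System.out.println(formatDate(epoch, Locale.US));   // Thu, 1 Jan 1970

        // Another locale produces different separators for the same value.
        System.out.println(formatNumber(value, Locale.GERMANY));
    }
}
```

Because the locale is passed explicitly, the same input yields the same text on every machine, which is the practical effect of a fixed default such as Locale.US. Jobs that previously relied on the JVM's default locale for parsing or formatting may therefore see different results after upgrading.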