Spark 2.1.0 API Changes

This topic describes the public API changes that occurred between Apache Spark 2.0.1 and Spark 2.1.0.

For more information about Spark 2.1.0, see the Spark Release Notes and the Spark 2.1.0 API Documentation.

New API

  • The DataType API is now mostly stable. See the InterfaceStability annotations on the individual classes you use for details.
  • The from_json and to_json functions have been added to SQL.
  • StructType now accepts Python dictionaries.
  • New ML algorithms have been added for SparkR.
  • SparkContext.addFile is now supported for SparkR.
  • SparkR now supports multinomial logistic regression.
  • MLlib now supports multinomial logistic regression (MLR) in the DataFrame-based API, as well as locality-sensitive hashing (LSH).
  • MLlib model loading is now backward-compatible with Spark 1.6.

Changed API

  • Parquet-MR has been upgraded to 1.8.1.
  • spark.sql.warehouse.dir now needs to be set before SparkSession creation and is shared between multiple SparkSessions.
  • Values generated by non-deterministic functions no longer change after coalesce or union.
  • The default Locale for DateFormat/NumberFormat is now Locale.US.
  • The SIZE function now returns -1 when its input is null.