Spark 2.0.1-1703 Release Notes

The notes below relate specifically to the MapR Distribution for Apache Hadoop. You may also be interested in the open-source Spark 2.0.1 Release Notes.

Spark Version: 2.0.1
Release Date: April 2017
MapR Version Interoperability: See EEP Components and OS Support.
Source on GitHub
GitHub Release Tag: 2.1.0-mapr-1703
Maven Artifacts
Package Names: See Package Names for Ecosystem Packs (EEPs).
API Changes for this Version: See Spark API Changes.
NOTE: For some important Spark limitations, see "Known Issues and Limitations" later in these release notes.

New in This Release

This version of Spark supports integration with Hive. However, note the following exceptions:


This MapR release includes the following new fixes since the previous MapR Spark release. In addition, Spark 2.0.1-1703 includes backports of all the fixes contained in Apache Spark 2.0.2. For details, refer to the commit log for this project on GitHub.

GitHub Commit Number Date (YYYY/MM/DD) MapR Fix Number and Description
b5fdf9e 2017/03/01 Merge pull request #94 from mapr/mapr-26289-spark-2.0.1.
f75cad8 2017/03/01 Set default poll timeout to 120s.
1cf7251 2017/03/01 Added include-kafka-09 profile to Assembly.
c9c6030 2017/02/24 [MAPR-26060] Fixed case when mapr-streams make gaps in offsets (#91).
36debc8 2017/02/09 Merge pull request #89 from mapr/mapr-26076-spark-2.0.1.
ed262d0 2017/02/09 [SPARK-15844][CORE] HistoryServer doesn't come up if spark.authenticate = true.
674f9bd 2017/02/08 Merge pull request #86 from mapr/spark-2.0.2-porting.
529e51b 2017/02/08 Fixed version for Kafka 0.10 SQL.
e680ec2 2017/02/06 [SPARK-18283][STRUCTURED STREAMING][KAFKA] Added test to check whether default starting offset is latest.
a68148e 2017/02/06 [SPARK-18125][SQL][BRANCH-2.0] Fix a compilation error in codegen due to splitExpression.
316f706 2017/02/06 [SPARK-17849][SQL] Fix NPE problem when using grouping sets.
01f3743 2017/02/06 [SPARK-17693][SQL][BACKPORT-2.0] Fixed Insert Failure To Data Source Tables when the Schema has the Comment Field.
a996282 2017/02/06 [SPARK-17981][SPARK-17957][SQL][BACKPORT-2.0] Fix Incorrect Nullability Setting to False in FilterExec.
6d9dee4 2017/02/06 [SPARK-18189][SQL][FOLLOWUP] Move test from ReplSuite to prevent java.lang.ClassCircularityError.
cdd189c 2017/02/06 [SPARK-17337][SPARK-16804][SQL][BRANCH-2.0] Backport subquery related PRs.
681a839 2017/02/06 [SPARK-18200][GRAPHX][FOLLOW-UP] Support zero as an initial capacity in OpenHashSet.
cb68e70 2017/02/06 [SPARK-18200][GRAPHX] Support zero as an initial capacity in OpenHashSet.
42d7574 2017/02/06 [SPARK-18111][SQL] Wrong approximate quantile answer when multiple records have the minimum value (for branch 2.0).
95aeff9 2017/02/06 [SPARK-18160][CORE][YARN] spark.files & spark.jars should not be passed to driver in yarn mode.
37fcf10 2017/02/06 [SPARK-16796][WEB UI] Mask spark.authenticate.secret on Spark environ.
b1723aa 2017/02/06 [SPARK-18133][BRANCH-2.0][EXAMPLES][ML] Python ML Pipeline Example.
a7be955 2017/02/06 [SPARK-18144][SQL] logging StreamingQueryListener$QueryStartedEvent.
724a6e3 2017/02/06 [SPARK-18114][HOTFIX] Fix line-too-long style error from backport of SPARK-18114.
2f1aaa1 2017/02/06 [SPARK-18148][SQL] Misleading Error Message for Aggregation Without Window/GroupBy.
992d65f 2017/02/06 [SPARK-18189][SQL] Fix serialization issue in KeyValueGroupedDataset.
f481615 2017/02/06 [SPARK-18114][MESOS] Fix mesos cluster scheduler generate command option error.
07d3ffe 2017/02/06 [SPARK-18030][TESTS] Fix flaky FileStreamSourceSuite by not deleting the files.
5250480 2017/02/06 [SPARK-18143][SQL] Ignore Structured Streaming event logs to avoid breaking history server (branch 2.0).
bdf4511 2017/02/06 [SPARK-16312][FOLLOW-UP][STREAMING][KAFKA][DOC] Add java code snippet for Kafka 0.10 integration doc.
ecd62ed 2017/02/06 [SPARK-18164][SQL] ForeachSink should fail the Spark job if `process` throws exception.
6cab38c 2017/02/06 [SPARK-16963][SQL] Fix test "StreamExecution metadata garbage collection".
19d27ad 2017/02/06 [SPARK-17813][SQL][KAFKA] Maximum data per trigger.
6c079b9 2017/02/06 [SPARK-18132] Fix checkstyle.
9c149f4 2017/02/06 [SPARK-18009][SQL] Fix ClassCastException while calling toLocalIterator() on dataframe produced by RunnableCommand.
597b754 2017/02/06 [SPARK-16963][STREAMING][SQL] Changes to Source trait and related implementation classes.
38745a9 2017/02/06 [SPARK-13747][SQL] Fix concurrent executions in ForkJoinPool for SQL (branch 2.0).
aa8c453 2017/02/06 [SPARK-18104][DOC] Don't build KafkaSource doc.
6f62a53 2017/02/06 [SPARK-18063][SQL] Failed to infer constraints over multiple aliases.
a031493 2017/02/06 [SPARK-16304] LinkageError should not crash Spark executor.
3b01f41 2017/02/06 [SPARK-17733][SQL] InferFiltersFromConstraints rule never terminates for query.
67484f3 2017/02/06 [SPARK-18022][SQL] java.lang.NullPointerException instead of real exception when saving DF to MySQL.
0002f56 2017/02/06 [SPARK-16988][SPARK SHELL] spark history server log needs to be fixed to show https url when ssl is enabled.
b50e511 2017/02/06 [SPARK-18070][SQL] binary operator should not consider nullability when comparing input types.
be401c8 2017/02/06 [SPARK-17624][SQL][STREAMING][TEST] Fixed flaky StateStoreSuite.maintenance.
c03b30f 2017/02/06 [SPARK-18044][STREAMING] FileStreamSource should not infer partitions in every batch.
86e6db7 2017/02/06 [SPARK-17153][SQL] Should read partition data when reading new files in filestream without globbing.
62ecfdd 2017/02/06 [SPARK-18058][SQL][BRANCH-2.0] Comparing column types ignoring Nullability in Union and SetOperation.
7d291d4 2017/02/06 [SPARKR][BRANCH-2.0] R merge API doc and example fix.
38c59da 2017/02/06 [SPARK-17123][SQL][BRANCH-2.0] Use type-widened encoder for DataFrame for set operations.
453a44c 2017/02/06 [SPARK-17698][SQL] Join predicates should not contain filter clauses.
0ed97fe 2017/02/06 [SPARK-17986][ML] SQLTransformer should remove temporary tables.
1ac5708 2017/02/06 [SPARK-16606][MINOR] Tiny follow-up to , to correct more instances of the same log message typo.
8049e1d 2017/02/06 [STREAMING][KAFKA][DOC] clarify kafka settings needed for larger batches.
1b55321 2017/02/06 [SPARK-17812][SQL][KAFKA] Assign and specific startingOffsets for structured stream.
f1fc622 2017/02/06 [SPARK-17929][CORE] Fix deadlock when CoarseGrainedSchedulerBackend reset.
a922ca4 2017/02/06 [SPARK-17926][SQL][STREAMING] Added json for statuses.
290ac5b 2017/02/06 [SPARK-17811] SparkR cannot parallelize data.frame with NA or NULL in Date columns.
a94a716 2017/02/06 [SPARK-18034] Upgrade to MiMa 0.1.11 to fix flakiness
1db928e 2017/02/06 [SPARKR] fix warnings
bbd260f 2017/02/06 [SPARK-17999][KAFKA][SQL] Add getPreferredLocations for KafkaSourceRDD.
c4816ab 2017/02/06 [SPARK-18003][SPARK CORE] Fix bug of RDD zipWithIndex & zipWithUniqueId index value overflowing.
9c22c9d 2017/02/06 [SPARK-17989][SQL] Check ascendingOrder type in sort_array function rather than throwing ClassCastException.
ae60c75 2017/02/06 [SPARK-18001][DOCUMENT] fix broken link to SparkDataFrame.
f2b58bf 2017/02/06 [SPARK-17711][TEST-HADOOP2.2] Fix hadoop2.2 compilation error.
003b20c 2017/02/06 [SPARK-17731][SQL][STREAMING][FOLLOWUP] Refactored StreamingQueryListener APIs for branch-2.0.
9ad2ee7 2017/02/06 [SPARK-17841][STREAMING][KAFKA] drain commitQueue.
efcc529 2017/02/06 [MINOR][DOC] Add more built-in sources in
edbe6a6 2017/02/06 [SPARK-17711] Compress rolled executor log.
28d9c60 2017/02/06 [SPARK-17751][SQL][BACKPORT-2.0] Remove spark.sql.eagerAnalysis and Output the Plan if Existed in AnalysisException.
b8b951a 2017/02/06 [SQL][STREAMING][TEST] Follow up to remove Option.contains for Scala 2.10 compatibility.
78e5c84 2017/02/06 [SQL][STREAMING][TEST] Fix flaky tests in StreamingQueryListenerSuite.
3fbcb1f 2017/02/06 [SPARK-17731][SQL][STREAMING] Metrics for structured streaming for branch-2.0.
1a14c88 2017/02/06 Fix example of tf_idf with minDocFreq.
1bf46c0 2017/02/06 [SPARK-17892][SQL][2.0] Do Not Optimize Query in CTAS More Than Once #15048.
ea7ccbe 2017/02/06 [MINOR][SQL] Add prettyName for current_database function.
e627ac0 2017/02/06 [SPARK-17819][SQL][BRANCH-2.0] Support default database in connection URIs for Spark Thrift Server.
e97b8cc 2017/02/06 [SPARK-17953][DOCUMENTATION] Fix typo in SparkSession scaladoc.
beeb656 2017/02/06 [SPARK-17863][SQL] should not add column into Distinct.
3d6ab95 2017/02/06 [SPARK-17834][SQL] Fetch the earliest offsets manually in KafkaSource instead of counting on KafkaConsumer.
00239e8 2017/02/06 minor doc fix for Row.scala.
9957c50 2017/02/06 [SPARK-17876] Write StructuredStreaming WAL to a stream instead of materializing all at once.
be58a9b 2017/02/06 [SPARK-16827][BRANCH-2.0] Avoid reporting spill metrics as shuffle metrics.
b064786 2017/02/06 [SPARK-17782][STREAMING][KAFKA] alternative eliminate race condition of poll twice.
eb73c46 2017/02/06 [SPARK-17790][SPARKR] Support for parallelizing R data.frame larger than 2GB.
8a5a689 2017/02/06 [SPARK-17884][SQL] To resolve Null pointer exception when casting from empty string to interval type.
4fb6c0c 2017/02/06 [SPARK-17808][PYSPARK] Upgraded version of Pyrolite to 4.13.
dccbe82 2017/02/06 [SPARK-17853][STREAMING][KAFKA][DOC] make it clear that reusing is bad.
22078b0 2017/02/06 [SPARK-17880][DOC] The url linking to `AccumulatorV2` in the document is incorrect.
904dc7b 2017/02/06 Fix hadoop.version in
7c94cc5 2017/02/06 [SPARK-17816][CORE][BRANCH-2.0] Fix ConcurrentModificationException issue in BlockStatusesAccumulator.
50d4eac 2017/02/06 [SPARK-17346][SQL][TESTS] Fix the flaky topic deletion in KafkaSourceStressSuite.
ea25634 2017/02/06 [SPARK-17738][TEST] Fix flaky test in ColumnTypeSuite.
95a7871 2017/02/06 [SPARK-17417][CORE] Fix # of partitions for Reliable RDD checkpointing.
784dd2f 2017/02/06 [SPARK-17832][SQL] TableIdentifier.quotedString creates un-parseable names when name contains a backtick.
dcdca00 2017/02/06 [SPARK-17806] [SQL] fix bug in join key rewritten in HashJoin.
f36c03b 2017/02/06 [SPARK-17782][STREAMING][BUILD] Add Kafka 0.10 project to build modules.
eb75678 2017/02/06 [SPARK-17346][SQL][TEST-MAVEN] Add Kafka source for Structured Streaming (branch 2.0).
c46948e 2017/02/06 [SPARK-17805][PYSPARK] Fix in when pass in list of paths.
cad3e53 2017/02/06 [SPARK-17612][SQL][BRANCH-2.0] Support `DESCRIBE table PARTITION` SQL syntax.
87e573f 2017/02/06 [SPARK-17792][ML] L-BFGS solver for linear regression does not accept general numeric label column types.
e1cdf30 2017/02/06 [SPARK-17750][SQL][BACKPORT-2.0] Fix CREATE VIEW with INTERVAL arithmetic.
08a30d9 2017/02/06 [SPARK-17803][TESTS] Upgrade docker-client dependency.
4a48d45 2017/02/06 [SPARK-17780][SQL] Report Throwable to user in StreamExecution.
67ee7ad 2017/02/06 [SPARK-17798][SQL] Remove redundant Experimental annotations in sql.streaming.
85d0dc1 2017/02/06 [SPARK-17643] Remove comparable requirement from Offset (backport for branch-2.0).
a255661 2017/02/06 [SPARK-17758][SQL] Last returns wrong result in case of empty partition.
07a30cb 2017/02/06 [SPARK-17778][TESTS] Mock SparkContext to reduce memory usage of BlockManagerSuite.
230b501 2017/02/06 [SPARK-17773][BRANCH-2.0][INPUT/OUTPUT] Add VoidObjectInspector.
8ae27fb 2017/02/06 [SPARK-17549][SQL] Only collect table size stat in driver for cached relation.
3fa5485 2017/02/06 [SPARKR][DOC] minor formatting and output cleanup for R vignettes.
13595fc 2017/02/06 [SPARK-17559][MLLIB] persist edges if their storage level is NONE in PeriodicGraphCheckpointer.
75d7369 2017/02/06 [SPARK-17112][SQL] "select null" via JDBC triggers IllegalArgumentException in Thriftserver.
159c854 2017/02/06 [SPARK-17753][SQL] Allow a complex expression as the input to a value-based case statement.
ca37182 2017/02/06 [SPARK-17587][PYTHON][MLLIB] SparseVector __getitem__ should follow __getitem__ contract.
825c9e3 2017/02/06 [SPARK-17736][DOCUMENTATION][SPARKR] Update R README for rmarkdown,…
258b068 2017/02/06 [MINOR][DOC] Add an up-to-date description for default serialization during shuffling.
92cd75c 2017/02/06 Updated the following PR with minor changes to allow cherry-pick to branch-2.0.
60d2ac2 2017/02/06 [SPARK-17721][MLLIB][ML] Fix for multiplying transposed SparseMatrix with SparseVector.
e6d1fbe 2017/02/06 [SPARK-17672] Spark 2.0 history server web UI takes too long for a single application.
90df14b 2017/02/06 [SPARK-17712][SQL] Fix invalid pushdown of data-independent filters beneath aggregates.
7120a46 2017/02/06 [SPARK-16343][SQL] Improve the PushDownPredicate rule to pushdown predicates correctly in non-deterministic condition.
539f476 2017/02/06 [MINOR][DOCS] Fix the doc of spark-streaming with Kinesis.
27de1d4 2017/01/05 Merge pull request #81 from mapr/mapr-25713.
8ea6501 2017/01/05 [MAPR-25713] Spark might try to load MapR Class Loader multiple times and fail.
7e9e5f4 2016/12/26 Merge pull request #80 from mapr/mapr-25638.
965975c 2016/12/26 [SPARK-18528][SQL] Fix a bug to initialise an iterator of aggregation buffer.
96b1fea 2016/12/12 Merge pull request #79 from mapr/mapr-25311.
c5f682b 2016/12/12 [MAPR-25311] Bump Spark dependencies after ECO-1611 release.

Known Issues and Limitations

  • Spark 2.0.1 does not support Spark Structured Streaming.
  • Full support for HPE Ezmeral Data Fabric Streams is available only on clusters with MapR 5.2 and later.
  • Spark is not able to submit jobs to YARN when the cluster is in "classic" mode, even if YARN is installed and configured.
  • MAPR-17271: On secure clusters, the MapR Control System (MCS) does not display links for Spark-Master and Spark-HistoryServer.
  • MAPR-25052: Spark Thrift Server does not start on clusters secured by MapR-SASL.
  • MAPR-26039: Spark does not propagate the mapr_sec_enabled variable to the driver.
  • Spark versions up to and including 2.3.0 have the following security vulnerability: CVE-2018-1334, Apache Spark local privilege escalation vulnerability.

Resolved Issues