Spark 3.3.0.0 - 2210 (EEP 9.0.0) Release Notes
This section provides reference information, including new features, patches, and known issues for Spark 3.3.0.0.
The notes below relate specifically to the Hewlett Packard Enterprise Distribution for Apache Hadoop. For more information, you may also want to consult the open-source Spark 3.3.0 Release Notes.
These release notes contain only Hewlett Packard Enterprise specific information and are not necessarily cumulative in nature. For information about how to use the release notes, see Ecosystem Component Release Notes.
Spark Version | 3.3.0.0 |
Release Date | October 2022 |
HPE Version Interoperability | See Component Versions for Released EEPs and EEP Components and OS Support. |
Source on GitHub | https://github.com/mapr/spark |
GitHub Release Tag | 3.3.0.0-eep-2210 |
Maven Artifacts | https://repository.mapr.com/maven/ |
Package Names | Navigate to https://package.ezmeral.hpe.com/releases/MEP/ and select your EEP and OS to view the list of package names. |
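To pull these artifacts into a Maven build, you can point the build at the repository listed above. A minimal pom.xml sketch (the repository id is an arbitrary label, not an HPE-defined name):

```xml
<!-- HPE Maven repository that hosts the Spark artifacts listed above. -->
<repository>
  <id>mapr-releases</id>
  <url>https://repository.mapr.com/maven/</url>
</repository>
```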
Hive Support
- Starting with Spark 3.1.2, Spark supports Hive 2.3.
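For example, here is a minimal Scala sketch of a session that uses the built-in Hive integration (the application name is illustrative, and a reachable Hive metastore is assumed):

```scala
import org.apache.spark.sql.SparkSession

// Build a session with Hive support so Spark can use the Hive metastore.
val spark = SparkSession.builder()
  .appName("hive-support-example")
  .enableHiveSupport()
  .getOrCreate()

// List the databases known to the metastore.
spark.sql("SHOW DATABASES").show()
```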
Delta Lake Support
Spark 3.2.0 and later provide Delta Lake support on HPE Ezmeral Data Fabric. See Apache Spark Feature Support.
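As a hedged illustration, the following Scala sketch writes and reads back a small Delta table. It assumes the Delta Lake jars are already on the classpath as described in Apache Spark Feature Support; the output path is illustrative:

```scala
import org.apache.spark.sql.SparkSession

// Enable the Delta Lake SQL extensions and catalog for this session.
val spark = SparkSession.builder()
  .appName("delta-example")
  .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog",
    "org.apache.spark.sql.delta.catalog.DeltaCatalog")
  .getOrCreate()

// Write a small Delta table, then read it back.
spark.range(5).write.format("delta").mode("overwrite").save("/tmp/delta-demo")
spark.read.format("delta").load("/tmp/delta-demo").show()
```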
New in This Release
- For a complete list of new features, see the open-source Spark 3.3.0 Release Notes.
- Updated Spark to version 3.3.0.0.
- Updated Log4j to version 2.x (see the logging configuration sketch after this list).
- Updated Hadoop to version 3.x.
- CVE fixes.
- Bug fixes.
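Because logging moved from Log4j 1.x to Log4j 2.x, Spark now reads log4j2.properties instead of log4j.properties. The following minimal sketch is modeled on the template that Spark ships in its conf directory; treat the file path as an assumption about your installation:

```
# $SPARK_HOME/conf/log4j2.properties
# Send INFO-level log output to the console.
rootLogger.level = info
rootLogger.appenderRef.stdout.ref = console

appender.console.type = Console
appender.console.name = console
appender.console.target = SYSTEM_ERR
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```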
Fixes
This HPE release includes the following new fixes since the previous Spark release. For complete details, refer to the commit log for this project on GitHub.
GitHub Commit | Date (YYYY-MM-DD) | Comment |
d839714 | 2022-07-25 | MapR [SPARK-1006] Add support of SCRAM SASL mechanism to Spark |
f100550 | 2022-07-25 | MapR [SPARK-1062] Use Spark HD3 profile by default to build Spark package |
c5788a7 | 2022-07-25 | MapR [SPARK-1059] Thriftserver can't start on Core710+MEP900+Hadoop3 |
35db505 | 2022-07-25 | MapR [SPARK-1043] Spark uses hadoop 3.3 |
c7c2c21 | 2022-07-30 | [EZSPA-807] Failed integration rapids test with py4j.protocol.Py4JJavaError |
322a282 | 2022-08-01 | MapR [SPARK-1069] Spark fails without Kafka installed |
772ab90 | 2022-08-01 | MapR [SPARK-1074] Common user can't start spark-shell session since access denied |
043a7d9 | 2022-08-01 | MapR [SPARK-1079] MaprFs jar is present in Spark-3.2.0/Spark-3.3.0 |
d4f6af3 | 2022-08-02 | MapR [SPARK-1081] Spark job fails on cluster with hadoop3 |
ae31f23 | 2022-08-08 | MapR [SPARK-1071] Update Thrift in Spark-3.3.0 |
fcf7bf2 | 2022-08-09 | MapR [SPARK-1078] manageSSLKeys.sh fails when user is not part of group with same name as user |
980f17b | 2022-08-09 | MapR [SPARK-1075] Ranger hive authorizer should not be copied from hive's hive-site.xml to spark's one |
7e53348 | 2022-08-10 | MapR [SPARK-1080] Use log4j2 specific properties and adapt Mapr specific changes to log4j2 |
b5ed2db | 2022-08-17 | MapR [SPARK-1086] Pyspark start fails |
e1d2396 | 2022-08-17 | MapR [SPARK-1090] Spark hivesite-editor library is not present in jars |
8356763 | 2022-08-18 | [SPARK-1088] Write to parquet fails for Spark 3.3.0 |
a85b0b3 | 2022-08-22 | MapR [SPARK-1093] CVE-2018-14721 - jackson databind |
a173817 | 2022-08-22 | MapR [SPARK-1096] Spark default log is info |
b833dcd | 2022-08-23 | MapR [SPARK-1087] Spark default log is info |
8d689f5 | 2022-08-24 | MapR [SPARK-1097] Parallel jobs running under non mapr user causes errors with manageSSLKeys.sh |
e6320d3 | 2022-09-08 | MapR [SPARK-1101] Excessive logs for spark job |
e0b39f3 | 2022-09-08 | MapR [SPARK-1103] Excessive logs for spark beeline |
ae646ef | 2022-09-08 | MapR [SPARK-1094] Spark worker is started on unsecured port 8481 on slave node |
91fb4ac | 2022-09-09 | MapR [SPARK-1106] Regulate dependencies in dep-blacklist.txt via configure.sh |
a9f0fd7 | 2022-09-15 | MapR [SPARK-1108] Parallel jobs running causes errors with manageSSLKeys.sh |
4cb5b68 | 2022-09-19 | MapR [SPARK-1109] CVE fixes at Spark 3.3.0 EEP-9.0.0 |
ca519da | 2022-09-23 | MapR [SPARK-1106] Regulate dependencies in dep-blacklist.txt via configure.sh |
522d39d | 2022-09-27 | MapR [SPARK-1105] Connection to STS fails on cluster with FIPS |
40d11ec | 2022-10-07 | MapR [SPARK-1094] Spark worker is started on unsecured port 8481 on slave node |
Known Issues and Limitations
- When you enable SSL in a mixed (FIPS and non-FIPS) configuration, Spark applications fail to run. To run Spark applications, set the spark.ssl.ui.enabled option to false in the spark-defaults.conf configuration file, as shown in the sketch below.
- If you use Spark SQL with a Derby database and no Hive or Hive Metastore installation, you will see a Java runtime exception. See Apache Spark Feature Support for the workaround.
- Spark does not support log4j 1.2 logging on HPE Ezmeral Data Fabric.
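The following minimal spark-defaults.conf sketch illustrates the workaround for the SSL issue above; the property name comes from the issue description, and the file lives in the standard Spark conf directory:

```
# spark-defaults.conf
# Disable SSL for the Spark UI so applications can run in a mixed
# (FIPS and non-FIPS) configuration.
spark.ssl.ui.enabled false
```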
- SPARK-1099: A non-mapr user is unable to insert values into a Hive table by using Spark Thrift Server.
  Symptoms: Connect to Spark Thrift Server from Spark Beeline as a non-mapr user:
  !connect jdbc:hive2://<node1.cluster.com>:2304/default;ssl=true;auth=maprsasl
  Create a table and insert values into it:
  CREATE TABLE nonmaprctastest2 (key int);
  INSERT INTO TABLE nonmaprctastest2 VALUES (1), (2), (3);
  The following error occurs:
  Caused by: java.lang.RuntimeException: Cannot create staging directory: 'maprfs:/user/hive/warehouse/nonmaprctastest2/.hive-staging_hive_2022-08-23_11-38-31_177_3217175113512758641-4': User mapruser1(user id 5001) has been denied access to create .hive-staging_hive_2022-08-23_11-38-31_177_3217175113512758641-4
  Cause: In Hive 2.x, permissions for all tables in the maprfs:///user/hive/warehouse/ directory are set to 777. However, in Hive 3.x, permissions for table directories are set to 755. In EEP, Spark Thrift Server creates the table as the user who started the Spark Thrift Server, so a user who did not start the Spark Thrift Server can no longer perform write operations on the table.
  Workaround: Choose one of the following workarounds:
  - After creating the Hive table, set permissions to 777 on the table directory in maprfs:///user/hive/warehouse, as shown in the sketch after this list.
  - After creating the Hive table, set the owner to the user who created the Hive table.
  - Use HiveServer2, which uses impersonation, instead of Spark Thrift Server.
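As an illustration of the first workaround, the following sketch opens up the table directory after the table is created. The table name nonmaprctastest2 comes from the example above; run the command as a user with rights to change permissions in the warehouse directory:

```
# Make the new table directory writable by other users.
hadoop fs -chmod -R 777 maprfs:///user/hive/warehouse/nonmaprctastest2
```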
Resolved Issues
- None.