Spark 3.3.0.0 - 2210 (EEP 9.0.0) Release Notes
This section provides reference information, including new features, patches, and known issues for Spark 3.3.0.0.
The notes below relate specifically to the Hewlett Packard Enterprise Distribution for Apache Hadoop. For more information, you may also want to consult the open-source Spark 3.3.0 Release Notes.
These release notes contain only Hewlett Packard Enterprise specific information and are not necessarily cumulative in nature. For information about how to use the release notes, see Ecosystem Component Release Notes.
Spark Version | 3.3.0.0 |
Release Date | October 2022 |
HPE Version Interoperability | See Component Versions for Released EEPs and EEP Components and OS Support. |
Source on GitHub | https://github.com/mapr/spark |
GitHub Release Tag | 3.3.0.0-eep-2210 |
Maven Artifacts | https://repository.mapr.com/maven/ |
Package Names | Navigate to https://package.ezmeral.hpe.com/releases/MEP/ and select your EEP and OS to view the list of package names. |
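To pull these artifacts into a Maven build, you can point the build at the repository listed above. A minimal pom.xml sketch (the repository id is an arbitrary label, not an HPE-defined name):

```xml
<!-- HPE Maven repository that hosts the Spark artifacts listed above. -->
<repository>
  <id>mapr-releases</id>
  <url>https://repository.mapr.com/maven/</url>
</repository>
```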
Hive Support
- Starting with Spark 3.1.2, Spark supports Hive 2.3.
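For example, here is a minimal Scala sketch of a session that uses the built-in Hive integration (the application name is illustrative, and a reachable Hive metastore is assumed):

```scala
import org.apache.spark.sql.SparkSession

// Build a session with Hive support so Spark can use the Hive metastore.
val spark = SparkSession.builder()
  .appName("hive-support-example")
  .enableHiveSupport()
  .getOrCreate()

// List the databases known to the metastore.
spark.sql("SHOW DATABASES").show()
```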
Delta Lake Support
Spark 3.2.0 and later provide Delta Lake support on HPE Ezmeral Data Fabric. See Apache Spark Feature Support.
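As a hedged illustration, the following Scala sketch writes and reads back a small Delta table. It assumes the Delta Lake jars are already on the classpath as described in Apache Spark Feature Support; the output path is illustrative:

```scala
import org.apache.spark.sql.SparkSession

// Enable the Delta Lake SQL extensions and catalog for this session.
val spark = SparkSession.builder()
  .appName("delta-example")
  .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog",
    "org.apache.spark.sql.delta.catalog.DeltaCatalog")
  .getOrCreate()

// Write a small Delta table, then read it back.
spark.range(5).write.format("delta").mode("overwrite").save("/tmp/delta-demo")
spark.read.format("delta").load("/tmp/delta-demo").show()
```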
New in This Release
- For a complete list of new features, see the open-source Spark 3.3.0 Release Notes.
- Updated Spark to version 3.3.0.0.
- Updated Log4j to version 2.x (see the logging configuration sketch after this list).
- Updated Hadoop to version 3.x.
- CVE fixes.
- Bug fixes.
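Because logging moved from Log4j 1.x to Log4j 2.x, Spark now reads log4j2.properties instead of log4j.properties. The following minimal sketch is modeled on the template that Spark ships in its conf directory; treat the file path as an assumption about your installation:

```
# $SPARK_HOME/conf/log4j2.properties
# Send INFO-level log output to the console.
rootLogger.level = info
rootLogger.appenderRef.stdout.ref = console

appender.console.type = Console
appender.console.name = console
appender.console.target = SYSTEM_ERR
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```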
Fixes
This HPE release includes the following new fixes since the previous Spark release. For complete details, refer to the commit log for this project on GitHub.
GitHub Commit | Date (YYYY-MM-DD) | Comment |
d839714 | 2022-07-25 | MapR [SPARK-1006] Add support of SCRAM SASL mechanism to Spark |
f100550 | 2022-07-25 | MapR [SPARK-1062] Use Spark HD3 profile by default to build Spark package |
c5788a7 | 2022-07-25 | MapR [SPARK-1059] Thriftserver can't start on Core710+MEP900+Hadoop3 |
35db505 | 2022-07-25 | MapR [SPARK-1043] Spark uses hadoop 3.3 |
c7c2c21 | 2022-07-30 | [EZSPA-807] Failed integration rapids test with py4j.protocol.Py4JJavaError |
322a282 | 2022-08-01 | MapR [SPARK-1069] Spark fails without Kafka installed |
772ab90 | 2022-08-01 | MapR [SPARK-1074] Common user can't start spark-shell session since access denied |
043a7d9 | 2022-08-01 | MapR [SPARK-1079] MaprFs jar is present in Spark-3.2.0/Spark-3.3.0 |
d4f6af3 | 2022-08-02 | MapR [SPARK-1081] Spark job fails on cluster with hadoop3 |
ae31f23 | 2022-08-08 | MapR [SPARK-1071] Update Thrift in Spark-3.3.0 |
fcf7bf2 | 2022-08-09 | MapR [SPARK-1078] manageSSLKeys.sh fails when user is not part of group with same name as user |
980f17b | 2022-08-09 | MapR [SPARK-1075] Ranger hive authorizer should not be copied from hive's hive-site.xml to spark's one |
7e53348 | 2022-08-10 | MapR [SPARK-1080] Use log4j2 specific properties and adapt Mapr specific changes to log4j2 |
b5ed2db | 2022-08-17 | MapR [SPARK-1086] Pyspark start fails |
e1d2396 | 2022-08-17 | MapR [SPARK-1090] Spark hivesite-editor library is not present in jars |
8356763 | 2022-08-18 | [SPARK-1088] Write to parquet fails for Spark 3.3.0 |
a85b0b3 | 2022-08-22 | MapR [SPARK-1093] CVE-2018-14721 - jackson databind |
a173817 | 2022-08-22 | MapR [SPARK-1096] Spark default log is info |
b833dcd | 2022-08-23 | MapR [SPARK-1087] Spark default log is info |
8d689f5 | 2022-08-24 | MapR [SPARK-1097] Parallel jobs running under non mapr user causes errors with manageSSLKeys.sh |
e6320d3 | 2022-09-08 | MapR [SPARK-1101] Excessive logs for spark job |
e0b39f3 | 2022-09-08 | MapR [SPARK-1103] Excessive logs for spark beeline |
ae646ef | 2022-09-08 | MapR [SPARK-1094] Spark worker is started on unsecured port 8481 on slave node |
91fb4ac | 2022-09-09 | MapR [SPARK-1106] Regulate dependencies in dep-blacklist.txt via configure.sh |
a9f0fd7 | 2022-09-15 | MapR [SPARK-1108] Parallel jobs running causes errors with manageSSLKeys.sh |
4cb5b68 | 2022-09-19 | MapR [SPARK-1109] CVE fixes at Spark 3.3.0 EEP-9.0.0 |
ca519da | 2022-09-23 | MapR [SPARK-1106] Regulate dependencies in dep-blacklist.txt via configure.sh |
522d39d | 2022-09-27 | MapR [SPARK-1105] Connection to STS fails on cluster with FIPS |
40d11ec | 2022-10-07 | MapR [SPARK-1094] Spark worker is started on unsecured port 8481 on slave node |
Known Issues and Limitations
- When you enable SSL in a mixed (FIPS and non-FIPS) configuration, Spark applications fail to run. To run Spark applications, set the spark.ssl.ui.enabled option to false in the spark-defaults.conf configuration file, as shown in the sketch below.
- If you use Spark SQL with a Derby database and no Hive or Hive Metastore installation, you will see a Java runtime exception. See Apache Spark Feature Support for the workaround.
- Spark does not support log4j 1.2 logging on HPE Ezmeral Data Fabric.
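The following minimal spark-defaults.conf sketch illustrates the workaround for the SSL issue above; the property name comes from the issue description, and the file lives in the standard Spark conf directory:

```
# spark-defaults.conf
# Disable SSL for the Spark UI so applications can run in a mixed
# (FIPS and non-FIPS) configuration.
spark.ssl.ui.enabled false
```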
- SPARK-1099: A non-mapr user is unable to insert values into a Hive table by using Spark Thrift Server.
  Symptoms: Connect to Spark Thrift Server from Spark Beeline as a non-mapr user:
  !connect jdbc:hive2://<node1.cluster.com>:2304/default;ssl=true;auth=maprsasl
  Create a table and insert values into it:
  CREATE TABLE nonmaprctastest2 (key int);
  INSERT INTO TABLE nonmaprctastest2 VALUES (1), (2), (3);
  The following error occurs:
  Caused by: java.lang.RuntimeException: Cannot create staging directory: 'maprfs:/user/hive/warehouse/nonmaprctastest2/.hive-staging_hive_2022-08-23_11-38-31_177_3217175113512758641-4': User mapruser1(user id 5001) has been denied access to create .hive-staging_hive_2022-08-23_11-38-31_177_3217175113512758641-4
  Cause: In Hive 2.x, permissions for all tables in the maprfs:///user/hive/warehouse/ directory are set to 777. However, in Hive 3.x, permissions for table directories are set to 755. In EEP, Spark Thrift Server creates the table as the user who started the Spark Thrift Server, so a user who did not start the Spark Thrift Server can no longer perform write operations on the table.
  Workaround: Choose one of the following workarounds:
  - After creating the Hive table, set permissions to 777 on the table directory in maprfs:///user/hive/warehouse, as shown in the sketch after this list.
  - After creating the Hive table, set the owner to the user who created the Hive table.
  - Use HiveServer2, which uses impersonation, instead of Spark Thrift Server.
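As an illustration of the first workaround, the following sketch opens up the table directory after the table is created. The table name nonmaprctastest2 comes from the example above; run the command as a user with rights to change permissions in the warehouse directory:

```
# Make the new table directory writable by other users.
hadoop fs -chmod -R 777 maprfs:///user/hive/warehouse/nonmaprctastest2
```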
Resolved Issues
- None.