Installing Hive
This topic includes instructions for using package managers to download and install Hive from the EEP repository.
Prerequisites
To set up the EEP repository, see Step 11: Install Ecosystem Components Manually.
mapr-client: Copy the following JAR file from a ResourceManager node to the Data Fabric client node:
/opt/mapr/hadoop/hadoop-<X.X.X>/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-<X.X.X>-mapr-<YYYY>.jar
Here:
X.X.X | Refers to the Hadoop version (for example, hadoop-3.3.4) |
YYYY | Refers to the release tag of the ecosystem component (for example, 2210) |
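For example, a minimal sketch of the copy step run from the client node, assuming Hadoop 3.3.4, release tag 2210, and a ResourceManager node named rm-node (substitute the hostname, user, and versions for your cluster):
scp mapr@rm-node:/opt/mapr/hadoop/hadoop-3.3.4/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-3.3.4-mapr-2210.jar /opt/mapr/hadoop/hadoop-3.3.4/share/hadoop/yarn/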
About the Hive Packages
For a list of fixes and new features, see the Hive Release Notes.
Hive is distributed as the following packages:
Package | Description |
---|---|
mapr-hive | The core Hive package. |
mapr-hiveserver2 | The Hive package that enables HiveServer2 to be managed by Warden, allowing you to start and stop HiveServer2 using maprcli or the Data Fabric Control System. The mapr-hive package is a dependency and will be installed if you install mapr-hiveserver2. At installation time, HiveServer2 is started automatically. |
mapr-hivemetastore | The Hive package that enables the Hive Metastore to be managed by Warden, allowing you to start and stop the Hive Metastore using maprcli or the Data Fabric Control System. The mapr-hive package is a dependency and will be installed if you install mapr-hivemetastore. At installation time, the Hive Metastore is started automatically. |
mapr-hivewebhcat | The Hive package that enables WebHCat to be managed by Warden, allowing you to start and stop WebHCat using maprcli or the Data Fabric Control System. The mapr-hive package is a dependency and will be installed if you install mapr-hivewebhcat. At installation time, WebHCat is started automatically. |
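Because these services are managed by Warden, you can control them with maprcli after installation. The following is a hedged example only; the Warden service name shown (hs2 for HiveServer2) and the node name are assumptions that may differ in your release:
maprcli node services -name hs2 -action restart -nodes <node-name>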
Make sure the environment variable JAVA_HOME is set correctly. For example:
# export JAVA_HOME=/usr/lib/jvm/java-7-sun
You can set these system variables by using the shell command line or by updating files such as /etc/profile or ~/.bash_profile. See the Linux documentation for more details about setting system environment variables.
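For example, to persist the setting for a single user, you can append it to ~/.bash_profile (the JDK path below is only an illustration; use the path of the JDK installed on your system):
echo 'export JAVA_HOME=/usr/lib/jvm/java-11-openjdk' >> ~/.bash_profile
source ~/.bash_profile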
Considerations for Ubuntu
On Ubuntu, while configuring the new version of Hive, you might encounter an issue caused by incomplete removal of previously installed Hive packages. To avoid this issue, use the purge command to completely remove all previously installed Hive packages.
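For example, a hedged sketch of the removal step, assuming all four Hive packages from an earlier installation are present (adjust the list to the packages actually installed):
apt-get purge mapr-hive mapr-hiveserver2 mapr-hivemetastore mapr-hivewebhcat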
Installing the Hive Packages
Execute the following commands as root or using sudo.
- On each planned Hive node, install Hive packages:
- To install Hive:
On RHEL: yum install mapr-hive
On SLES: zypper install mapr-hive
On Ubuntu: apt-get install mapr-hive
- To install Hive and HiveServer2:
On RHEL: yum install mapr-hive mapr-hiveserver2
On SLES: zypper install mapr-hive mapr-hiveserver2
On Ubuntu: apt-get install mapr-hive mapr-hiveserver2
- To install Hive, HiveServer2, and HiveMetastore:
On RHEL: yum install mapr-hive mapr-hiveserver2 mapr-hivemetastore
On SLES: zypper install mapr-hive mapr-hiveserver2 mapr-hivemetastore
On Ubuntu: apt-get install mapr-hive mapr-hiveserver2 mapr-hivemetastore
- To install Hive, HiveServer2, HiveMetastore, and WebHCat:
On RHEL: yum install mapr-hive mapr-hiveserver2 mapr-hivemetastore mapr-hivewebhcat
On SLES: zypper install mapr-hive mapr-hiveserver2 mapr-hivemetastore mapr-hivewebhcat
On Ubuntu: apt-get install mapr-hive mapr-hiveserver2 mapr-hivemetastore mapr-hivewebhcat
NOTE: Starting from EEP-5.0.2 and EEP-6.0.1+, you can use Apache Derby as the underlying database, but only for test purposes. To configure Hive on Derby DB, install all Hive packages (mapr-hive, mapr-hiveserver2, mapr-hivemetastore, and mapr-hivewebhcat), and run the configure.sh command, as described in Step 3 in this procedure.
CAUTION: Do not use datanucleus.schema.autoCreateAll for populating underlying databases. For details, see prohibited usage of the datanucleus.schema.autoCreateAll property.
- Configure the database for Hive Metastore. See Configuring Database for Hive Metastore.
- Run configure.sh with the -R option:
/opt/mapr/server/configure.sh -R
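After completing these steps, you can optionally confirm that the Hive packages are installed. This verification step is a hedged sketch and is not part of the documented procedure; the package-query commands are the standard ones for each distribution:
On RHEL or SLES: rpm -qa | grep mapr-hive
On Ubuntu: dpkg -l | grep mapr-hive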
Hive Executable
After Hive is installed, the executable is located at /opt/mapr/hive/hive-<version>/bin/hive.
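For example, assuming a hypothetical Hive 2.3 installation directory (substitute your installed version), you can verify the executable by running a simple query:
/opt/mapr/hive/hive-2.3/bin/hive -e "SHOW DATABASES;"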
Considerations for JDK 17
Considerations for Spark-Hive Compatibility
Some parquet files generated by the default Spark installation are not compatible with Hive.
- If Spark has not yet generated the parquet files, set the spark.sql.parquet.writeLegacyFormat option to true in the Spark configuration.
- If Spark has already generated the parquet files without the compatibility option enabled, set the spark.sql.parquet.writeLegacyFormat option to true in the Spark configuration and regenerate the parquet files.
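For example, one way to enable the option, sketched under the assumption that Spark is installed under /opt/mapr/spark/spark-<version> (your Spark home and configuration method may differ): add the setting to spark-defaults.conf, or pass it per job with --conf.
In /opt/mapr/spark/spark-<version>/conf/spark-defaults.conf: spark.sql.parquet.writeLegacyFormat true
Per job: spark-submit --conf spark.sql.parquet.writeLegacyFormat=true <your-application>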
Configuring Hive
See Hive User Impersonation for the steps to configure user impersonation for Hive and the Data Fabric cluster.
To configure Hive on Tez, see Configuring Hive and Tez.