Installing Hive
This topic includes instructions for using package managers to download and install Hive from the EEP repository.
Prerequisites
To set up the EEP repository, see Step 11: Install Ecosystem Components Manually.
mapr-client: Copy the following JAR file from a ResourceManager node to the Data Fabric client node (an illustrative copy command follows the placeholder table):
/opt/mapr/hadoop/hadoop-<X.X.X>/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-<X.X.X>-mapr-<YYYY>.jar
Here:
| Placeholder | Description |
|---|---|
| X.X.X | Refers to the Hadoop version (for example, hadoop-3.3.4) |
| YYYY | Refers to the release tag of the ecosystem component (for example, 2210) |
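For example, assuming Hadoop 3.3.4, release tag 2210, and a ResourceManager host named rm-node (all of these are illustrative; substitute the values for your cluster), the copy from the client node might look like this:

```bash
# Illustrative only: adjust the Hadoop version (3.3.4), the release tag (2210),
# and the ResourceManager hostname (rm-node) to match your environment.
scp rm-node:/opt/mapr/hadoop/hadoop-3.3.4/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-3.3.4-mapr-2210.jar \
    /opt/mapr/hadoop/hadoop-3.3.4/share/hadoop/yarn/
```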
About the Hive Packages
For a list of fixes and new features, see the Hive Release Notes.
Hive is distributed as the following packages:
| Package | Description |
|---|---|
| mapr-hive | The core Hive package. |
| mapr-hiveserver2 | The Hive package that enables HiveServer2 to be managed by Warden, allowing you to start and stop HiveServer2 using maprcli or the Data Fabric Control System. The mapr-hive package is a dependency and is installed if you install mapr-hiveserver2. At installation time, HiveServer2 is started automatically. |
| mapr-hivemetastore | The Hive package that enables the Hive Metastore to be managed by Warden, allowing you to start and stop the Hive Metastore using maprcli or the Data Fabric Control System. The mapr-hive package is a dependency and is installed if you install mapr-hivemetastore. At installation time, the Hive Metastore is started automatically. |
| mapr-hivewebhcat | The Hive package that enables WebHCat to be managed by Warden, allowing you to start and stop WebHCat using maprcli or the Data Fabric Control System. The mapr-hive package is a dependency and is installed if you install mapr-hivewebhcat. At installation time, WebHCat is started automatically. |
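As an illustration of managing these services with maprcli, the following sketch restarts HiveServer2 on one node. The service name (hs2) and the node name (node1.example.com) are assumptions; confirm the service names used by your release before running the command.

```bash
# Sketch only: the service name "hs2" and the hostname are illustrative
# and may differ in your release and cluster.
maprcli node services -name hs2 -action restart -nodes node1.example.com
```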
Make sure the environment variable JAVA_HOME is set correctly. For
example:
# export JAVA_HOME=/usr/lib/jvm/java-7-sun
You can set these system variables by using the shell command line or by updating files
such as /etc/profile or ~/.bash_profile. See the Linux
documentation for more details about setting system environment variables.
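For instance, to make the setting persistent for a single user, you could append the export to ~/.bash_profile. The JDK path below is only a placeholder; use the path of the JDK actually installed on your node:

```bash
# Placeholder JDK path; substitute the location of your installed JDK.
echo 'export JAVA_HOME=/usr/lib/jvm/java-11-openjdk' >> ~/.bash_profile
source ~/.bash_profile
```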
Considerations for Ubuntu
On Ubuntu, configuring a new version of Hive can fail because of an incomplete removal of previously installed Hive packages. To avoid this issue, use the purge command to completely remove all previously installed Hive packages, as sketched below.
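A sketch of the removal, assuming all four Hive packages had been installed previously:

```bash
# Remove the old Hive packages together with their configuration files.
sudo apt-get purge mapr-hive mapr-hiveserver2 mapr-hivemetastore mapr-hivewebhcat
```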
Installing the Hive Packages
Execute the following commands as root or using sudo.
- On each planned Hive node, install Hive packages:
  - To install Hive:
    - On RHEL: yum install mapr-hive
    - On SLES: zypper install mapr-hive
    - On Ubuntu: apt-get install mapr-hive
  - To install Hive and HiveServer2:
    - On RHEL: yum install mapr-hive mapr-hiveserver2
    - On SLES: zypper install mapr-hive mapr-hiveserver2
    - On Ubuntu: apt-get install mapr-hive mapr-hiveserver2
  - To install Hive, HiveServer2, and Hive Metastore:
    - On RHEL: yum install mapr-hive mapr-hiveserver2 mapr-hivemetastore
    - On SLES: zypper install mapr-hive mapr-hiveserver2 mapr-hivemetastore
    - On Ubuntu: apt-get install mapr-hive mapr-hiveserver2 mapr-hivemetastore
  - To install Hive, HiveServer2, Hive Metastore, and WebHCat:
    - On RHEL: yum install mapr-hive mapr-hiveserver2 mapr-hivemetastore mapr-hivewebhcat
    - On SLES: zypper install mapr-hive mapr-hiveserver2 mapr-hivemetastore mapr-hivewebhcat
    - On Ubuntu: apt-get install mapr-hive mapr-hiveserver2 mapr-hivemetastore mapr-hivewebhcat
  NOTE: Starting from EEP 5.0.2 and EEP 6.0.1 and later, you can use Apache Derby as the underlying database, but only for test purposes. To configure Hive on Derby DB, install all Hive packages (mapr-hive, mapr-hiveserver2, mapr-hivemetastore, and mapr-hivewebhcat), and run the configure.sh command, as described in Step 3 of this procedure.
  CAUTION: Do not use datanucleus.schema.autoCreateAll for populating underlying databases. For details, see the prohibited usage of the datanucleus.schema.autoCreateAll property.
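If you want to confirm which Hive packages ended up on a node, one possible check (using the standard package query tools for each distribution) is:

```bash
# On RHEL or SLES (rpm-based):
rpm -qa | grep mapr-hive
# On Ubuntu (dpkg-based):
dpkg -l | grep mapr-hive
```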
- Configure the database for Hive Metastore. See Configuring Database for Hive Metastore.
- Run configure.sh with the -R option:
  /opt/mapr/server/configure.sh -R
Hive Executable
After Hive is installed, the executable is located at /opt/mapr/hive/hive-<version>/bin/hive.
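As a quick smoke test, you could print the version of the installed Hive client. The glob below is a convenience that assumes a single Hive version directory is installed:

```bash
# Assumes one installed Hive version under /opt/mapr/hive.
/opt/mapr/hive/hive-*/bin/hive --version
```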
Considerations for JDK 17
See Considerations for Hive on JDK 17, and JDK 21 or Higher.
Considerations for Spark-Hive Compatibility
Some parquet files generated by the default Spark installation are not compatible with Hive.
- If Spark has not yet generated the parquet files, set the spark.sql.parquet.writeLegacyFormat option to true in the Spark configuration, as shown in the sketch after this list.
- If Spark has already generated the parquet files without the compatibility option enabled, set the spark.sql.parquet.writeLegacyFormat option to true in the Spark configuration and regenerate the parquet files.
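A minimal sketch of enabling the option, either for a single job or cluster-wide. The application class, JAR name, and Spark installation path are placeholders; adjust them for your environment:

```bash
# One-off: pass the option to an individual job (class and JAR names are placeholders).
spark-submit --conf spark.sql.parquet.writeLegacyFormat=true --class com.example.MyJob myjob.jar

# Cluster-wide alternative: add the setting to spark-defaults.conf.
# The path below is typical for Data Fabric Spark installations; verify it on your cluster.
echo "spark.sql.parquet.writeLegacyFormat true" >> /opt/mapr/spark/spark-*/conf/spark-defaults.conf
```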
Configuring Hive
See Hive User Impersonation for the steps to configure user impersonation for Hive and the Data Fabric cluster.
To configure Hive on Tez, see Configuring Hive and Tez.