Configure Hive Directories
You can configure the following Hive directories:
- Hive Scratch Directory
- Hive Warehouse Directory
- Hive Error Logs Directory
Hive Scratch Directory
In Hive 1.0, HPE Ezmeral Data Fabric configures the Hive scratch directory to be /user/<user.name>/tmp/hive/<user.name>. In Hive 0.13, the default scratch directory is /user/<user.name>/tmp/hive. The hive user must have write access to the /user folder.
To modify this parameter, perform one of the following operations:
- Set this parameter in the hive-site.xml file. Copy the hive.exec.scratchdir property element from the $HIVE_HOME/conf/hive-default.xml.template file and paste it inside the configuration element in the $HIVE_HOME/conf/hive-site.xml file. Then, modify the value element in the hive-site.xml file.
- Set this parameter from the Hive shell. Example:
hive> set hive.exec.scratchdir=/myvolume/tmp;
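As a minimal sketch of the hive-site.xml approach, the copied property element might look like the following once it sits inside the file's configuration element. The /myvolume/tmp value is reused from the shell example purely for illustration; it is an assumption, not a recommended default.

```xml
<!-- Placed inside the <configuration> element of $HIVE_HOME/conf/hive-site.xml -->
<property>
  <name>hive.exec.scratchdir</name>
  <!-- Illustrative path only; use a directory the hive user can write to -->
  <value>/myvolume/tmp</value>
</property>
```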
How Hive Handles Scratch Directories on HPE Ezmeral Data Fabric
When a query requires Hive to query existing tables and create data for new tables, Hive uses the following workflow:
- Hive creates the query scratch directory hive_<timestamp>_<randomnumber> under the Hive scratch directory.
- Hive creates the following directories as subdirectories of the query scratch directory:
  - A final query output directory. This directory's name takes the form -ext-<number>.
  - An output directory for each MapReduce application. These directories' names take the form -mr-<number>.
- Hive executes the tasks, including running the MapReduce applications and loading data to the query output directory.
- Hive loads the data from the output directory into a table. By default, the table's directory is in the /user/hive/warehouse directory. Unless the table DDL specifies a custom location, you can configure this location with the hive.metastore.warehouse.dir parameter in hive-site.xml. To load the output data into the table, Hive renames the output directory to the table directory.
- Hive automatically deletes the scratch directories after the query completes successfully.
HPE Ezmeral Data Fabric uses volumes, which are logical units that enable you to apply policies to a set of files, directories, and sub-volumes. When the output directory and the table directory are in different volumes, this workflow involves moving data across volumes. Moving data across volumes is slower than moving data within a volume. Therefore, HPE Ezmeral Data Fabric sets hive.optimize.insert.dest.volume to true so that Hive automatically creates the scratch directory in the same volume as the target table.
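For reference, this setting could be expressed in hive-site.xml as sketched below. This assumes the parameter uses the same property format as other Hive settings; the text above states only the parameter name and its value.

```xml
<!-- Sketch only: assumes the standard property layout inside <configuration> -->
<property>
  <name>hive.optimize.insert.dest.volume</name>
  <!-- true: create the scratch directory in the target table's volume -->
  <value>true</value>
</property>
```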
Hive Warehouse Directory
Hive tables are stored in the Hive warehouse directory. By default, HPE Ezmeral Data Fabric configures the Hive warehouse directory to be /user/hive/warehouse under the root volume. This default is defined in the $HIVE_HOME/conf/hive-default.xml.template file.
To modify this parameter, perform one of the following operations:
- Set this parameter in the hive-site.xml file. Copy the hive.metastore.warehouse.dir property element from the $HIVE_HOME/conf/hive-default.xml.template file and paste it inside the configuration element in the $HIVE_HOME/conf/hive-site.xml file. Then, modify the value element in the hive-site.xml file.
- Set this parameter from the Hive shell. Example:
hive> set hive.metastore.warehouse.dir=/myvolume/mydirectory;
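A minimal sketch of the corresponding hive-site.xml entry follows. The /myvolume/mydirectory value is borrowed from the shell example for illustration and is an assumption, not a required location.

```xml
<!-- Placed inside the <configuration> element of $HIVE_HOME/conf/hive-site.xml -->
<property>
  <name>hive.metastore.warehouse.dir</name>
  <!-- Illustrative path only; tables whose DDL sets a custom location are unaffected -->
  <value>/myvolume/mydirectory</value>
</property>
```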
Hive Error Logs Directory
The log files are stored in /opt/mapr/hive/hive-<version>/logs/<user> by default.
To modify the log location:
- Configure hive.log.dir in the $HIVE_HOME/conf/hive-log4j.properties file. Example:
hive.log.dir=<other_location>
- Set the sticky bit on the new directory. Example:
chmod 1777 <other_location>