Configure Hive Directories
You can configure the following Hive directories:
- Hive Scratch Directory
- Hive Warehouse Directory
- Hive Error Logs Directory
Hive Scratch Directory
In Hive 1.0, HPE Data Fabric configures the Hive scratch
directory to be /user/<user.name>/tmp/hive/<user.name>. In Hive
0.13, the default scratch directory is /user/<user.name>/tmp/hive. The hive user must
have write access to the /user folder.
To modify this parameter, perform one of the following operations:
- Set this parameter in the
hive-site.xml.Copy the hive.exec.scratchdir property elements from the $HIVE_HOME/conf/hive-default.xml.template file and paste them into an XML configuration element in the $HIVE_HOME/conf/hive-site.xml file. Then, modify the value elements for these directories in the hive-site.xml file. -
Set this parameter from the Hive shell. Example:
hive> set hive.exec.scratchdir=/myvolume/tmp
How Hive Handles Scratch Directories on HPE Data Fabric
When a query requires Hive to query existing tables and create data for new tables, Hive uses the following workflow:
- Create the query scratch directory
hive_<timestamp>_<randomnumber>under the Hive scratch directory. - Create the following directories as subdirectories of the scratch directory:
- Final query output directory. This directory's name takes the form
-ext-<number>. -
An output directory for each MapReduce application. These directories' names take the form
-mr-<number>.
- Final query output directory. This directory's name takes the form
- Hive executes the tasks, including MapReduce applications and loading data to the query output directory.
- Hive loads the data from output directory into a table. By default, the table's
directory is in the
/user/hive/warehousedirectory. You can configure this location with thehive.metastore.warehouse.dirparameter inhive-site.xml, unless the table DDL specifies a custom location. Hive renames the output directory to the table directory in order to load the output data to the table. - The scratch directories are automatically deleted after the query completes successfully.
HPE Data Fabric uses Administering Volumes, which are logical
units that enable you to apply policies to a set of files, directories, and sub-volumes.
When the output directory and the table directory are in different volumes, this
workflow involves moving data across volumes. Moving data across volumes is slower than
moving data within a volume. Therefore, HPE Data Fabric
sets hive.optimize.insert.dest.volume to true to
automatically create a scratch directory in the same volume as the target table.
Hive Warehouse Directory
Hive tables are stored in the Hive warehouse directory. By default, HPE Data Fabric configures the Hive warehouse directory to be
/user/hive/warehouse under the root volume. This default is defined
in the $HIVE_HOME/conf/hive-default.xml.template file.
To modify this parameter, perform one of the following operations:
- Set this parameter in the
hive-site.xml.Copy the hive.metastore.warehouse.dir property elements from the $HIVE_HOME/conf/hive-default.xml.template file and paste them into an XML configuration element in the $HIVE_HOME/conf/hive-site.xml file. Then, modify the value elements for these directories in the hive-site.xml file. - Set this parameter from the Hive shell.
Example:
hive> set hive.metastore.warehouse.dir=/myvolume/mydirectory
Hive Error Logs Directory
The log files are stored in
/opt/mapr/hive/hive-<version>/logs/<user> by default.
To modify the log location:
- Configure
hive.log.dirin$HIVE_HOME/conf/hive-log4j.propertiesfile. Example:hive.log.dir=<other_location> - Set the sticky bit on the new directory. Example:
chmod 1777 <other_location>