Configuring Hive and Tez

About this task

To configure Hive on Tez, perform the following steps on each node where you want Hive to run on Tez. Running MR jobs in Tez mode is not compatible with all MR jobs, so do not configure the whole cluster to run on Tez.

There is a known issue with the incomplete removal of previously installed Tez packages. The issue affects Ubuntu platforms on which Tez was installed and later removed using sudo apt-get remove mapr-tez. Because of Ubuntu-specific behavior and Tez source-code issues, the remove command removes Tez only partially in some installations. When this happens, an error is generated when you try to reinstall Tez on Ubuntu in step 1. If you believe your installation might have this issue, you can prevent the error: before performing the following steps, use the purge command to completely remove all previously installed Tez packages.
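
As a minimal sketch, the purge on Ubuntu could look like the following. It assumes the package name mapr-tez from the remove command above and uses the standard apt-get purge subcommand. The purge_tez function and the DRY_RUN variable are hypothetical names introduced only for this example.

```shell
#!/bin/sh
# Sketch: completely remove a previously installed mapr-tez package on Ubuntu.
# purge_tez and DRY_RUN are illustrative names, not part of the product.

purge_tez() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Dry run: print the command instead of executing it.
    echo "sudo apt-get purge mapr-tez"
  else
    # purge removes the package together with its configuration files.
    sudo apt-get purge mapr-tez
  fi
}
```

Calling DRY_RUN=1 purge_tez prints the command without changing the system. Calling purge_tez runs the purge, after which Tez can be installed again as described in step 1.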

Procedure

  1. Install Tez if it is not already installed. To install Tez, run the command for your platform:
    On CentOS / RedHat: yum install mapr-tez
    On SLES: zypper install mapr-tez
    On Ubuntu: apt-get install mapr-tez
    NOTE
    Repeat this step on each node where you want Hive on Tez to be configured.
  2. Create the /apps/tez directory on the Data Fabric file system.
    To create the directory, run the following commands:
    hadoop fs -mkdir /apps
    hadoop fs -mkdir /apps/tez
  3. Upload the Tez libraries to the /apps/tez directory on the Data Fabric file system.
    To upload the libraries, run the following commands:
    hadoop fs -put /opt/mapr/tez/tez-<version> /apps/tez
    hadoop fs -chmod -R 755 /apps/tez
  4. Verify the upload.
    To verify, run the following command:
    hadoop fs -ls /apps/tez/tez-<version>
  5. Set the Tez environment variables. Open the /opt/mapr/hive/hive-<version>/conf/hive-env.sh file, add the following lines, and save the file:
    export TEZ_CONF_DIR=/opt/mapr/tez/tez-<version>/conf
    export TEZ_JARS=/opt/mapr/tez/tez-<version>/*:/opt/mapr/tez/tez-<version>/lib/*
    export HADOOP_CLASSPATH=$TEZ_CONF_DIR:$TEZ_JARS:$HADOOP_CLASSPATH
    NOTE
    Repeat this step on each node where you want Hive on Tez to be configured.
  6. Configure Hive to use the Tez engine. Open the /opt/mapr/hive/hive-<version>/conf/hive-site.xml file, add the following lines, and save the file:
    <property>
      <name>hive.execution.engine</name>
      <value>tez</value>
    </property>
    To use the Hive queries page in the Tez UI, add the hive.exec.pre.hooks, hive.exec.post.hooks, and hive.exec.failure.hooks properties with the value org.apache.hadoop.hive.ql.hooks.ATSHook.
    NOTE
    Starting from EEP 7.1.0, the following execution-hook properties are managed by running the configure.sh command with the -R option.
    <property>
      <name>hive.exec.pre.hooks</name>
      <value>org.apache.hadoop.hive.ql.hooks.ATSHook</value>
    </property>
    
    <property>
      <name>hive.exec.post.hooks</name>
      <value>org.apache.hadoop.hive.ql.hooks.ATSHook</value>
    </property>
    
    <property>
      <name>hive.exec.failure.hooks</name>
      <value>org.apache.hadoop.hive.ql.hooks.ATSHook</value>
    </property>
    NOTE
    Repeat this step on each node where you want Hive on Tez to be configured.
  7. Run configure.sh with the -R option.
    /opt/mapr/server/configure.sh -R
    NOTE
    Starting in EEP 6.0.1, configure Tez by running the $MAPR_HOME/server/configure.sh script with the -R option.
  8. Configure Tez shuffle on a secured cluster.
    To configure SSL encryption for the shuffle, see Tez Shuffle.
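
The directory creation, upload, and verification in steps 2 through 4 can be combined into one helper, sketched below under stated assumptions. The upload_tez_libs function and the HADOOP_CMD variable are hypothetical names introduced for this example, and the Tez version must be supplied by the caller. The sketch replaces the two separate mkdir calls with hadoop fs -mkdir -p, which creates parent directories as needed; the other commands are the ones shown in the procedure.

```shell
#!/bin/sh
# Sketch of steps 2-4: create /apps/tez, upload the Tez libraries, verify.
# upload_tez_libs and HADOOP_CMD are illustrative names, not part of the product.

upload_tez_libs() {
  version="$1"
  if [ -z "$version" ]; then
    echo "usage: upload_tez_libs <tez-version>" >&2
    return 1
  fi

  # HADOOP_CMD lets the hadoop binary be overridden (for example, for a dry run).
  hadoop_cmd="${HADOOP_CMD:-hadoop}"
  src="/opt/mapr/tez/tez-$version"

  # Each command runs only if the previous one succeeded.
  # -mkdir -p creates /apps and /apps/tez in one call.
  $hadoop_cmd fs -mkdir -p /apps/tez &&
  $hadoop_cmd fs -put "$src" /apps/tez &&
  $hadoop_cmd fs -chmod -R 755 /apps/tez &&
  $hadoop_cmd fs -ls "/apps/tez/tez-$version"
}
```

To use the helper, call upload_tez_libs with the version string that follows tez- in the name of your /opt/mapr/tez/tez-<version> directory. Setting HADOOP_CMD=echo prints the hadoop arguments instead of executing them, which is a quick way to check the paths before touching the file system.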