Integrate Spark-SQL (Spark 2.0.1 and later) with Hive
You integrate Spark-SQL with Hive when you want to run Spark-SQL queries on Hive tables. This information is for Spark 2.0.1 or later users.
About this task
For information about Spark-SQL and Hive support, see Spark Feature Support.
NOTE
If you installed Spark with the MapR Installer, the following steps are not required.
Procedure
- Copy the hive-site.xml file into the SPARK_HOME/conf directory so that Spark and Spark-SQL recognize the Hive Metastore configuration. Copy the file itself; do not create a symbolic link to it. You may need to edit the copy with settings that are specific to the Spark Thrift server.
- Set the permissions of the hive-site.xml file to 644 using the following command:
      sudo chmod 644 /opt/mapr/spark/spark-<sparkVersion>/conf/hive-site.xml
- If Hive is configured to run on Tez (not on MR), you must remove the Tez property from the hive-site.xml file in the Spark conf directory. Delete this entry:
      <property>
        <name>hive.execution.engine</name>
        <value>tez</value>
      </property>
- If Hive is configured on PAM, set hive.metastore.sasl.enabled to true in the hive-site.xml file located in the Spark conf directory.
- Add the following additional properties to the /opt/mapr/spark/spark-<version>/conf/spark-defaults.conf file:
  | Property | Configuration Requirements |
  | --- | --- |
  | spark.yarn.dist.files | For Spark on YARN, specify the location of the hive-site.xml file: /opt/mapr/spark/spark-<spark-version>/conf/hive-site.xml |
  | spark.sql.hive.metastore.version | Specify the Hive version that you are using. NOTE: If you are using Hive Metastore 2.1, set the version to 1.2.1. |
- Depending on whether you plan to run with impersonation, perform one of the following:
  - Configure user impersonation. See Hive User Impersonation for the steps to configure impersonation in the Spark Thrift server.
  - Set hive.server2.enable.doAs to false in the hive-site.xml file.
- To verify the integration, run the following command as the mapr user or as a user that mapr impersonates:
      <spark-home>/bin/run-example --master <master> [--deploy-mode <deploy-mode>] sql.hive.SparkHiveExample
  The master URL for the cluster is either spark://<host>:7077 or yarn. The deploy-mode is either client or cluster.
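The Tez step in the procedure above asks you to delete the hive.execution.engine entry from Spark's copy of hive-site.xml. If you would rather script that edit than do it by hand, the following is a minimal sketch using only the Python standard library. The function name remove_property is hypothetical and not part of any MapR or Spark tooling. The sketch assumes the file has the usual layout of property elements directly under a configuration root.

```python
import xml.etree.ElementTree as ET


def remove_property(hive_site_path, property_name="hive.execution.engine"):
    """Delete every <property> whose <name> equals property_name.

    Returns the number of entries removed. The file is rewritten only
    when at least one entry was removed.
    """
    tree = ET.parse(hive_site_path)
    root = tree.getroot()  # assumed to be the <configuration> element
    removed = 0
    # Iterate over a copy so removing children does not skip siblings.
    for prop in list(root.findall("property")):
        name = prop.findtext("name", default="").strip()
        if name == property_name:
            root.remove(prop)
            removed += 1
    if removed:
        tree.write(hive_site_path, encoding="utf-8", xml_declaration=True)
    return removed


# Hypothetical usage (substitute the Spark version installed on your node):
# remove_property("/opt/mapr/spark/spark-<sparkVersion>/conf/hive-site.xml")
```

ElementTree discards XML comments and processing instructions when it parses a file with its default settings, so back up the file first if it contains comments you want to keep. Run the script only against the copy in the Spark conf directory, not against the hive-site.xml that Hive itself uses.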
 
What to do next
NOTE
The default port for both HiveServer 2 and the Spark Thrift server is 10000. Therefore, before you start the Spark Thrift server on a node where HiveServer 2 is running, verify that there is no port conflict.
NOTE
If you plan to access Hive tables that store data in HPE Data Fabric Database, you need to copy the Hive HBase handler jar into the Spark jars directory. For example:
    cp /opt/mapr/hive/hive-2.1/lib/hive-hbase-handler-2.1.1-mapr-1707.jar /opt/mapr/spark/spark-2.1.0/jars/
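The port conflict mentioned in the first note can be checked before you start the Spark Thrift server. The following is a minimal sketch, assuming that a successful TCP connection to the port means a service such as HiveServer 2 is already listening there. The function name port_in_use is hypothetical, and the host you pass in is your own choice.

```python
import socket


def port_in_use(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise,
        # instead of raising for ordinary connection failures.
        return sock.connect_ex((host, port)) == 0


# Hypothetical usage on the node where the Spark Thrift server will run:
# if port_in_use("127.0.0.1", 10000):
#     print("Port 10000 is already taken; choose another port.")
```

If the port is taken, one option is to start the Spark Thrift server on a different port by setting hive.server2.thrift.port.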