Configuring the Hive Storage Plugin
About this task
Drill can work with only one version of Hive in a given cluster. To access Hive
tables using custom SerDes or InputFormat/OutputFormat, all nodes running Drill must
have the SerDes or InputFormat/OutputFormat JAR files in the
<drill_installation_directory>/jars/3rdparty directory.
To query across multiple versions of Hive, install each version of Hive on a separate Drill
cluster. You must define separate storage plugins, each corresponding to the specific
Hive version of the metastore.
NOTE
In EEP 6.0, Drill requires Hive version 2.3.3-mapr or later to successfully query
Hive data sources.
Configuring a Hive Remote Metastore
In a remote Hive metastore configuration, the metastore runs as a separate service
outside of Hive. The metastore service communicates with the Hive database over JDBC.
To connect Drill to the metastore, point Drill to the Hive metastore service address and
provide the connection parameters in the Hive storage plugin configuration. The Hive
storage plugin (located on the Storage tab in the Drill Web UI)
has the following default configuration when you install Drill:
{
  "type": "hive",
  "enabled": true,
  "configProps": {
    "hive.metastore.uris": "",
    "javax.jdo.option.ConnectionURL": "jdbc:derby:;databaseName=../sample-data/drill_hive_db;create=true",
    "hive.metastore.warehouse.dir": "/tmp/drill_hive_wh",
    "fs.default.name": "file:///",
    "hive.metastore.sasl.enabled": "false",
    "datanucleus.schema.autoCreateAll": "true"
  }
}
Complete the following steps to modify the default Hive storage plugin configuration for your file system environment:
Procedure
- Verify that Hive is running.
- Issue the following command to start the Hive metastore service on the system
  specified in hive.metastore.uris:
  hive --service metastore
- Start the Drill Web UI.
- Select the Storage tab. If Web UI security is enabled, you must have administrator privileges to perform this step.
- In the list of disabled storage plugins in the Drill Web UI, click Update next to Hive.
- Update the following Hive storage plugin parameters to match the system
  environment:
  "hive.metastore.uris": the address of the Hive metastore service
  "javax.jdo.option.ConnectionURL": "jdbc:<database>://<host:port>/<metastore database>"
- Change the default location of files to suit your environment. For example,
  change "fs.default.name": "file:///" to the file system location maprfs:///.
- To run Drill and Hive in a secure cluster, change the
  "hive.metastore.sasl.enabled" parameter to "true".
- Change the "datanucleus.schema.autoCreateAll" property setting for your system
  environment. After it is enabled, "datanucleus.schema.autoCreateAll" initializes the
  Hive metastore schema.
  - In a production environment, remove the "datanucleus.schema.autoCreateAll"
    property from the Hive storage plugin configuration; the property is not required
    because the preferred schema information is already created for the Hive metastore
    service.
  - In a test environment with an embedded Hive metastore, you can disable (set to
    false) this property after the first query on the Hive data source that you submit
    from Drill. Alternatively, use the Hive schema tool to initialize or upgrade the
    Hive metastore schema. Using the Hive schema tool is recommended for queries on
    transactional tables. Run the schematool command as an initialization step:
    /opt/mapr/hive/hive-<version>/bin/schematool -dbType <databaseType> -initSchema
- Click Enable in the Web UI to enable the Hive storage plugin configuration.
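For illustration only, a storage plugin configuration edited according to the steps above might look like the following sketch for a production-style setup. The host names (metastore.example.com, db.example.com), the ports, the MySQL database type, the metastore database name, and the warehouse directory are placeholder assumptions, not values taken from this procedure; substitute the values for your environment. The "datanucleus.schema.autoCreateAll" property is omitted, as the production guidance above recommends.

```json
{
  "type": "hive",
  "enabled": true,
  "configProps": {
    "hive.metastore.uris": "thrift://metastore.example.com:9083",
    "javax.jdo.option.ConnectionURL": "jdbc:mysql://db.example.com:3306/hive_metastore",
    "hive.metastore.warehouse.dir": "/user/hive/warehouse",
    "fs.default.name": "maprfs:///",
    "hive.metastore.sasl.enabled": "false"
  }
}
```

In a secure cluster, "hive.metastore.sasl.enabled" would be set to "true" instead. In a test environment with an embedded metastore, you would likely keep "datanucleus.schema.autoCreateAll" until the schema has been initialized.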