Configuring the Kafka Storage Plugin
To configure Kafka as a data source in Drill, update the <drill_home>/jars/3rdParty directory with the required JAR files, restart Drill, and configure the kafka storage plugin in the Drill Web UI.
Verify that the nodes in your cluster meet the requirements and then complete the steps listed.
Requirements
The Kafka storage plugin requires:
- HPE Ezmeral Data Fabric 7.0 or later cluster
- Drill 1.16.1 or later installed on nodes
- The HPE Ezmeral Data Fabric Kafka client package (kafka-2.1.1, 2.6.1, or later) installed on at least one node. The Kafka client installation provides the following Kafka JAR files, which you copy into the <drill_home>/jars/3rdParty directory as described under Steps:
  NOTE
  Kafka 2.1.1 is used as an example. The version of your Kafka JAR files may differ.
  - Kafka-2.1.1
    - kafka_2.11-2.1.1.200-mapr-710.jar
    - kafka-clients-2.1.1.200-mapr-710.jar
  - Kafka-2.6.1 (if you have eep-800 or later installed)
    - kafka_2.13-2.6.1.0-eep-800.jar
    - kafka-clients-2.6.1.0-eep-800.jar
    - kafka-eventstreams-0.1.0.0-eep-800.jar
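The file names listed above can be compared against the contents of a directory, for example before restarting Drill. The following is a minimal sketch and not part of the product: the function names expected_kafka_jars and missing_kafka_jars are hypothetical, and the file names are the example versions from the list above, so they may not match your installation.

```shell
# Sketch only: function names are hypothetical, and the file names are the
# example versions listed in the requirements. Adjust them to your install.

# expected_kafka_jars <kafka_version>
# Prints the example JAR file names for the given Kafka version, one per line.
expected_kafka_jars() {
  case "$1" in
    2.1.1)
      printf '%s\n' \
        kafka_2.11-2.1.1.200-mapr-710.jar \
        kafka-clients-2.1.1.200-mapr-710.jar ;;
    2.6.1)
      printf '%s\n' \
        kafka_2.13-2.6.1.0-eep-800.jar \
        kafka-clients-2.6.1.0-eep-800.jar \
        kafka-eventstreams-0.1.0.0-eep-800.jar ;;
    *)
      echo "unknown Kafka version: $1" >&2
      return 1 ;;
  esac
}

# missing_kafka_jars <directory> <kafka_version>
# Prints each expected JAR that is not in <directory>; prints nothing if all
# of them are present.
missing_kafka_jars() {
  mkj_jars=$(expected_kafka_jars "$2") || return 1
  for mkj_jar in $mkj_jars; do
    if [ ! -f "$1/$mkj_jar" ]; then
      echo "$mkj_jar"
    fi
  done
}
```

For example, calling missing_kafka_jars with the <drill_home>/jars/3rdParty directory and 2.1.1 prints whichever of the two Kafka 2.1.1 files is not in that directory.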
Steps
Complete the following steps to query Kafka Streams
from Drill:
NOTE
Do not perform step 2 if you installed Drill using the RPM or Debian packages. Step 2 is only required if you installed Drill using a TAR file.
- Remove the specified JAR files from the <drill_home>/jars/3rdParty directory based on the Drill installation method:
  - If you installed Drill using RPM or Debian packages, only remove JAR files that start with kafka, such as kafka-clients-<version>.jar and kafka_<version>.jar, from the <drill_home>/jars/3rdParty directory.
  - If you installed Drill using a TAR file, remove all the JAR files that start with mapr or kafka, such as maprdb-<version>-mapr.jar, maprfs-<version>-mapr.jar, kafka_<version>-mapr.jar, and kafka-clients-<version>.jar, from the <drill_home>/jars/3rdParty directory.
- (Only perform this step if you installed Drill using a TAR file.) Copy the following JAR files into the <drill_home>/jars/3rdParty directory:
  - Copy the mapr-streams-6.2.0.0-mapr.jar file from the /opt/mapr/lib directory into the <drill_home>/jars/3rdParty directory.
  - Copy the following Kafka JAR files from the /opt/mapr/kafka/kafka-*/libs directory into the <drill_home>/jars/3rdParty directory:
    NOTE
    Kafka 2.1.1 is used as an example. The version of your Kafka JAR files may differ.
    - Kafka-2.1.1
      - kafka_2.11-2.1.1.200-mapr-710.jar
      - kafka-clients-2.1.1.200-mapr-710.jar
    - Kafka-2.6.1 (if you have eep-800 or later installed)
      - kafka_2.13-2.6.1.0-eep-800.jar
      - kafka-clients-2.6.1.0-eep-800.jar
      - kafka-eventstreams-0.1.0.0-eep-800.jar
- Issue the following command to restart Drill:
$ maprcli node services -name drill-bits -action restart -nodes <node hostnames separated by a space>
- Log in to the Drill Web UI, and configure the kafka storage plugin. See Kafka Storage Plugin for instructions.
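The file-handling steps above (removing old JAR files and, for TAR installations, copying the new ones) can be sketched as two small shell functions. This is an illustration under stated assumptions rather than a supported script: the function names remove_old_jars and copy_tar_jars are hypothetical, and the wildcard patterns are a simplification that may match more files than the ones named in the steps, so check the result or substitute exact file names.

```shell
# Sketch only: function names are hypothetical and the glob patterns are an
# assumption. Verify which files they match before using this on a real node.

# remove_old_jars <3rdParty_dir> <package|tar>
# package: Drill was installed from RPM or Debian packages (remove kafka* only).
# tar:     Drill was installed from a TAR file (remove kafka* and mapr*).
remove_old_jars() {
  case "$2" in
    package) rm -f "$1"/kafka*.jar ;;
    tar)     rm -f "$1"/kafka*.jar "$1"/mapr*.jar ;;
    *)
      echo "install method must be 'package' or 'tar'" >&2
      return 1 ;;
  esac
}

# copy_tar_jars <mapr_lib_dir> <kafka_libs_dir> <3rdParty_dir>
# TAR installations only. Copies the mapr-streams JAR, then any Kafka JARs
# that match the file name prefixes used in the steps.
copy_tar_jars() {
  cp "$1"/mapr-streams-*.jar "$3"/ || return 1
  for ctj_jar in "$2"/kafka_*.jar "$2"/kafka-clients-*.jar \
                 "$2"/kafka-eventstreams-*.jar; do
    # Skip patterns that matched nothing (for example, no eventstreams JAR).
    if [ -f "$ctj_jar" ]; then
      cp "$ctj_jar" "$3"/
    fi
  done
}
```

For a TAR installation, one possible sequence is remove_old_jars <drill_home>/jars/3rdParty tar, followed by copy_tar_jars /opt/mapr/lib /opt/mapr/kafka/kafka-<version>/libs <drill_home>/jars/3rdParty, and then the maprcli restart command shown in the steps.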
NOTE
When configuring the kafka storage plugin, you must also include the following parameter in the storage plugin configuration:
"streams.consumer.default.stream": "<path-to-stream>"
Usage Example
This example shows a Drill query on a Streams data set, which was made accessible to Drill through the kafka storage plugin.
For this example, tables that contain Yelp stream topics reside in a directory named /YelpStream. The kafka storage plugin is configured with the streams.consumer.default.stream parameter pointing to the /YelpStream directory, as shown:
"streams.consumer.default.stream": "/YelpStream"
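To show where this parameter might sit in a full plugin configuration, the following sketch writes a configuration body that could be pasted into the Drill Web UI. It is an assumption-laden illustration: the function name write_kafka_plugin_config is hypothetical, the type, kafkaConsumerProps, and enabled fields follow the general layout of the Apache Drill Kafka plugin, and placing streams.consumer.default.stream inside kafkaConsumerProps is an assumption to check against the Kafka Storage Plugin instructions. Other consumer properties your environment needs are omitted.

```shell
# Sketch only: the function name is hypothetical, and the placement of
# streams.consumer.default.stream under kafkaConsumerProps is an assumption.

# write_kafka_plugin_config <path_to_stream> <output_file>
# Writes a minimal kafka storage plugin configuration body to <output_file>.
write_kafka_plugin_config() {
  cat > "$2" <<EOF
{
  "type": "kafka",
  "kafkaConsumerProps": {
    "streams.consumer.default.stream": "$1"
  },
  "enabled": true
}
EOF
}
```

For the example in this section, write_kafka_plugin_config /YelpStream kafka-plugin.json produces a file whose stream parameter points to the /YelpStream directory.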
The USE command tells Drill to access data from only the kafka data source:
use kafka;
+------+-----------------------------------+
| ok   | summary                           |
+------+-----------------------------------+
| true | Default schema changed to [kafka] |
+------+-----------------------------------+
The SHOW TABLES command lists the tables in the /YelpStream directory configured for the kafka data source:
show tables;
+--------------+---------------------------+
| TABLE_SCHEMA | TABLE_NAME                |
+--------------+---------------------------+
| kafka        | /YelpStream:UserTable     |
| kafka        | /YelpStream:ReviewTable   |
| kafka        | /YelpStream:BusinessTable |
+--------------+---------------------------+
The following query selects all the data from the BusinessTable in the /YelpStream directory, limiting the results to one row:
select * from `/YelpStream:BusinessTable` limit 1;
+---+----------+-----------+----------+----+------------+-----+--------+---------+----+-------------+----+------------+-----+-----+----+----------+----------------+--------------+-----------------+-----------+
| _id | attributes | business_id | categories | city | full_address | hours | latitude | longitude | name | neighborhoods | open | review_count | stars | state | type | kafkaTopic | kafkaPartitionId | kafkaMsgOffset | kafkaMsgTimestamp | kafkaMsgKey |
+---+----------+-----------+----------+----+------------+-----+--------+---------+----+-------------+----+------------+-----+-----+----+----------+----------------+--------------+-----------------+-----------+
| --1emggGHgoG6ipd_RMb-g | {"Accepts Credit Cards":"true","Parking":{"garage":"false","lot":"true","street":"false","valet":"false","validated":"false"},"Price Range":"1","Ambience":{},"Good For":{},"Music":{}} | --1emggGHgoG6ipd_RMb-g | ["Food","Convenience Stores"] | Las Vegas | 3280 S Decatur Blvd
Westside
Las Vegas, NV 89102 | {"Friday":{},"Monday":{},"Saturday":{},"Sunday":{},"Thursday":{},"Tuesday":{},"Wednesday":{}} | 36.1305306 | -115.2072382 | Sinclair | ["Wes