HDFS Configuration Options
Use the following parameters to configure the Kafka Connect for HPE Ezmeral Data Fabric Streams HDFS connector.
In standalone mode, specify the HDFS connector configuration in the quickstart-hdfs.properties file. You can also configure the offset storage location and the port for the REST interface in the connect-standalone.properties file. See Configuring in Standalone Mode.
/opt/mapr/kafka-connect-hdfs/kafka-connect-hdfs-<version>/etc/kafka-connect-hdfs/quickstart-hdfs.properties
/opt/mapr/kafka/kafka-<version>/config/connect-standalone.properties
/opt/mapr/kafka/kafka-<version>/config/connect-distributed.properties
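As a minimal sketch, the following Python snippet renders a few of the parameters described below into the key=value lines used by quickstart-hdfs.properties. The topic name and directory values are hypothetical examples, and <file-system-url> is a placeholder for your cluster's hdfs.url value. A working connector configuration also needs the standard Kafka Connect keys name and connector.class, which are omitted here because their values for this connector are not given on this page.

```python
# Example values only: the topic and directories are hypothetical, and
# FS_URL is a placeholder for your cluster's file system connection URL.
FS_URL = "<file-system-url>"

connector_config = {
    "topics": "/sample-stream:sample-topic",  # hypothetical stream topic
    "hdfs.url": FS_URL,
    "flush.size": "3",           # commit after every 3 records
    "rotate.interval.ms": "-1",  # -1 (the default) disables time-based commits
    "topics.dir": "topics",      # hypothetical top-level data directory
    "logs.dir": "logs",          # hypothetical write-ahead log directory
}


def render_properties(config: dict[str, str]) -> str:
    """Render a flat dict as key=value lines in Java properties syntax."""
    lines = []
    for key, value in sorted(config.items()):
        # Keep the sketch simple: reject keys and values that would need escaping.
        if not key or any(c in key for c in "=: \t") or "\\" in value:
            raise ValueError(f"unsupported key or value: {key!r}")
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_properties(connector_config), end="")
```

Redirecting the printed output to a file gives a starting point that you can merge into the quickstart file by hand.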
Parameter | Description |
---|---|
flush.size | Number of records written to the file system before invoking file commits. |
hdfs.url | The file system connection URL. This configuration has the format of |
connect.hdfs.keytab | The path to the keytab file for the HDFS connector principal. This keytab file should only be readable by the connector user. |
connect.hdfs.principal | The principal used when the file system is using Kerberos for authentication. |
format.class | The format class used when writing data to the file system. |
hadoop.conf.dir | The Hadoop configuration directory. |
hadoop.home | The Hadoop home directory. |
hdfs.authentication.kerberos | Specifies whether the file system uses Kerberos for authentication. |
hdfs.namenode.principal | The Kerberos principal for CLDB. |
hive.conf.dir | The Hive configuration directory. |
hive.database | The database used when the connector creates tables in Hive. |
hive.home | The Hive home directory. |
hive.integration | Specifies whether Hive is integrated when running the connector. |
hive.metastore.uris | The Hive metastore URIs. Each URI can use the IP address or the fully qualified domain name of the metastore host, together with its port. |
logs.dir | Top-level file system directory to store the write-ahead logs. |
partitioner.class | The partitioner used when writing data to the file system. You can use DefaultPartitioner, which preserves the Kafka partitions; FieldPartitioner, which writes data to different directories according to the value of the partitioning field specified in partition.field.name; or TimeBasedPartitioner, which partitions data according to the time it is ingested into the file system. |
rotate.interval.ms | The time interval, in milliseconds, between file commits. This configuration ensures that file commits are invoked at every configured interval, which is useful when the data ingestion rate is low and the connector has not written enough messages to commit files. The default value, -1, disables this feature. |
schema.compatibility | The schema compatibility rule used when the connector is observing schema changes. The supported configurations are NONE, BACKWARD, FORWARD and FULL. |
topics | A list of topics to use as input for the HDFS connector. |
topics.dir | Top-level file system directory to store the data ingested from Kafka. |
locale | The locale used when partitioning with TimeBasedPartitioner. |
partition.duration.ms | The duration of a partition, in milliseconds, used by TimeBasedPartitioner. The default value, -1, means that TimeBasedPartitioner is not used. |
partition.field.name | The name of the partitioning field when FieldPartitioner is used. |
path.format | Sets the format of the data directories when partitioning with TimeBasedPartitioner. The format converts the Unix timestamp into directory strings. For example, if you set path.format='year'=YYYY/'month'=MM/'day'=dd/'hour'=HH/, the data directories have the format /year=2015/month=12/day=07/hour=15. |
shutdown.timeout.ms | The clean shutdown timeout, in milliseconds. This ensures that asynchronous Hive metastore updates are completed during connector shutdown. |
timezone | The timezone to use when partitioning with TimeBasedPartitioner. |
filename.offset.zero.pad.width | The width to which offsets in file system file names are zero-padded. Offsets shorter than this width get leading zeros, which produces fixed-width file names that can be ordered by simple lexicographic sorting. |
kerberos.ticket.renew.period.ms | The period in milliseconds to renew the Kerberos ticket. |
retry.backoff.ms | The retry backoff, in milliseconds, used when notifying Kafka Connect to retry delivering a message batch or performing recovery after a transient exception. |
schema.cache.size | The size of the schema cache used in the Avro converter. |
storage.class | The underlying storage layer. The default is MapR-FS. |
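To make the path.format and partition.duration.ms descriptions concrete, here is a Python sketch of how a record timestamp could map to a directory under TimeBasedPartitioner. It is a simplified model, not the connector's code. The Joda-style pattern from the path.format example is hand-translated to a strftime string, the timestamp is floored to the start of its partition in local time, and a Python tzinfo object (a fixed UTC offset in the tests) stands in for the timezone parameter. The function name partition_path is hypothetical.

```python
from datetime import datetime, timedelta, timezone, tzinfo

# Hand-translation of the example
# path.format='year'=YYYY/'month'=MM/'day'=dd/'hour'=HH/ into strftime syntax.
PATH_FORMAT = "year=%Y/month=%m/day=%d/hour=%H/"


def partition_path(timestamp_ms: int, duration_ms: int,
                   tz: tzinfo = timezone.utc) -> str:
    """Return a relative data directory for a record (hypothetical helper).

    timestamp_ms: record time as Unix epoch milliseconds.
    duration_ms:  plays the role of partition.duration.ms; must be positive.
    tz:           stands in for the timezone parameter.
    """
    if duration_ms <= 0:
        raise ValueError("partition.duration.ms must be positive here")
    utc_time = datetime.fromtimestamp(timestamp_ms / 1000, timezone.utc)
    offset = utc_time.astimezone(tz).utcoffset()
    # Shift to local wall-clock milliseconds, then floor to the partition start.
    local_ms = timestamp_ms + offset // timedelta(milliseconds=1)
    bucket_ms = local_ms - local_ms % duration_ms
    bucket_start = datetime(1970, 1, 1) + timedelta(milliseconds=bucket_ms)
    return bucket_start.strftime(PATH_FORMAT)


if __name__ == "__main__":
    ts = int(datetime(2015, 12, 7, 15, 30, tzinfo=timezone.utc).timestamp() * 1000)
    print(partition_path(ts, 60 * 60 * 1000))
```

With a one-hour duration, a record at 2015-12-07 15:30 UTC lands in year=2015/month=12/day=07/hour=15/, which matches the directory in the path.format example. Real time zones with daylight saving time would need a zone database rather than a fixed offset.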
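The effect of filename.offset.zero.pad.width can be seen with a short Python sketch. The part+<start>+<end>.avro naming scheme is hypothetical and is used only for illustration. Only the zero-padding matters here. A width of 0 leaves offsets unpadded.

```python
def zero_pad(offset: int, width: int) -> str:
    """Left-pad an offset with zeros to `width` digits; longer offsets stay whole."""
    return str(offset).zfill(width)


def file_name(start: int, end: int, width: int) -> str:
    # Hypothetical naming scheme, used only for this illustration.
    return f"part+{zero_pad(start, width)}+{zero_pad(end, width)}.avro"


# Three committed files, listed in offset order.
batches = [(5, 9), (10, 99), (100, 999)]
in_offset_order = [file_name(s, e, 10) for s, e in batches]

unpadded = sorted(file_name(s, e, 0) for s, e in batches)
padded = sorted(file_name(s, e, 10) for s, e in batches)

print(unpadded)  # lexicographic order no longer follows the offsets
print(padded)    # lexicographic order matches the offsets
```

Without padding, the file that starts at offset 5 sorts after the one that starts at offset 100. With a fixed width, a plain string sort recovers offset order.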
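The interaction between flush.size and rotate.interval.ms can be modeled as a simple commit trigger. The Python sketch below is a simplified, hypothetical model rather than the connector's implementation. It requests a commit when the number of buffered records reaches flush.size, or, when rotate.interval.ms is positive, when at least that many milliseconds have passed since the last commit. The caller supplies the clock so that the logic is easy to test, and the class name CommitPolicy is an assumption.

```python
from dataclasses import dataclass


@dataclass
class CommitPolicy:
    """Simplified, hypothetical model of when file commits are triggered."""
    flush_size: int               # plays the role of flush.size
    rotate_interval_ms: int = -1  # plays the role of rotate.interval.ms; -1 disables it
    buffered: int = 0             # records written since the last commit
    last_commit_ms: int = 0       # clock value at the last commit

    def on_record(self, now_ms: int) -> bool:
        """Buffer one record and report whether a commit is due."""
        self.buffered += 1
        return self.should_commit(now_ms)

    def should_commit(self, now_ms: int) -> bool:
        if self.buffered == 0:
            return False
        if self.buffered >= self.flush_size:
            return True
        # Time-based trigger, only when the interval is enabled (positive).
        return (self.rotate_interval_ms > 0
                and now_ms - self.last_commit_ms >= self.rotate_interval_ms)

    def commit(self, now_ms: int) -> None:
        """Reset the counters after the caller commits the file."""
        self.buffered = 0
        self.last_commit_ms = now_ms
```

For example, with flush_size=100 and rotate_interval_ms=1000, a single record that arrives at time 0 becomes due for commit once the clock reaches 1000 ms, even though far fewer than 100 records were written. With the default of -1, only the record count triggers a commit.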