Connectors, Tasks, and Workers
Describes how Kafka Connect for HPE Ezmeral Data Fabric Streams works and how connectors, tasks, offsets, and workers are associated.
Connectors
A connector (or connector instance) is a logical job responsible for managing the copying of data between HPE Ezmeral Data Fabric Streams and other systems. Each connector instantiates a set of tasks that copy the data. Because a connector can break a single job into many tasks, Kafka Connect provides built-in support for parallel, scalable data copying with very little configuration. Connector plugins are JAR files that add the classes that implement a connector.
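In Kafka Connect, the framework asks a connector for its task configurations (the Java API method is `Connector.taskConfigs(int maxTasks)`), and the connector decides how to divide the work. The following is a minimal Python sketch of that idea, assuming a hypothetical database source connector that copies a list of tables. The function names `group_evenly` and `task_configs` and the `tables` key are illustrative assumptions, not part of any real connector; the grouping is similar in spirit to Kafka Connect's `ConnectorUtils.groupPartitions` helper.

```python
"""Sketch: how a connector might split one copy job into task configurations.

Hypothetical example: a source connector that copies a list of database
tables. The names `group_evenly`, `task_configs`, and the "tables" key are
illustrative only.
"""


def group_evenly(items, num_groups):
    """Split items into num_groups contiguous groups whose sizes differ by at most one."""
    if num_groups <= 0:
        raise ValueError("num_groups must be positive")
    per_group, leftover = divmod(len(items), num_groups)
    groups, start = [], 0
    for g in range(num_groups):
        # The first `leftover` groups each take one extra item.
        size = per_group + (1 if g < leftover else 0)
        groups.append(items[start:start + size])
        start += size
    return groups


def task_configs(tables, max_tasks):
    """Return one config dict per task, never creating more tasks than tables."""
    num_tasks = min(len(tables), max_tasks)
    if num_tasks == 0:
        return []
    return [{"tables": ",".join(group)} for group in group_evenly(tables, num_tasks)]


if __name__ == "__main__":
    tables = ["orders", "customers", "payments", "shipments", "returns"]
    for i, cfg in enumerate(task_configs(tables, max_tasks=3)):
        print(f"task {i}: {cfg}")
```

Capping the task count at the number of tables avoids starting idle tasks. In a real connector, `max_tasks` corresponds to the connector's `tasks.max` setting.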
Offsets
As connectors run, Kafka Connect tracks offsets for each one so that connectors can resume from their previous position after a failure or a graceful restart for maintenance. An offset records the current position in the stream of data being copied, and each connector may need to track many offsets, one for each partition of the stream. For example, when loading data from a database, the offset might be a transaction ID that identifies a position in the database change log.
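The following Python sketch illustrates the database example: each source partition (here, one table's change log) has its own committed offset, a transaction ID, and a later read resumes after it. In Kafka Connect, source records carry a source-partition map and a source-offset map; this sketch models both as plain dicts, and the class, function, and field names (`OffsetTracker`, `read_change_log`, `table`, `txn_id`) are hypothetical.

```python
"""Sketch: tracking offsets per source partition so a connector can resume.

Hypothetical model only: partitions and offsets are plain dicts, and the
change-log records and field names ("table", "txn_id") are invented for
illustration. Transaction IDs are assumed to be increasing integers.
"""
import json


class OffsetTracker:
    """In-memory map from (connector, source partition) to the last committed offset."""

    def __init__(self):
        self._offsets = {}

    @staticmethod
    def _key(connector, partition):
        # Dicts are not hashable, so serialize the partition deterministically.
        return (connector, json.dumps(partition, sort_keys=True))

    def commit(self, connector, partition, offset):
        self._offsets[self._key(connector, partition)] = dict(offset)

    def get(self, connector, partition):
        return self._offsets.get(self._key(connector, partition))


def read_change_log(log, tracker, connector, partition, limit):
    """Copy up to `limit` records newer than the committed transaction ID."""
    last = tracker.get(connector, partition)
    start_after = last["txn_id"] if last else 0
    copied = [rec for rec in log if rec["txn_id"] > start_after][:limit]
    if copied:
        # Commit the position of the last record copied for this partition.
        tracker.commit(connector, partition, {"txn_id": copied[-1]["txn_id"]})
    return copied


if __name__ == "__main__":
    log = [{"txn_id": i, "row": f"row-{i}"} for i in range(1, 6)]
    tracker = OffsetTracker()
    part = {"table": "orders"}
    first = read_change_log(log, tracker, "db-source", part, limit=3)
    # A later run (for example, after a graceful restart) resumes after txn_id 3.
    second = read_change_log(log, tracker, "db-source", part, limit=3)
    print([r["txn_id"] for r in first], [r["txn_id"] for r in second])
```

Keying offsets by partition is what lets one connector resume each partition independently; an unknown partition simply starts from the beginning.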
Users generally do not need to worry about the format of offsets, which differs from connector to connector. However, Kafka Connect requires persistent storage for offset data so that it can recover from faults. The offset storage is configurable; see Standalone Worker Configuration Options and Distributed Worker Configuration Options.
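To show why persistent offset storage matters, the following Python sketch writes committed offsets to a file and reloads them in a new instance, which stands in for a restarted worker. It is a hypothetical model: in Apache Kafka Connect, standalone workers store offsets in the file named by `offset.storage.file.filename` using their own file format (not JSON), and distributed workers store them in the topic named by `offset.storage.topic`. The `FileOffsetStore` class is illustrative only.

```python
"""Sketch: persisting offsets to a file so they survive a worker restart.

Hypothetical model only; Kafka Connect uses its own file format. JSON is
used here for readability, so partition keys must be strings.
"""
import json
import os
import tempfile


class FileOffsetStore:
    def __init__(self, path):
        self.path = path
        self.offsets = {}
        # Reload previously committed offsets, if any.
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.offsets = json.load(f)

    def commit(self, partition_key, offset):
        self.offsets[partition_key] = offset
        # Write to a temporary file, then rename it over the old one, so a
        # crash mid-write does not leave a half-written offsets file in place.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(self.offsets, f)
        os.replace(tmp, self.path)

    def get(self, partition_key):
        return self.offsets.get(partition_key)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "connect.offsets")
        FileOffsetStore(path).commit("orders", {"txn_id": 42})
        # A new instance stands in for a restarted worker reading the same file.
        print(FileOffsetStore(path).get("orders"))  # {'txn_id': 42}
```

A file works for a single standalone worker; multiple distributed workers need shared storage, which is why distributed mode keeps offsets in a topic instead.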
Workers
Connectors and tasks run inside processes called workers. Kafka Connect can run workers in two modes:
- In standalone mode, the cluster consists of a single worker that runs all connectors and tasks. This mode is useful for testing and debugging.
- In distributed mode, the cluster consists of multiple workers that share the same group.id, offset.storage.topic, and config.storage.topic. Connectors are submitted and managed through the Kafka Connect REST API.
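In distributed mode, a connector is created by sending its name and configuration to the Kafka Connect REST API (`POST /connectors` with a JSON body containing `name` and `config`). The following Python sketch only builds that request with the standard library; it assumes the worker's REST endpoint is `http://localhost:8083` (a common Apache Kafka Connect default; check your worker configuration), and the connector name and class are hypothetical placeholders.

```python
"""Sketch: building a Kafka Connect REST request that creates a connector.

Assumptions: the REST endpoint is http://localhost:8083, and the connector
name, class, and settings are hypothetical placeholders.
"""
import json
import urllib.request


def create_connector_request(base_url, name, config):
    """Return a POST /connectors request; sending it is left to the caller."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = create_connector_request(
        "http://localhost:8083",
        "example-source",  # hypothetical connector name
        {
            "connector.class": "com.example.ExampleSourceConnector",  # hypothetical class
            "tasks.max": "2",
        },
    )
    print(req.get_method(), req.full_url)
    # To submit against a running distributed worker:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status, resp.read().decode())
```

Any worker in the distributed cluster can accept the request; the cluster stores the connector configuration in config.storage.topic and spreads the connector's tasks across the workers in the group.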
The standalone and distributed worker configuration files are located at:
/opt/mapr/kafka/kafka-<version>/config/connect-standalone.properties
/opt/mapr/kafka/kafka-<version>/config/connect-distributed.properties
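Distributed workers belong to the same cluster only when their group.id, offset.storage.topic, and config.storage.topic values match. The following Python sketch parses simple `key=value` lines from worker configuration text and reports any of those three keys whose values differ. It is a hypothetical helper: it ignores Java properties features such as line continuations, escapes, and `:` separators, and the property values in the example are illustrative.

```python
"""Sketch: checking that distributed workers would join the same Connect cluster.

Parses only simple key=value lines; a key missing from a worker is treated
as None. Property values in the example are hypothetical.
"""

CLUSTER_KEYS = ("group.id", "offset.storage.topic", "config.storage.topic")


def parse_properties(text):
    """Parse key=value lines, skipping blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def cluster_mismatches(worker_props):
    """Return {key: set_of_values} for each cluster key whose values differ across workers."""
    mismatches = {}
    for key in CLUSTER_KEYS:
        values = {props.get(key) for props in worker_props}
        if len(values) > 1:
            mismatches[key] = values
    return mismatches


if __name__ == "__main__":
    worker_a = parse_properties("""
group.id=connect-cluster
offset.storage.topic=connect-offsets
config.storage.topic=connect-configs
""")
    worker_b = {**worker_a, "group.id": "other-cluster"}
    print(cluster_mismatches([worker_a, worker_a]))  # {}
    print(cluster_mismatches([worker_a, worker_b]))
```

To check real workers, read each worker's connect-distributed.properties from the path above (with `<version>` replaced by the installed Kafka version) and pass the parsed dictionaries to `cluster_mismatches`.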