Mirroring Topics from an Apache Kafka Cluster to the HPE Cluster
You can use MirrorMaker to mirror data continuously from Apache Kafka clusters to streams in HPE Ezmeral Data Fabric Streams clusters.
Prerequisites
- Because this procedure requires that MirrorMaker be run from the HPE Ezmeral Data Fabric cluster, ensure that the mapr-kafka package is installed on the node that you choose to run MirrorMaker from.
- Configure the node as a mapr client.
- Ensure that the ID of the user that runs MirrorMaker has the produceperm and topicperm permissions on the destination stream.
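As a rough illustration of the last prerequisite, the sketch below assembles a maprcli stream edit command that grants both permissions. The user name mmuser and the stream path are placeholders, and the u:<user> form of the access control expression is an assumption. The script only prints the command so that it can be reviewed before it is run on a cluster node.

```shell
#!/bin/sh
# Placeholder values: replace with your destination stream and MirrorMaker user.
STREAM_PATH="/newStream"
MM_USER="mmuser"

# Assemble the command that sets the produce and topic permissions.
# The u:<user> access control expression is assumed here.
GRANT_CMD="maprcli stream edit -path ${STREAM_PATH} -produceperm u:${MM_USER} -topicperm u:${MM_USER}"

# Print the command for review instead of running it.
echo "$GRANT_CMD"
```

Setting a permission this way replaces the expression currently stored for it, so any users or groups that already need access would have to be included in the new expression.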
About this task
Mirroring can run continuously. Alternatively, you can stop mirroring after you migrate the consumers and producers for your applications from your Apache Kafka cluster to the data-fabric cluster where the stream is located. See Migrating Apache Kafka 0.9.0 Applications to HPE Ezmeral Data Fabric Streams for details.
After you start MirrorMaker, it launches a configurable number of consumer threads to read topics in the Kafka cluster and a number of producers to write the messages from those topics into topics in HPE Ezmeral Data Fabric Streams.
Before running MirrorMaker, you create a file that contains the required configuration parameters for the consumers that read from the Apache Kafka cluster. You also create a file that contains the required configuration parameters for the producers that publish to the stream in the HPE Ezmeral Data Fabric cluster. You point to these files in the MirrorMaker command.
To specify which topics you want to mirror, use the whitelist parameter to provide a Java-style regular expression that matches the names of the topics that you want mirrored.
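To preview which topics a whitelist value would select, something like the following sketch can help. The topic names are made up, and the check assumes that the expression must match the whole topic name. It uses grep -E, whose POSIX extended syntax agrees with Java regular expressions only for simple patterns such as these, so treat it as an approximation rather than the matching that MirrorMaker itself performs.

```shell
#!/bin/sh
# Hypothetical topic names on the source Kafka cluster.
TOPICS="na_west_orders
na_west_returns
na_east_orders
emea_orders"

# A whitelist value as it might be passed to MirrorMaker.
WHITELIST="na_west.*,emea_.*"

# Commas are treated as the regex-choice symbol, so convert them to '|'.
PATTERN=$(printf '%s' "$WHITELIST" | tr ',' '|')

# -x requires the whole topic name to match the pattern.
MATCHED=$(printf '%s\n' "$TOPICS" | grep -E -x "$PATTERN")

printf '%s\n' "$MATCHED"
```

With these sample names, the script prints na_west_orders, na_west_returns, and emea_orders, and leaves out na_east_orders.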
Procedure
- Create a file that contains the required properties and values for consumers to use. When you run MirrorMaker, you point to this file by using the consumer.config parameter.
  The descriptions of these properties, except for the last, are taken from the documentation for Apache Kafka. The last property is not documented.
  Property: Description
  group.id: A unique string that identifies the consumer group that the consumers started by MirrorMaker belong to.
  bootstrap.servers: A list of host/port pairs to use for establishing the initial connection to the Kafka cluster. The client will make use of all servers irrespective of which servers are specified here for bootstrapping; this list only impacts the initial hosts used to discover the full set of servers. This list should be in the form host1:port1,host2:port2,... Since these servers are just used for the initial connection to discover the full cluster membership (which may change dynamically), this list need not contain the full set of servers (you may want more than one, though, in case a server is down).
- Create a file that contains the required properties and values for producers to use. When you run MirrorMaker, you point to this file by using the producer.config parameter.
  Property: Description
  streams.producer.default.stream: Specifies the path and name of the stream in the HPE Ezmeral Data Fabric cluster that the topics will be mirrored to.
  auto.create.topics.enable: Set the value to true. The producers will therefore be able to create topics in the destination stream automatically.
- Run MirrorMaker with this command to start mirroring topics from Apache Kafka to HPE Ezmeral Data Fabric Streams:
  Syntax
  /opt/mapr/kafka/kafka-0.9.0/bin/kafka-mirror-maker.sh --consumer.config <File that lists consumer properties and values> --num.streams <Number of consumer threads> --producer.config <File that lists producer properties and values> --whitelist=<Java-style regular expression for specifying the topics to mirror>
  Parameter: Description
  consumer.config: The path and name of the file that lists the consumer properties and their values.
  num.streams: Use this option to specify the number of mirror consumer threads to create. Note that if you start multiple mirror maker processes then you may want to look at the distribution of partitions on the source cluster. If the number of consumption streams is too high per mirror maker process, then some of the mirroring threads will be idle by virtue of the consumer rebalancing algorithm (if they do not end up owning any partitions for consumption).
  producer.config: The path and name of the file that lists the producer properties and their values.
  whitelist: A Java-style regular expression for specifying the topics to copy. Commas (',') are interpreted as the regex-choice symbol ('|'). This parameter is required.
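The note on num.streams can be made concrete with a little arithmetic. The sketch below is a simplification: it assumes that all MirrorMaker processes share one consumer group and that each source partition is owned by exactly one consumer thread, so any threads beyond the partition count stay idle. The function name idle_threads and the numbers are illustrative only.

```shell
#!/bin/sh
# idle_threads PARTITIONS PROCESSES STREAMS_PER_PROCESS
# Estimates how many consumer threads would own no partition, assuming
# each partition is consumed by exactly one thread in the group.
idle_threads() {
    total_threads=$(( $2 * $3 ))
    if [ "$total_threads" -gt "$1" ]; then
        echo $(( total_threads - $1 ))
    else
        echo 0
    fi
}

# Hypothetical setup: 6 source partitions, 2 MirrorMaker processes,
# each started with --num.streams 4.
idle_threads 6 2 4
```

For this hypothetical setup the estimate is 2 idle threads, which suggests lowering num.streams to 3 per process.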
Example
In this example, the file that lists the properties and values for the consumers that will read messages from the topics in Apache Kafka is named consumers.props. It contains this list:
group.id=cg1
bootstrap.servers=10.10.100.87:9093
shallow.iterator.enable=false
The file that lists the properties and values for the producers that will publish messages to topics in HPE Ezmeral Data Fabric Streams is named producers.props. It contains this list:
streams.producer.default.stream=/newStream
auto.create.topics.enable=true
The topics to mirror all have names that begin with na_west. When running the command, we can use "na_west.*" as the regular expression for the whitelist parameter.
Here is the command:
bin/kafka-mirror-maker.sh --consumer.config consumers.props --num.streams 2 --producer.config producers.props --whitelist="na_west.*"
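Before launching MirrorMaker, it may be worth confirming that both property files contain the keys described in the procedure. The following sketch recreates the two example files in a scratch directory and checks them. The helper has_key is a hypothetical name, and the check only looks for the presence of each key, not for valid values.

```shell
#!/bin/sh
# Work in a scratch directory so that no existing files are overwritten.
WORKDIR=$(mktemp -d)

# Recreate the two example files from this section.
cat > "$WORKDIR/consumers.props" <<'EOF'
group.id=cg1
bootstrap.servers=10.10.100.87:9093
shallow.iterator.enable=false
EOF

cat > "$WORKDIR/producers.props" <<'EOF'
streams.producer.default.stream=/newStream
auto.create.topics.enable=true
EOF

# has_key FILE KEY: succeeds if FILE has a line whose key is exactly KEY.
has_key() {
    cut -d= -f1 "$1" | grep -F -x -q "$2"
}

# Collect the names of any required properties that are missing.
MISSING=""
for key in group.id bootstrap.servers; do
    has_key "$WORKDIR/consumers.props" "$key" || MISSING="$MISSING $key"
done
for key in streams.producer.default.stream auto.create.topics.enable; do
    has_key "$WORKDIR/producers.props" "$key" || MISSING="$MISSING $key"
done

if [ -z "$MISSING" ]; then
    echo "config ok"
else
    echo "missing:$MISSING"
fi
```

In practice the same checks could be pointed at the real consumers.props and producers.props files instead of the scratch copies.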