Mirroring Topics from the HPE Cluster to an Apache Kafka Cluster
You can use MirrorMaker to mirror data continuously from streams in HPE Ezmeral Data Fabric clusters to Apache Kafka clusters.
Prerequisites
- This procedure requires MirrorMaker to run from the HPE Ezmeral Data Fabric cluster. Verify that the mapr-kafka package is installed on the node that you choose to run MirrorMaker on.
- Configure the node as a mapr client.
- Ensure that the ID of the user who runs MirrorMaker has the consumeperm permission on the stream.
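The package prerequisite can be checked from a shell before going further. The helper below is only a sketch: the function name have_pkg is hypothetical, and it assumes the node uses either rpm or dpkg as its package manager.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a package is installed on this node.
# Returns 0 if installed, 1 if not installed, and 2 if neither rpm nor
# dpkg-query is available to answer the question.
have_pkg() {
  local pkg="$1" found_tool=1
  if command -v rpm >/dev/null 2>&1; then
    found_tool=0
    rpm -q "$pkg" >/dev/null 2>&1 && return 0
  fi
  if command -v dpkg-query >/dev/null 2>&1; then
    found_tool=0
    dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null \
      | grep -q 'ok installed' && return 0
  fi
  if [ "$found_tool" -eq 0 ]; then
    return 1
  fi
  return 2
}

if have_pkg mapr-kafka; then
  echo "mapr-kafka is installed"
else
  echo "mapr-kafka not found (or package manager not recognized)"
fi
```

The stream permission is a separate matter. It is typically granted with maprcli stream edit and its -consumeperm option; check the maprcli reference for the exact expression syntax on your release.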
About this task
After you start MirrorMaker, it launches a configurable number of consumer threads to read topics that are in a stream in an HPE Ezmeral Data Fabric cluster, and a number of producers to write the messages from those topics into topics in an Apache Kafka cluster.
Before running MirrorMaker, you create a file that contains the required configuration parameters for the consumers that read from the stream in the HPE Ezmeral Data Fabric cluster. You also create a file that contains the required configuration parameters for the producers that publish to the Apache Kafka cluster. You point to these files in the MirrorMaker command.
To specify which topics you want to mirror, use the whitelist parameter to provide a Java-style regular expression that matches the names of the topics that you want mirrored.
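To get a feel for which topic names a whitelist value selects, the sketch below approximates the match in a shell. It is an illustration only: grep -E uses POSIX extended regular expressions rather than Java's regex engine, and the two agree for simple patterns like these but not for every construct. The function name matches_whitelist and the topic names are hypothetical. Commas are converted to '|' because the whitelist parameter interprets them as the regex-choice symbol.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the whole topic name matches the whitelist.
# Commas in the whitelist are treated as '|' (regex choice).
matches_whitelist() {
  local whitelist topic="$2"
  whitelist="$(printf '%s' "$1" | tr ',' '|')"
  # -x requires the entire line (the topic name) to match the pattern.
  printf '%s\n' "$topic" | grep -Eqx -- "$whitelist"
}

for topic in na_west_orders na_west_returns na_east_orders; do
  if matches_whitelist 'na_west.*' "$topic"; then
    echo "mirror: $topic"
  else
    echo "skip:   $topic"
  fi
done
```

With these sample names, the two na_west topics are reported as mirrored and na_east_orders is skipped.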
Procedure
-
Create a file that contains the required properties and values for consumers to use. When you run MirrorMaker, you point to this file by using the consumer.config parameter.
Properties:
  - streams.record.strip.streampath: Set the value of this property to true. In messages that are written to streams, the names of topics include the paths and names of the streams in which those topics are located. Apache Kafka needs only the names of the topics. This property removes the path and name of the stream that the topics will be mirrored from.
  - streams.consumer.default.stream: Specifies the path and name of the stream that the topics will be mirrored from.
  - group.id: A unique string that identifies the consumer group that the consumers started by MirrorMaker belong to.
-
Create a file that contains the required properties and values for producers to use. When you run MirrorMaker, you point to this file by using the producer.config parameter.
Properties:
  - bootstrap.servers: A list of host/port pairs used to establish the initial connection to the Kafka cluster, in the form host1:port1,host2:port2,... The producers use these servers only to discover the full cluster membership, which may change dynamically, and then use all servers in the cluster regardless of which ones are listed here. The list need not contain the full set of servers, but you may want more than one in case a server is down.
  - producer.type: Specifies whether the messages are published asynchronously in batches or as data is received by producers. The values are async and sync.
  - compression.codec: Specifies the compression codec for all messages that are generated by producers. The possible values are none, gzip, snappy, and lz4.
-
Run MirrorMaker with this command to start mirroring topics from HPE Ezmeral Data Fabric Streams to Apache Kafka:
Syntax
bin/kafka-mirror-maker.sh --consumer.config <File that lists consumer properties and values> --num.streams <Number of consumer threads> --producer.config <File that lists producer properties and values> --whitelist=<Java-style regular expression for specifying the topics to mirror>
Parameters:
  - consumer.config: The path and name of the file that lists the consumer properties and their values.
  - new.consumer: Specifies that MirrorMaker uses consumers built on the Apache Kafka 0.9.0 API library.
  - num.streams: Specifies the number of mirror consumer threads to create. If you start multiple MirrorMaker processes, consider how partitions are distributed on the source cluster. If a MirrorMaker process has too many consumer threads, the consumer rebalancing algorithm leaves some of them without any partitions to consume, and those threads sit idle.
  - producer.config: The path and name of the file that lists the producer properties and their values.
  - whitelist: A Java-style regular expression that specifies the topics to copy. Commas (',') are interpreted as the regex-choice symbol ('|'). This parameter is required.
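The sizing advice for num.streams comes down to simple arithmetic. The sketch below assumes that all MirrorMaker processes share one group.id and that a partition is read by at most one consumer thread in that group, so threads beyond the total partition count have nothing to own. The function name idle_threads and the numbers are hypothetical, and the result is a lower bound: the actual assignment depends on the rebalancing algorithm and can leave more threads idle.

```shell
#!/usr/bin/env bash
# Rough lower bound on idle consumer threads:
#   threads = processes * num.streams
#   idle    = max(0, threads - partitions)
idle_threads() {
  local partitions="$1" processes="$2" streams_per_process="$3"
  local threads=$(( processes * streams_per_process ))
  local idle=$(( threads - partitions ))
  if [ "$idle" -lt 0 ]; then
    idle=0
  fi
  echo "$idle"
}

idle_threads 8 2 4    # 8 threads for 8 partitions: prints 0
idle_threads 8 3 4    # 12 threads for 8 partitions: prints 4
```

If the estimate is above zero, lowering num.streams or running fewer processes avoids starting threads that cannot be given work.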
Example
In this example, the file that lists the properties and values for the consumers that will read messages from the topics in HPE Ezmeral Data Fabric Streams is named consumers.props. It contains this list:
streams.record.strip.streampath=true
streams.consumer.default.stream=/myStream
group.id=cg1
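The effect of streams.record.strip.streampath=true can be pictured as a small string operation. This sketch assumes that a topic in a stream is addressed as the stream path, a colon, and the topic name (for example /myStream:na_west_orders). The function name strip_streampath and the topic name are hypothetical, and the real stripping happens inside the consumer, not in a shell.

```shell
#!/usr/bin/env bash
# Illustration only: reduce a full topic name to the bare topic name that
# Apache Kafka expects by dropping the stream path and the ':' separator.
strip_streampath() {
  local full_name="$1"
  # Remove the longest prefix that ends in ':'.
  echo "${full_name##*:}"
}

strip_streampath "/myStream:na_west_orders"   # prints na_west_orders
```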
The file that lists the properties and values for the producers that will publish messages to topics in Apache Kafka is named producers.props. It contains this list:
bootstrap.servers=10.10.83.93:9092
producer.type=sync
compression.codec=none
The topics to mirror all have names that begin with na_west. The regular expression "na_west.*" matches all of them, so it is the value to use for the whitelist parameter:
bin/kafka-mirror-maker.sh --new.consumer \
  --consumer.config consumers.props --num.streams 2 --producer.config producers.props \
  --whitelist="na_west.*"
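Before launching a command like the one above, it can help to confirm that both files define every property listed in the procedure. The following is a sketch under assumptions: the function name missing_keys and the temporary directory are hypothetical, the check looks only for property names (not for valid values), and the final step prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: print each required key that is missing
# from a properties file. Whitespace around the key is tolerated.
missing_keys() {
  local file="$1"; shift
  local key
  for key in "$@"; do
    awk -v k="$key" -F'[=:]' '
      { name = $1; gsub(/^[ \t]+|[ \t]+$/, "", name); if (name == k) found = 1 }
      END { exit !found }
    ' "$file" || echo "$key"
  done
}

# Recreate the example files in a scratch directory for the demonstration.
workdir="$(mktemp -d)"
cat > "$workdir/consumers.props" <<'EOF'
streams.record.strip.streampath=true
streams.consumer.default.stream=/myStream
group.id=cg1
EOF
cat > "$workdir/producers.props" <<'EOF'
bootstrap.servers=10.10.83.93:9092
producer.type=sync
compression.codec=none
EOF

missing="$(
  missing_keys "$workdir/consumers.props" \
    streams.record.strip.streampath streams.consumer.default.stream group.id
  missing_keys "$workdir/producers.props" \
    bootstrap.servers producer.type compression.codec
)"

if [ -n "$missing" ]; then
  echo "missing required properties: $missing" >&2
else
  # Dry run: print the command instead of executing it.
  echo bin/kafka-mirror-maker.sh --new.consumer \
    --consumer.config "$workdir/consumers.props" --num.streams 2 \
    --producer.config "$workdir/producers.props" --whitelist="na_west.*"
fi
```

To launch MirrorMaker for real, drop the echo and point the two options at your own properties files.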