Mirroring Topics from the HPE Cluster to an Apache Kafka Cluster
You can use MirrorMaker to mirror data continuously from streams in HPE Data Fabric
clusters to Apache Kafka clusters.
Prerequisites
This procedure requires MirrorMaker to run from the HPE Data Fabric cluster.
Verify that the mapr-kafka package is installed on the node that you choose to
run MirrorMaker on.
Configure the node as a mapr client.
Ensure that the ID of the user who runs MirrorMaker has the
consumeperm permission on the stream.
About this task
After you start MirrorMaker, it launches a configurable number of consumer threads to
read topics that are in a stream in an HPE Data Fabric cluster and a number of producers to write the
messages from those topics into topics in an Apache Kafka cluster.
Figure 1. Mirroring from HPE Data Fabric Streams to Apache Kafka
Before running MirrorMaker, you create a file that contains the required
configuration parameters for the consumers that read from the stream in the HPE Data Fabric cluster. You
also create a file that contains the required configuration parameters for the
producers that publish to the Apache Kafka cluster. You point to these files in the
MirrorMaker command.
To specify which topics you want to mirror, use the
whitelist parameter to provide a Java-style regular
expression that matches the names of the topics that you want mirrored.
Procedure
Create a file that contains the required properties and values for consumers to
use. When you run MirrorMaker, you point to this file by using the
consumer.config parameter.
Property
Description
streams.record.strip.streampath
Set the value of this property to true. In messages that
are written to streams, the names of topics include the
paths and names of the streams in which those topics are
located. Apache Kafka needs only the names of the topics.
Setting this property to true removes the path and name of
the stream from the names of the mirrored topics.
streams.consumer.default.stream
Specifies the path and name of the stream that the topics
will be mirrored from.
group.id
A unique string that identifies the consumer group to which
the consumers started by MirrorMaker belong.
Create a file that contains the required properties and values for producers to
use. When you run MirrorMaker, you point to this file by using the
producer.config parameter.
Property
Description
bootstrap.servers
A list of host/port pairs to use for establishing the
initial connection to the Kafka cluster, in the form
host1:port1,host2:port2,.... Producers use this list only to
discover the full set of servers, and then make use of all
servers regardless of which ones are listed here. Because
cluster membership can change dynamically, the list need not
contain every server, but listing more than one guards
against a server being down.
producer.type
Specifies whether messages are published asynchronously in
batches or synchronously as producers receive the data. The
values are async and sync.
compression.codec
Specifies the compression codec for all messages that are
generated by producers. The possible values are none, gzip,
snappy, and lz4.
Run MirrorMaker with this command to start mirroring topics from HPE Data Fabric Streams
to Apache Kafka:
Syntax
bin/kafka-mirror-maker.sh
--consumer.config <File that lists consumer properties and values>
--num.streams <Number of consumer threads>
--producer.config <File that lists producer properties and values>
--whitelist=<Java-style regular expression for specifying the topics to mirror>
Parameter
Description
consumer.config
The path and name of the file that lists the consumer
properties and their values.
new.consumer
Specifies that MirrorMaker uses consumers built on the
Apache Kafka 0.9.0 API library.
num.streams
Specifies the number of mirror consumer threads to create.
If you start multiple MirrorMaker processes, check the
distribution of partitions on the source cluster. If the
number of consumer threads per MirrorMaker process is too
high, some threads will be idle because the consumer
rebalancing algorithm leaves them without any partitions to
consume.
producer.config
The path and name of the file that lists the producer
properties and their values.
whitelist
A Java-style regular expression for specifying the topics
to copy. Commas (',') are interpreted as the regex-choice
symbol ('|').
This parameter is
required.
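Because commas in the whitelist are treated as the regex-choice symbol, it can help to preview which topic names a pattern would select before starting MirrorMaker. The following is a rough sketch only: the function name preview_whitelist and the sample topic names are hypothetical, grep -E (POSIX extended regular expressions) stands in for Java regular expressions and agrees with them only for simple patterns, and it assumes that the pattern must match the whole topic name.

```shell
# Preview which topic names a whitelist pattern would select.
# Reads candidate topic names on stdin, one per line.
preview_whitelist() {
  # Commas act as the regex-choice symbol, so translate them to '|'.
  pattern=$(printf '%s' "$1" | tr ',' '|')
  # -x requires the pattern to match the entire name (an assumption).
  grep -E -x "($pattern)"
}

# Hypothetical topic names, for illustration only.
printf '%s\n' na_west_orders na_west_returns na_east_orders eu_orders |
  preview_whitelist 'na_west.*,eu_.*'
# prints na_west_orders, na_west_returns, and eu_orders, one per line
```

For patterns that rely on Java-only syntax, the result of this preview may differ from what MirrorMaker selects.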
Example
In this example, the file that lists the properties and values for the consumer that
will read messages from the topics in HPE Data Fabric Streams is named
consumers.props. It contains this list:
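A minimal sketch of what consumers.props might contain, written as a shell heredoc so that the file can be created in one step. Only the property names come from the consumer property table; the stream path and the group name are hypothetical placeholders, not values taken from this example.

```shell
# Create consumers.props in the current directory.
cat > consumers.props <<'EOF'
# Strip the stream path so that Apache Kafka sees only the topic names.
streams.record.strip.streampath=true
# Hypothetical stream path: replace with the stream to mirror from.
streams.consumer.default.stream=/path/to/source_stream
# Hypothetical consumer group name.
group.id=mirrormaker_na_west
EOF
```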
The file that lists the properties and values for the producers that will publish
messages to topics in Apache Kafka is named producers.props. It
contains this list:
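A minimal sketch of what producers.props might contain, again written as a shell heredoc. The property names and their allowed values come from the producer property table; the host names are hypothetical, port 9092 is assumed because it is the usual Apache Kafka broker port, and async and snappy are arbitrary picks from the allowed values.

```shell
# Create producers.props in the current directory.
cat > producers.props <<'EOF'
# Hypothetical broker hosts: list more than one in case a server is down.
bootstrap.servers=kafkahost1:9092,kafkahost2:9092
# Publish asynchronously in batches (the alternative is sync).
producer.type=async
# One of none, gzip, snappy, or lz4.
compression.codec=snappy
EOF
```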
The topics to mirror all have names that begin with na_west. When
running the command, you can pass "na_west.*" as the regular
expression for the whitelist parameter.
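Putting these pieces together, the invocation for this example could look like the following sketch. The function name run_mirrormaker, the KAFKA_HOME variable (assumed to point at the directory that contains bin/kafka-mirror-maker.sh), and the value 2 for num.streams are assumptions made for illustration; the file names and the whitelist pattern are the ones named in this example.

```shell
# Sketch only: KAFKA_HOME is an assumed variable, and 2 consumer threads
# is an arbitrary choice. Run from the directory that holds both
# consumers.props and producers.props.
run_mirrormaker() {
  "${KAFKA_HOME:?set KAFKA_HOME to the Kafka installation directory}/bin/kafka-mirror-maker.sh" \
    --consumer.config consumers.props \
    --num.streams 2 \
    --producer.config producers.props \
    --whitelist="na_west.*"
}
```

Calling run_mirrormaker then starts mirroring. The whitelist pattern is quoted so that the shell passes it to MirrorMaker unchanged instead of expanding it against file names.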