Mirroring Topics with Apache Kafka MirrorMaker

Mirroring is a type of replication that takes place in this sequence of steps:

Messages that are published to topics in a source cluster are read by consumers that MirrorMaker manages.
These consumers send the messages to producers that MirrorMaker also manages.
The producers publish the messages in topics that are in the destination cluster.

Mirroring can continue indefinitely. Alternatively, you can mirror your data as a way of migrating it from Apache Kafka to HPE Ezmeral Data Fabric Streams. If you use it for this purpose, you can stop mirroring after migrating your producers and consumers to use HPE Ezmeral Data Fabric Streams, as described in Migrating Apache Kafka 0.9.0 Applications to HPE Ezmeral Data Fabric Streams.

ATTENTION

MirrorMaker does not provide the same reliability guarantees as the replication features in HPE Ezmeral Data Fabric Streams. In particular, MirrorMaker does not replicate cursors or message positions, which makes disaster recovery much more difficult than with HPE Ezmeral Data Fabric Streams replication. Therefore, HPE Ezmeral Data Fabric recommends using MirrorMaker only for mirroring between HPE Ezmeral Data Fabric Streams and Apache Kafka, not for replicating HPE Ezmeral Data Fabric Streams.

Prerequisites

Ensure that the destination stream in the HPE Ezmeral Data Fabric cluster exists. To create a stream, run the command `maprcli stream create`.
Ensure that the ID of the user that runs MirrorMaker has the `produceperm` and `topicperm` permissions on the stream.
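
As a rough illustration of these prerequisites, the following sketch assembles a `maprcli stream create` command that grants both permissions to one user. The stream path `/apps/mirror-dest` and the user name `mmuser` are hypothetical placeholders, not values from this documentation. The script only prints the command so that you can review it before running it on a cluster node.

```shell
#!/bin/sh
# Hypothetical values: replace with your own stream path and user ID.
STREAM_PATH="/apps/mirror-dest"
MIRROR_USER="mmuser"

# Assemble the command. The u:<user> form is an access control expression
# that grants the permission to a single user.
CREATE_CMD="maprcli stream create -path $STREAM_PATH"
CREATE_CMD="$CREATE_CMD -produceperm u:$MIRROR_USER"
CREATE_CMD="$CREATE_CMD -topicperm u:$MIRROR_USER"

# Print the command for review; run the printed line on a cluster node.
echo "$CREATE_CMD"
```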

Command Syntax and Descriptions of Parameters

bin/kafka-mirror-maker.sh 
--consumer.config <File that lists consumer properties and values> 
--num.streams <Number of consumer threads> 
--producer.config <File that lists producer properties and values> 
--whitelist=<Java-style regular expression for specifying the topics to mirror>
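
To make the syntax above concrete, here is a minimal sketch that fills in each option and prints the resulting command line. The file locations, thread count, and topic pattern are all assumed example values. The whitelist is wrapped in single quotes in the printed line so that the shell does not try to expand the regular expression.

```shell
#!/bin/sh
# Hypothetical example values: adjust for your installation.
CONSUMER_CONFIG="/tmp/mirror-consumer.properties"
PRODUCER_CONFIG="/tmp/mirror-producer.properties"
NUM_STREAMS=2
WHITELIST="orders,payments.*"

# Assemble the command one option at a time, following the syntax above.
MIRROR_CMD="bin/kafka-mirror-maker.sh"
MIRROR_CMD="$MIRROR_CMD --consumer.config $CONSUMER_CONFIG"
MIRROR_CMD="$MIRROR_CMD --num.streams $NUM_STREAMS"
MIRROR_CMD="$MIRROR_CMD --producer.config $PRODUCER_CONFIG"
MIRROR_CMD="$MIRROR_CMD --whitelist='$WHITELIST'"

# Print the command for review; run the printed line from the Kafka
# installation directory once both properties files exist.
echo "$MIRROR_CMD"
```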

Parameter	Description
`consumer.config`	The path and name of the file that lists the consumer properties. See the Consumer Properties and Descriptions section for detailed information.
`num.streams`	The number of mirror consumer threads to create. If you start multiple MirrorMaker processes, check how partitions are distributed on the source cluster. If a MirrorMaker process has too many consumer threads, the consumer rebalancing algorithm leaves some of them without any partitions to consume, and those threads stay idle.
`producer.config`	The path and name of the file that lists the producer properties. See the Producer Properties and Descriptions section for detailed information.
`whitelist`	A Java-style regular expression for specifying the topics to copy. Commas (',') are interpreted as the regex-choice symbol ('|'). This parameter is required.
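
The comma handling described for `whitelist` can be approximated in the shell. This sketch uses `grep -E` as a stand-in for Java regular expressions, which is adequate only for simple patterns like the one shown, and it assumes that a topic name must match the whole pattern. The topic names are made up for the example.

```shell
#!/bin/sh
# Hypothetical whitelist value and topic names.
WHITELIST="orders,payments.*"

# Treat each comma as the regex-choice symbol.
PATTERN=$(printf '%s' "$WHITELIST" | tr ',' '|')

# Assumption: the whole topic name must match, so anchor the pattern.
MATCHED=$(printf '%s\n' orders payments-eu inventory | grep -E "^($PATTERN)\$")

# Prints "orders" and "payments-eu", one per line; "inventory" is skipped.
echo "$MATCHED"
```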

Consumer Properties and Descriptions

group.id=<ID>
bootstrap.servers=<IP address>:<port>
shallow.iterator.enable=false

Property	Description
`group.id`	A unique string that identifies the consumer group this consumer belongs to. This property is required if the consumer uses either the group management functionality by using `subscribe(topic)` or the Kafka-based offset management strategy. If `group.id` is not set and the value of the `num.streams` option is greater than 1, messages might be delivered to a stream more than once.
`bootstrap.servers`	A list of host/port pairs to use for establishing the initial connection to the Kafka cluster, in the form `host1:port1,host2:port2,...`. The client uses this list only to discover the full set of servers, which can change dynamically, and then makes use of all servers regardless of which ones are listed here. The list need not contain every server, but consider listing more than one in case a server is down.
`shallow.iterator.enable`	Set this value to `false`.
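
A consumer properties file that follows the template above might be generated as in the sketch below. The group ID, host names, port, and file location are illustrative assumptions, not recommended values.

```shell
#!/bin/sh
# Hypothetical file location.
CONSUMER_CONFIG="${TMPDIR:-/tmp}/mirror-consumer.properties"

# Write the three consumer properties described above.
cat > "$CONSUMER_CONFIG" <<'EOF'
# Consumer settings for MirrorMaker (illustrative values)
group.id=mirror-maker-group
bootstrap.servers=kafka1.example.com:9092,kafka2.example.com:9092
shallow.iterator.enable=false
EOF

# Show the result for review.
cat "$CONSUMER_CONFIG"
```

Listing two bootstrap servers lets the initial connection succeed even if one of them is down.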

Producer Properties and Descriptions

key.serializer=<serializer class>
value.serializer=<serializer class>
streams.producer.default.stream=<Path and name of the stream to copy the topics to>
auto.create.topics.enable=true

Property	Description
`key.serializer`	The name of the appropriate serialization class in the `org.apache.kafka.common.serialization` package or a class that implements the `Serializer` interface for serializing keys.
`value.serializer`	The class that implements the `Serializer` interface for serializing values.
`streams.producer.default.stream`	Specifies the path and name of the stream that the topics will be copied to.
`auto.create.topics.enable`	Enables auto-creation of topics within the stream specified with the `streams.producer.default.stream` parameter.
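
A matching producer properties file could look like the following sketch. The choice of `ByteArraySerializer` for both keys and values is an assumption made for this example, as are the stream path `/apps/mirror-dest` and the file location. Use the serializer classes that suit your data.

```shell
#!/bin/sh
# Hypothetical file location.
PRODUCER_CONFIG="${TMPDIR:-/tmp}/mirror-producer.properties"

# Write the four producer properties described above.
cat > "$PRODUCER_CONFIG" <<'EOF'
# Producer settings for MirrorMaker (illustrative values)
key.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
value.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
streams.producer.default.stream=/apps/mirror-dest
auto.create.topics.enable=true
EOF

# Show the result for review.
cat "$PRODUCER_CONFIG"
```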

HPE Ezmeral Data Fabric – Customer-Managed 7.9.0 Documentation
Abstract	This site contains documentation for the customer-managed platform of the HPE Ezmeral Data Fabric version 7.9.0 including installation, configuration, administration, and reference content, as well as content for the associated bundled ecosystem components and drivers.
Published	April 2025
Edition	7.9.0
Topic last updated	2021-10-07