Modes of Publishing
Describes different modes of publishing.
When publishing a message, a producer sends a record to the producer client library. The producer client library batches messages into multiple publish requests, which it sends to the HPE Ezmeral Data Fabric Streams server.
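The following Java sketch models the batching step. The client library buffers records per partition and turns each full buffer into one publish request. This is a simplified illustration, not the client library's actual logic. The class name BatchingModel and the fixed BATCH_SIZE threshold of 3 records are hypothetical, and the sketch has no time-based flushing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of client-side batching; not the actual client library.
class BatchingModel {
    // Hypothetical threshold: flush a partition's buffer once it holds this many records.
    static final int BATCH_SIZE = 3;

    // Per-partition buffers of records that have not been sent yet.
    private final Map<Integer, List<String>> buffers = new HashMap<>();
    // Each entry describes one simulated publish request (one batch for one partition).
    final List<String> publishRequests = new ArrayList<>();

    void send(int partition, String record) {
        List<String> buffer = buffers.computeIfAbsent(partition, p -> new ArrayList<>());
        buffer.add(record);
        if (buffer.size() >= BATCH_SIZE) {
            // Turn the full buffer into one publish request, then start a new batch.
            publishRequests.add("partition " + partition + ": " + buffer);
            buffer.clear();
        }
    }

    public static void main(String[] args) {
        BatchingModel client = new BatchingModel();
        for (int i = 1; i <= 7; i++) {
            client.send(i % 2, "m" + i); // odd messages go to partition 1, even ones to partition 0
        }
        client.publishRequests.forEach(System.out::println);
    }
}
```

Running main prints two publish requests: "partition 1: [m1, m3, m5]" and "partition 0: [m2, m4, m6]". Message m7 stays buffered because partition 1 has not yet collected another full batch.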
At Least Once
The default message delivery semantics is "at-least-once". At-least-once delivery guarantees that each message is published to the HPE Ezmeral Data Fabric Streams server at least once. Messages are never lost, but they may be delivered more than once.
Exactly Once
With "exactly once" message delivery semantics, messages are produced without duplication: each message is delivered once and only once. Exactly once is ensured by uniquely identifying each group of messages that is atomically persisted. To enable exactly-once message delivery, set the producer idempotence option, enable.idempotence, to true. With an idempotent producer, retries no longer introduce duplicates. See Enabling an Idempotent Producer for more information.
Each atomically persisted group of messages is identified by the following:
- Producer ID - A unique identifier that is generated internally for each client and group of messages that are atomically persisted. At a minimum, the ID is unique for a given stream-topic-partition. A Producer ID expires if it is inactive for a period of time. The default expiration period is 7 days. Once a Producer ID has expired, a new Producer ID is requested. To change the expiration period, see the pidexpirysecs parameter in maprcli stream create and stream edit.
- Sequence Number - A number that is generated internally, assigned when received, and monotonically incremented on every produced group of messages for the given Producer ID.
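The following Java sketch shows why a Producer ID plus a Sequence Number is enough to drop duplicates. A partition remembers the last sequence number it persisted for each Producer ID and ignores any batch whose sequence number is not newer. This is a simplified, hypothetical model, and the class and method names are illustrative. It does not handle sequence gaps, and it is not how the HPE Ezmeral Data Fabric Streams server is implemented.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of idempotent persistence for one stream-topic-partition.
// Not the HPE Ezmeral Data Fabric Streams server implementation.
class IdempotentPartitionModel {
    // Last persisted sequence number per Producer ID.
    private final Map<Long, Long> lastSequence = new HashMap<>();
    private int persistedBatches = 0;

    /** Returns true if the batch was persisted, false if it was recognized as a duplicate. */
    boolean append(long producerId, long sequence) {
        Long last = lastSequence.get(producerId);
        if (last != null && sequence <= last) {
            return false; // already persisted: this is a retry, so drop it
        }
        lastSequence.put(producerId, sequence);
        persistedBatches++;
        return true;
    }

    int persistedBatches() {
        return persistedBatches;
    }

    public static void main(String[] args) {
        IdempotentPartitionModel partition0 = new IdempotentPartitionModel();
        System.out.println(partition0.append(42L, 0));     // true: first batch
        System.out.println(partition0.append(42L, 0));     // false: retry of the same batch
        System.out.println(partition0.append(42L, 1));     // true: next batch
        System.out.println(partition0.persistedBatches()); // 2
    }
}
```

A retried batch carries the same Producer ID and Sequence Number as the original, so the model can recognize it and skip it instead of storing a second copy.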
If the producer idempotence option, enable.idempotence, is not set to true, then "at least once" message delivery semantics applies.
If the client resends a message after its Producer ID has expired, then UnknownProducerIdException is thrown. For example:
- If message1 from clientA is sent to stream-topic-partition0 and 7 days pass, the Producer ID expires.
- If clientA then sends another message with the same data to the same stream-topic-partition (stream-topic-partition0), UnknownProducerIdException is thrown because the Producer ID has expired.
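The following Java sketch models the expiry scenario above. It is hypothetical: it tracks the time of the last send for one Producer ID and throws a local stand-in for UnknownProducerIdException once the configured inactivity period has elapsed. The 7-day default comes from the text. Expressing it as 604,800 seconds assumes that pidexpirysecs is given in seconds, as its name suggests.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical model of Producer ID expiry; names are illustrative only.
class ProducerIdExpiryModel {
    // Local stand-in for the client's UnknownProducerIdException.
    static class UnknownProducerIdException extends RuntimeException {
        UnknownProducerIdException(String message) {
            super(message);
        }
    }

    // Default from the text: 7 days (604800 seconds, assuming pidexpirysecs is in seconds).
    static final Duration DEFAULT_EXPIRY = Duration.ofDays(7);

    private final Duration expiry;
    private Instant lastActivity;

    ProducerIdExpiryModel(Duration expiry, Instant firstSend) {
        this.expiry = expiry;
        this.lastActivity = firstSend;
    }

    /** Simulates a send at time 'now'; throws if the Producer ID was inactive too long. */
    void send(Instant now) {
        if (Duration.between(lastActivity, now).compareTo(expiry) >= 0) {
            throw new UnknownProducerIdException(
                "Producer ID expired after " + expiry.toDays() + " days of inactivity");
        }
        lastActivity = now;
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2024-01-01T00:00:00Z");
        // clientA sends message1 to stream-topic-partition0 at t0.
        ProducerIdExpiryModel clientA = new ProducerIdExpiryModel(DEFAULT_EXPIRY, t0);
        System.out.println("Default expiry: " + DEFAULT_EXPIRY.getSeconds() + " seconds"); // 604800
        try {
            // 7 days later, clientA sends again with the same Producer ID.
            clientA.send(t0.plus(Duration.ofDays(7)));
        } catch (UnknownProducerIdException e) {
            System.out.println("Caught: " + e.getMessage());
        }
    }
}
```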
Server Acknowledgements
By default, publish requests are sent without waiting for an acknowledgement (ack) from the HPE Ezmeral Data Fabric Streams server.
The acknowledgement behavior is determined by the producer configuration parameter streams.parallel.flushers.per.partition, which defaults to true.
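The following Java helper shows where these settings go. It builds a java.util.Properties object for the publishing modes on this page. The two property keys, enable.idempotence and streams.parallel.flushers.per.partition, come from this page; the helper class and its parameters are hypothetical. In an application, properties like these would typically be passed to the producer's constructor, assuming the Kafka-compatible Java producer API. Other required producer settings, such as serializers, would go in the same object.

```java
import java.util.Properties;

// Hypothetical helper; only the two property keys are taken from the documentation.
class ProducerConfigs {
    static Properties publishing(boolean exactlyOnce, boolean waitForAck) {
        Properties props = new Properties();
        // true = exactly once (idempotent producer); false = default at-least-once
        props.setProperty("enable.idempotence", Boolean.toString(exactlyOnce));
        // true (default) = do not wait for ack before sending the next publish request;
        // false = wait for ack, so messages arrive at each partition in send order
        props.setProperty("streams.parallel.flushers.per.partition", Boolean.toString(!waitForAck));
        return props;
    }

    public static void main(String[] args) {
        Properties ordered = publishing(true, true);
        System.out.println(ordered.getProperty("enable.idempotence"));                      // true
        System.out.println(ordered.getProperty("streams.parallel.flushers.per.partition")); // false
    }
}
```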
With "at-least-once" message delivery, a message can be produced more than once for a single send call in some failure scenarios. A common cause of duplication is a network error that makes the client retry sending a message to a server node. If the error occurs after the server has already processed and persisted the message, the retry leaves a duplicate message in the system.
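The failure scenario above can be reproduced with a toy Java simulation. It assumes a single retry and models the server as an in-memory list; the class and method names are hypothetical. The first send persists order-1, the acknowledgement is lost, and the retry persists order-1 a second time.

```java
import java.util.ArrayList;
import java.util.List;

// Toy simulation of at-least-once duplication; names are hypothetical.
class AtLeastOnceRetry {
    // Stands in for the messages stored in one partition on the server.
    final List<String> partitionLog = new ArrayList<>();

    /** Persists the message; returns false to simulate a network error that loses the ack. */
    boolean serverAppend(String message, boolean loseAck) {
        partitionLog.add(message); // the message is persisted before the network error
        return !loseAck;
    }

    /** Client send with a single retry when no ack arrives. */
    void send(String message, boolean loseFirstAck) {
        boolean acked = serverAppend(message, loseFirstAck);
        if (!acked) {
            // The client cannot tell that the first attempt was persisted, so it resends.
            serverAppend(message, false);
        }
    }

    public static void main(String[] args) {
        AtLeastOnceRetry sim = new AtLeastOnceRetry();
        sim.send("order-1", true);  // ack lost after the message was persisted
        sim.send("order-2", false); // normal send
        System.out.println(sim.partitionLog); // [order-1, order-1, order-2]
    }
}
```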
- Publishing without Ack - When publishing without ack (the default), messages can be published to the partitions out of order because of multiple network interface controllers, network errors, or retries.
For example, suppose a producer is sending messages specifically for Partition 1. The producer client library buffers the messages and sends a batch to Partition 1. Meanwhile, the producer keeps sending messages for Partition 1, and the client library continues to buffer them. The next time the client library has enough messages for Partition 1, it sends another batch, whether or not the HPE Ezmeral Data Fabric Streams server has acknowledged the previous batch. If the earlier batch then has to be retried, for example after a network error, it can be persisted after the later batch.
- Publishing with Ack - If you always want messages to arrive at the partitions in the order in which they were sent, set the configuration parameter streams.parallel.flushers.per.partition to false. This causes the producer client library to wait for an acknowledgement (ack) from the HPE Ezmeral Data Fabric Streams server before sending subsequent publish requests.
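The following toy Java simulation contrasts the two modes for a single partition. It assumes that the first attempt to send batch1 is lost to a network error and retried, and that everything else succeeds. The class name, method, and failure pattern are hypothetical and only illustrate the ordering effect described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of publish ordering for one partition; not the client library's real logic.
class OrderingModel {
    /**
     * Returns the order in which batches are persisted. Batches in failOnce lose
     * their first attempt to a network error and are retried.
     */
    static List<String> deliver(List<String> batches, Set<String> failOnce, boolean waitForAck) {
        List<String> persisted = new ArrayList<>();
        Set<String> pendingFailures = new HashSet<>(failOnce);
        if (waitForAck) {
            // With ack: the next batch is sent only after the current one is acknowledged,
            // so a failed batch is retried before any later batch is sent.
            for (String batch : batches) {
                pendingFailures.remove(batch); // a lost first attempt is retried here
                persisted.add(batch);
            }
        } else {
            // Without ack: every batch is sent immediately; lost attempts are retried
            // afterwards, so their retries land behind later batches.
            List<String> retries = new ArrayList<>();
            for (String batch : batches) {
                if (pendingFailures.remove(batch)) {
                    retries.add(batch);
                } else {
                    persisted.add(batch);
                }
            }
            persisted.addAll(retries);
        }
        return persisted;
    }

    public static void main(String[] args) {
        List<String> batches = Arrays.asList("batch1", "batch2", "batch3");
        Set<String> lostFirstAttempt = Collections.singleton("batch1");
        System.out.println("without ack: " + deliver(batches, lostFirstAttempt, false)); // [batch2, batch3, batch1]
        System.out.println("with ack:    " + deliver(batches, lostFirstAttempt, true));  // [batch1, batch2, batch3]
    }
}
```

With waitForAck set to false, batch2 and batch3 are persisted while batch1 is still being retried, so the partition receives them out of order. With waitForAck set to true, the model sends batch2 only after batch1 is acknowledged. Send order is preserved, at the cost of never having more than one publish request outstanding for the partition.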