Modes of Publishing

Describes different modes of publishing.

When publishing a message, a producer sends a record to the producer client library. The producer client library batches messages into multiple publish requests which are sent to the HPE Ezmeral Data Fabric Streams server.

At Least Once

The default message delivery semantics is "at-least-one". At-least-once means that the message delivery guarantees that a message is published at least once to the HPE Ezmeral Data Fabric Streams server. Messages are never lost but may be re-delivered.

Exactly Once

An "exactly once" message delivery semantics produces messages without duplication. Each message is delivered once and only once. Exactly once is insured by uniquely identifying a group of messages that are atomically persisted. Exactly once message delivery is set with the producer idempotence option.

NOTE
Exactly-once message deliver semantics is enabled by setting the producer configurable option, enable.idempotence to true. By supporting an idempotent producer, retries no longer introduce duplicates. See Enabling an Idempotent Producer for more information.
The following unique identifiers are associated with each message:
  • Producer ID - A unique identifier is generated internally for each client and group of messages that are atomically persisted.

    As a minimum, the ID is a unique ID for a given stream-topic-partition. Producer IDs expire if a producer ID is inactive for a period of time. The default Producer ID expiration is 7 days. At that point, a new Producer ID is requested once the Producer ID is expired. To change the expiration date, see the pidexpirysecs parameter in maprcli stream create and stream edit for more information.

  • Sequence Number - A number that is monotonically incremented on every produced group of messages for the given Producer ID, assigned when received, and generated internally.
NOTE
If the producer idempotence option, is not set to true, then "at least once" message delivery semantics applies.

If the client resends a message after the producer ID has expired, then UnknownProducerIdException is thrown.

For example:
  • If message1 from clientA is sent to a stream-topic-partition0 and 7 days go by, the Producer ID expires.
  • Then, if clientA sends another message that has the same data to the same stream-topic-partition (stream-topic-partition0), then UnknownProducerIdException is thrown because the Producer ID has expired..
NOTE
With the alternative "at least once" message delivery, in some failure scenarios, a message can be produced more than once for a single send call. Common reasons for message duplication include network error or server failure. For example, if a network error occurs and the message has been processed and persisted by the server, if the client re-tries sending a message to a server node, then the result could be duplicate messages in the system.

Server Acknowledgements

By default, publishing requests for messages are sent without waiting for acknowledgement (ack) from the HPE Ezmeral Data Fabric Streams server.

The acknowledgement behavior is determined by the producer configuration parameter streams.parallel.flushers.per.partition, which defaults to true.

With an "at-least-once" message delivery, in some failure scenarios, a message can be produced more than once for a single send call. A common reason for message duplication is when a network error occurs, a client may retry sending a message to a server node. If the network error occurs after the message is processed and persisted by the server, it can lead to duplicate messages in the system.

Publishing without Ack
When publishing without ack (default), it is possible for messages to be published to the partitions out of order due to the presence of multiple network interface controllers, network errors, or retries.

For example, suppose a producer is sending messages that are specifically for Partition 1. The producer client library buffers the messages and sends a batch to Partition 1. Meanwhile, the producer keeps sending messages for Partition 1 and the client continues to buffer them. The next time the producer client library has enough messages for Partition 1, the client sends another batch, irresepctive of whether or notHPE Ezmeral Data Fabric Streams server has acknowledged the previous batch.

Publishing with Ack
If you always want messages to arrive to partitions in the order in which they were sent, set the configuration parameter streams.parallel.flushers.per.partition to false. This causes the producer client library to wait for ack (acknowledgements) from the HPE Ezmeral Data Fabric Streams server before sending subsequent publish requests.