Skip to content

Commit 8ecdd91

Browse files
committed
Document compression
Fixes #90
1 parent 4038c0c commit 8ecdd91

File tree

4 files changed

+136
-6
lines changed

4 files changed

+136
-6
lines changed

src/docs/asciidoc/api.adoc

+86-3
Original file line numberDiff line numberDiff line change
@@ -362,12 +362,20 @@ The following table sums up the main settings to create a `Producer`:
362362
|100
363363

364364
|`subEntrySize`
365-
|[[producer-sub-entry-size-configuration-entry]]The number of messages to put in a sub-entry. A sub-entry is one "slot" in a publishing
365+
|[[producer-sub-entry-size-configuration-entry]]The number of messages to put in a sub-entry.
366+
A sub-entry is one "slot" in a publishing
366367
frame, meaning outbound messages are not only batched in publishing frames, but in sub-entries
367-
as well. Use this feature to increase throughput at the cost of increased latency and
368+
as well.
369+
Use this feature to increase throughput at the cost of increased latency and
368370
potential duplicated messages even when deduplication is enabled.
371+
See the <<sub-entry-batching-and-compression, dedicated section>> for more information.
369372
|1 (meaning no use of sub-entry batching)
370373

374+
|`compression`
375+
|Compression algorithm to use when sub-entry batching is in use.
376+
See the <<sub-entry-batching-and-compression, dedicated section>> for more information.
377+
|Compression.NONE
378+
371379
|`maxUnconfirmedMessages`
372380
|The maximum number of unconfirmed outbound messages. `Producer#send` will start
373381
blocking when the limit is reached.
@@ -501,7 +509,7 @@ on 2 client-side elements: the _producer name_ and the _message publishing ID_.
501509
.Deduplication is not guaranteed when using sub-entries batching
502510
====
503511
It is not possible to guarantee deduplication when
504-
<<producer-sub-entry-size-configuration-entry, sub-entry batching>> is in use.
512+
<<sub-entry-batching-and-compression, sub-entry batching>> is in use.
505513
Sub-entry batching is disabled by default and it does not prevent from
506514
batching messages in a single publish frame, which can already provide
507515
very high throughput.
@@ -623,6 +631,81 @@ include::{test-examples}/ProducerUsage.java[tag=producer-queries-last-publishing
623631
<4> Scroll to the content for the next publishing ID
624632
<5> Set the message publishing
625633

634+
[[sub-entry-batching-and-compression]]
635+
===== Sub-Entry Batching and Compression
636+
637+
RabbitMQ Stream provides a special mode to publish, store, and dispatch messages: sub-entry batching.
638+
This mode increases throughput at the cost of increased latency and potential duplicated messages even when deduplication is enabled.
639+
It also allows using compression to reduce bandwidth and storage if messages are reasonably similar, at the cost of increasing CPU usage on the client side.
640+
641+
Sub-entry batching consists in squeezing several messages – a batch – in the slot that is usually used for one message.
642+
This means outbound messages are not only batched in publishing frames, but in sub-entries as well.
643+
644+
You can enable sub-entry batching by setting the `ProducerBuilder#subEntrySize` parameter to a value greater than 1, like in the following snippet:
645+
646+
.Enabling sub-entry batching
647+
[source,java,indent=0]
648+
--------
649+
include::{test-examples}/ProducerUsage.java[tag=producer-sub-entry-batching]
650+
--------
651+
<1> Set batch size to 100 (the default)
652+
<2> Set sub-entry size to 10
653+
654+
Reasonable values for the sub-entry size usually go from 10 to a few dozens.
655+
656+
A sub-entry batch will go directly to disc after it reached the broker, so the publishing client has complete control over it.
657+
This is the occasion to take advantage of the similarity of messages and compress them.
658+
There is no compression by default but you can choose among several algorithms with the `ProducerBuilder#compression(Compression)` method:
659+
660+
.Enabling compression of sub-entry messages
661+
[source,java,indent=0]
662+
--------
663+
include::{test-examples}/ProducerUsage.java[tag=producer-sub-entry-batching-and-compression]
664+
--------
665+
<1> Set batch size to 100 (the default)
666+
<2> Set sub-entry size to 10
667+
<3> Use the Zstandard compression algorithm
668+
669+
Note the messages in a sub-entry are compressed altogether to benefit from their potential similarity, not one by one.
670+
671+
The following table lists the supported algorithms, general information about them, and the respective implementations used by default.
672+
673+
[%header,cols=3*]
674+
|===
675+
|Algorithm
676+
|Overview
677+
|Implementation used
678+
679+
|https://en.wikipedia.org/wiki/Gzip[gzip]
680+
|Has a high compression ratio but is slow compared to other algorithms.
681+
|JDK implementation
682+
683+
|https://en.wikipedia.org/wiki/Snappy_(compression)[Snappy]
684+
|Aims for reasonable compression ratio and very high speeds.
685+
|https://github.com/xerial/snappy-java[Xerial Snappy] (framed)
686+
687+
|https://en.wikipedia.org/wiki/LZ4_(compression_algorithm)[LZ4]
688+
|Aims for good trade-off between speed and compression ratio.
689+
|https://github.com/lz4/lz4-java[LZ4 Java] (framed)
690+
691+
|https://en.wikipedia.org/wiki/Zstd[zstd] (Zstandard)
692+
|Aims for high compression ratio and high speed, especially for decompression.
693+
|https://github.com/luben/zstd-jni[zstd-jni]
694+
|===
695+
696+
You are encouraged to test and evaluate the compression algorithms depending on your needs.
697+
698+
The compression libraries are pluggable thanks to the `EnvironmentBuilder#compressionCodecFactory(CompressionCodecFactory)` method.
699+
700+
701+
[NOTE]
702+
.Consumers, sub-entry batching, and compression
703+
====
704+
There is no configuration required for consumers with regard to sub-entry batching and compression.
705+
The broker dispatches messages to client libraries: they are supposed to figure out the format of messages, extract them from their sub-entry, and decompress them if necessary.
706+
So when you set up sub-entry batching and compression in your publishers, the consuming applications must use client libraries that support this mode, which is the case for the stream Java client.
707+
====
708+
626709
==== Consumer
627710

628711
`Consumer` is the API to consume messages from a stream.

src/main/java/com/rabbitmq/stream/EnvironmentBuilder.java

+10-1
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
// Copyright (c) 2020-2021 VMware, Inc. or its affiliates. All rights reserved.
1+
// Copyright (c) 2020-2022 VMware, Inc. or its affiliates. All rights reserved.
22
//
33
// This software, the RabbitMQ Stream Java client library, is dual-licensed under the
44
// Mozilla Public License 2.0 ("MPL"), and the Apache License version 2 ("ASL").
@@ -13,6 +13,7 @@
1313
1414
package com.rabbitmq.stream;
1515

16+
import com.rabbitmq.stream.compression.Compression;
1617
import com.rabbitmq.stream.compression.CompressionCodecFactory;
1718
import com.rabbitmq.stream.metrics.MetricsCollector;
1819
import com.rabbitmq.stream.sasl.CredentialsProvider;
@@ -101,6 +102,14 @@ public interface EnvironmentBuilder {
101102

102103
EnvironmentBuilder byteBufAllocator(ByteBufAllocator byteBufAllocator);
103104

105+
/**
106+
* Compression codec factory to use for compression in sub-entry batching.
107+
*
108+
* @param compressionCodecFactory
109+
* @return this builder instance
110+
* @see ProducerBuilder#subEntrySize(int)
111+
* @see ProducerBuilder#compression(Compression)
112+
*/
104113
EnvironmentBuilder compressionCodecFactory(CompressionCodecFactory compressionCodecFactory);
105114

106115
/**

src/main/java/com/rabbitmq/stream/ProducerBuilder.java

+15-1
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
// Copyright (c) 2020-2021 VMware, Inc. or its affiliates. All rights reserved.
1+
// Copyright (c) 2020-2022 VMware, Inc. or its affiliates. All rights reserved.
22
//
33
// This software, the RabbitMQ Stream Java client library, is dual-licensed under the
44
// Mozilla Public License 2.0 ("MPL"), and the Apache License version 2 ("ASL").
@@ -49,6 +49,20 @@ public interface ProducerBuilder {
4949
*/
5050
ProducerBuilder subEntrySize(int subEntrySize);
5151

52+
/**
53+
* Compression algorithm to use to compress a batch of sub-entries.
54+
*
55+
* <p>Compression can take advantage of similarity in messages to significantly reduce the size of
56+
* the sub-entry batch. This translates to less bandwidth and storage used, at the cost of more
57+
* CPU usage to compress and decompress on the client side. Note the server is not involved in the
58+
* compression/decompression process.
59+
*
60+
* <p>Default is no compression.
61+
*
62+
* @param compression
63+
* @return this builder instance
64+
* @see Compression
65+
*/
5266
ProducerBuilder compression(Compression compression);
5367

5468
/**

src/test/java/com/rabbitmq/stream/docs/ProducerUsage.java

+25-1
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
// Copyright (c) 2020-2021 VMware, Inc. or its affiliates. All rights reserved.
1+
// Copyright (c) 2020-2022 VMware, Inc. or its affiliates. All rights reserved.
22
//
33
// This software, the RabbitMQ Stream Java client library, is dual-licensed under the
44
// Mozilla Public License 2.0 ("MPL"), and the Apache License version 2 ("ASL").
@@ -17,6 +17,7 @@
1717
import com.rabbitmq.stream.Environment;
1818
import com.rabbitmq.stream.Message;
1919
import com.rabbitmq.stream.Producer;
20+
import com.rabbitmq.stream.compression.Compression;
2021
import java.nio.charset.StandardCharsets;
2122
import java.time.Duration;
2223
import java.util.UUID;
@@ -110,6 +111,29 @@ void producerWithNameQueryLastPublishingId() {
110111
// end::producer-queries-last-publishing-id[]
111112
}
112113

114+
void producerSubEntryBatching() {
115+
Environment environment = Environment.builder().build();
116+
// tag::producer-sub-entry-batching[]
117+
Producer producer = environment.producerBuilder()
118+
.stream("my-stream")
119+
.batchSize(100) // <1>
120+
.subEntrySize(10) // <2>
121+
.build();
122+
// end::producer-sub-entry-batching[]
123+
}
124+
125+
void producerSubEntryBatchingCompression() {
126+
Environment environment = Environment.builder().build();
127+
// tag::producer-sub-entry-batching-and-compression[]
128+
Producer producer = environment.producerBuilder()
129+
.stream("my-stream")
130+
.batchSize(100) // <1>
131+
.subEntrySize(10) // <2>
132+
.compression(Compression.ZSTD) // <3>
133+
.build();
134+
// end::producer-sub-entry-batching-and-compression[]
135+
}
136+
113137
boolean moreContent(long publishingId) {
114138
return true;
115139
}

0 commit comments

Comments
 (0)