Optimize BsonArray Index encoding #1673

vbabanin · 2025-04-03T00:11:04Z

Instead of caching precomputed String values, we now cache their corresponding byte[] representations. This avoids repeatedly encoding the same ASCII strings to bytes.

Storing all encoded indexes in a single contiguous buffer improves cache locality in loops, reduces memory overhead, and increases throughput.

The array index caching implementation consumes approximately 12 KB of memory in total (3,890 + 4,000 + 4,000 + 48 = 11,938 bytes):

ARRAY_INDEXES_BUFFER: ~3,890 bytes
Numbers 0-9: 10 × (1 byte + null terminator) = 20 bytes
Numbers 10-99: 90 × (2 bytes + null terminator) = 270 bytes
Numbers 100-999: 900 × (3 bytes + null terminator) = 3,600 bytes

ARRAY_INDEXES_OFFSETS: 4,000 bytes
1,000 integers × 4 bytes per int = 4,000 bytes

ARRAY_INDEXES_LENGTHS: 4,000 bytes
1,000 integers × 4 bytes per int = 4,000 bytes

Array object headers: ~48 bytes
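
The size breakdown above can be reproduced with a minimal sketch like the one below. The constant names follow the PR, but the class name `ArrayIndexCacheSketch`, the helper `writeIndex`, and the initialization logic are assumptions made for illustration; this is not the merged `BsonBinaryWriter` code.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative sketch only; not the driver's actual implementation. */
final class ArrayIndexCacheSketch {
    static final int ARRAY_INDEXES_CACHE_SIZE = 1000;
    static final byte[] ARRAY_INDEXES_BUFFER;
    static final int[] ARRAY_INDEXES_OFFSETS = new int[ARRAY_INDEXES_CACHE_SIZE];
    static final int[] ARRAY_INDEXES_LENGTHS = new int[ARRAY_INDEXES_CACHE_SIZE];

    static {
        // First pass: measure the total size (digits plus one null terminator per index).
        int total = 0;
        for (int i = 0; i < ARRAY_INDEXES_CACHE_SIZE; i++) {
            total += Integer.toString(i).length() + 1;
        }
        ARRAY_INDEXES_BUFFER = new byte[total];

        // Second pass: pack every encoded index back to back and record where it starts.
        int pos = 0;
        for (int i = 0; i < ARRAY_INDEXES_CACHE_SIZE; i++) {
            byte[] digits = Integer.toString(i).getBytes(StandardCharsets.US_ASCII);
            ARRAY_INDEXES_OFFSETS[i] = pos;
            ARRAY_INDEXES_LENGTHS[i] = digits.length + 1;
            System.arraycopy(digits, 0, ARRAY_INDEXES_BUFFER, pos, digits.length);
            pos += digits.length;
            ARRAY_INDEXES_BUFFER[pos++] = 0; // null terminator of the cstring
        }
    }

    /** Hypothetical helper: copies the cached cstring for {@code index} into {@code dest}. */
    static int writeIndex(int index, byte[] dest, int destPos) {
        int length = ARRAY_INDEXES_LENGTHS[index];
        System.arraycopy(ARRAY_INDEXES_BUFFER, ARRAY_INDEXES_OFFSETS[index], dest, destPos, length);
        return length; // number of bytes written, terminator included
    }

    public static void main(String[] args) {
        System.out.println("buffer bytes: " + ARRAY_INDEXES_BUFFER.length);
        System.out.println("offset of index 100: " + ARRAY_INDEXES_OFFSETS[100]);
    }
}
```

With 1,000 cached indexes the buffer comes to 20 + 270 + 3,600 = 3,890 bytes, matching the figure above. The sketch does not handle indexes outside the cached range; a caller would need a separate path for those.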

BsonArrayCodecBenchmark results:

| Metric | Before | After | Change |
| --- | --- | --- | --- |
| ops/s | 13583.483 | 20046.810 | +47.6% |

JAVA-5836

vbabanin · 2025-04-03T01:21:23Z

As @ShaneHarvey noted in the previous PR (#1664 (comment)), Python caches 1000 consecutive array indexes. To align with that behavior across drivers, I’ve adjusted the cache size to 1000 in the Java driver.

vbabanin · 2025-04-03T01:27:42Z

bson/src/main/org/bson/BsonBinaryWriter.java

-    private static final int ARRAY_INDEXES_CACHE_SIZE = 256;
-    private static final String[] ARRAY_INDEXES_CACHE = new String[ARRAY_INDEXES_CACHE_SIZE];
+    private static final int ARRAY_INDEXES_CACHE_SIZE = 1000;
+    private static final byte[] ARRAY_INDEXES_BUFFER;


In prior benchmarks, the reference-chasing layout (i.e., byte[][]) showed ~25% lower throughput than a flat byte[] layout, primarily because the per-index arrays are scattered across the heap (poorer memory locality) and each lookup requires an extra pointer indirection.
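
To make the difference between the two layouts concrete, here is a small side-by-side sketch. All names (`IndexLayoutSketch`, `NESTED`, `FLAT`, `copyNested`, `copyFlat`) are hypothetical and chosen for illustration; the sketch only shows how the lookups differ and does not measure performance.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Illustrative comparison of the two cache layouts; not the driver's code. */
final class IndexLayoutSketch {
    static final int SIZE = 1000;

    // Layout A (byte[][]): one heap object per index; a lookup first loads
    // the reference to the inner array, then reads the bytes behind it.
    static final byte[][] NESTED = new byte[SIZE][];

    // Layout B (flat byte[]): every encoded index sits back to back in one
    // array, addressed through primitive offset and length tables.
    static final byte[] FLAT;
    static final int[] OFFSETS = new int[SIZE];
    static final int[] LENGTHS = new int[SIZE];

    static {
        int total = 0;
        for (int i = 0; i < SIZE; i++) {
            byte[] digits = Integer.toString(i).getBytes(StandardCharsets.US_ASCII);
            // copyOf pads with a zero byte, which serves as the null terminator.
            NESTED[i] = Arrays.copyOf(digits, digits.length + 1);
            OFFSETS[i] = total;
            LENGTHS[i] = NESTED[i].length;
            total += LENGTHS[i];
        }
        FLAT = new byte[total];
        for (int i = 0; i < SIZE; i++) {
            System.arraycopy(NESTED[i], 0, FLAT, OFFSETS[i], LENGTHS[i]);
        }
    }

    /** Reference-chasing lookup: dereferences NESTED[index] before copying. */
    static int copyNested(int index, byte[] dest, int pos) {
        byte[] entry = NESTED[index];
        System.arraycopy(entry, 0, dest, pos, entry.length);
        return entry.length;
    }

    /** Flat lookup: copies a slice of the single shared buffer. */
    static int copyFlat(int index, byte[] dest, int pos) {
        System.arraycopy(FLAT, OFFSETS[index], dest, pos, LENGTHS[index]);
        return LENGTHS[index];
    }
}
```

Both methods produce identical bytes. When consecutive indexes are written in a loop, the flat layout reads neighbouring regions of a single array, whereas the nested layout may touch 1,000 separately allocated objects.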

rozza

LGTM!

rozza · 2025-04-03T11:35:44Z

bson/src/test/unit/org/bson/BsonBinaryWriterTest.java

@@ -66,7 +66,7 @@ public void shouldThrowWhenMaxDocumentSizeIsExceeded() {
            writer.writeEndDocument();
            fail();
        } catch (BsonMaximumSizeExceededException e) {
-            assertEquals("Document size of 1037 is larger than maximum of 1024.", e.getMessage());
+            assertEquals("Document size of 12917 is larger than maximum of 12904.", e.getMessage());


~~Did this get missed last time?~~ Yes it did - thanks for fixing.

vbabanin added 3 commits April 2, 2025 17:09

Cache String encoded bytes to reduce GC allocation rate and CPU time.

a167fbc

JAVA-5836

Update tests.

069470c

JAVA-5836

Change calculation logic.

5980b35

JAVA-5836

vbabanin marked this pull request as ready for review April 3, 2025 01:12

vbabanin requested a review from rozza April 3, 2025 01:12

vbabanin self-assigned this Apr 3, 2025

vbabanin changed the title ~~Optimize Array Index encoding~~ Optimize BsonArray Index encoding Apr 3, 2025

rozza approved these changes Apr 3, 2025

vbabanin merged commit c15e14a into mongodb:main Apr 3, 2025
54 checks passed
