Optimize BsonArray Index encoding #1673
JAVA-5836
As @ShaneHarvey noted in the previous PR (#1664 (comment)), Python caches 1000 consecutive array indexes. To align with that behavior across drivers, I’ve adjusted the cache size to 1000 in the Java driver.
- private static final int ARRAY_INDEXES_CACHE_SIZE = 256;
- private static final String[] ARRAY_INDEXES_CACHE = new String[ARRAY_INDEXES_CACHE_SIZE];
+ private static final int ARRAY_INDEXES_CACHE_SIZE = 1000;
+ private static final byte[] ARRAY_INDEXES_BUFFER;
In prior benchmarks, the reference-chasing layout (i.e., byte[][]) showed ~25% lower throughput than a flat byte[] layout, primarily due to fragmented sequential locality and indirect memory access.
LGTM!
@@ -66,7 +66,7 @@ public void shouldThrowWhenMaxDocumentSizeIsExceeded() {
  writer.writeEndDocument();
  fail();
  } catch (BsonMaximumSizeExceededException e) {
- assertEquals("Document size of 1037 is larger than maximum of 1024.", e.getMessage());
+ assertEquals("Document size of 12917 is larger than maximum of 12904.", e.getMessage());
Did this get missed last time? Yes it did - thanks for fixing.
Instead of caching precomputed String values, we now cache their corresponding byte[] representations. This avoids repeatedly encoding the index strings as ASCII bytes. The single contiguous buffer approach provides better cache locality in loops, reduced memory overhead, and better performance.
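The flat-buffer idea can be illustrated with a minimal sketch. This is not the driver's actual code: the class name ArrayIndexCache, the OFFSETS table, and the writeIndex and bufferSize helpers are hypothetical names chosen for illustration, and storing the null terminator alongside each index is an assumption. It only shows how 1000 precomputed index names can live in one contiguous byte[] and be copied out without any per-call String encoding.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a flat array-index cache; names are illustrative only.
final class ArrayIndexCache {
    static final int CACHE_SIZE = 1000;

    // OFFSETS[i] is where index i starts in BUFFER; OFFSETS[CACHE_SIZE] is the end.
    private static final int[] OFFSETS = new int[CACHE_SIZE + 1];

    // All cached index names, back to back, each followed by a 0x00 terminator
    // (assumption: the terminator is cached together with the digits).
    private static final byte[] BUFFER;

    static {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < CACHE_SIZE; i++) {
            OFFSETS[i] = out.size();
            byte[] digits = Integer.toString(i).getBytes(StandardCharsets.US_ASCII);
            out.write(digits, 0, digits.length);
            out.write(0);
        }
        OFFSETS[CACHE_SIZE] = out.size();
        BUFFER = out.toByteArray();
    }

    private ArrayIndexCache() {
    }

    // Copies the null-terminated name of `index` into dest at destPos and
    // returns the number of bytes written. The caller must provide enough room.
    static int writeIndex(int index, byte[] dest, int destPos) {
        if (index >= 0 && index < CACHE_SIZE) {
            // Fast path: one array copy from the shared buffer, no allocation.
            int start = OFFSETS[index];
            int length = OFFSETS[index + 1] - start;
            System.arraycopy(BUFFER, start, dest, destPos, length);
            return length;
        }
        // Slow path for indexes outside the cache: encode on demand.
        byte[] digits = Integer.toString(index).getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(digits, 0, dest, destPos, digits.length);
        dest[destPos + digits.length] = 0;
        return digits.length + 1;
    }

    // Size of the shared buffer in bytes.
    static int bufferSize() {
        return BUFFER.length;
    }
}
```

Under these assumptions the shared buffer holds 10 one-digit, 90 two-digit, and 900 three-digit names plus one terminator each, so it stays a few kilobytes and is read sequentially when consecutive indexes are written in a loop.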
The array index caching implementation consumes approximately 12 KB of memory in total:
Array object headers: ~48 bytes
BsonArrayCodecBenchmark results: