This repository was archived by the owner on Sep 13, 2023. It is now read-only.
FrozenTrie already implemented a read-only in-memory structure based on
vectors of simple structs. Implementing an mmapped version of that is not
too hard: serialize these arrays in such a way that they can be accessed
directly once mapped into memory.
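As a rough illustration of "accessed directly after mapping", the sketch
below maps a file read-only with POSIX mmap and reads one element of a
count-prefixed array in place. MappedFile and element_at are
hypothetical names, and the uint64_t count prefix and uint32_t element
type are assumptions made for the example, not the layout used by this
change. Values are fetched with memcpy rather than a cast so the sketch
does not depend on the alignment of the mapping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fcntl.h>
#include <stdexcept>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical RAII wrapper around a read-only mapping of a whole file.
class MappedFile {
public:
    explicit MappedFile(const char* path) {
        const int fd = ::open(path, O_RDONLY);
        if (fd < 0) throw std::runtime_error("open failed");
        struct stat st;
        if (::fstat(fd, &st) != 0 || st.st_size <= 0) {
            ::close(fd);
            throw std::runtime_error("fstat failed or empty file");
        }
        size_ = static_cast<std::size_t>(st.st_size);
        void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
        ::close(fd);  // the mapping stays valid after the descriptor is closed
        if (p == MAP_FAILED) throw std::runtime_error("mmap failed");
        data_ = static_cast<const char*>(p);
    }
    ~MappedFile() { ::munmap(const_cast<char*>(data_), size_); }
    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    const char* data() const { return data_; }
    std::size_t size() const { return size_; }

private:
    const char* data_ = nullptr;
    std::size_t size_ = 0;
};

// Reads element i of an array assumed to be stored at byte `offset` as
// [uint64_t count][count * uint32_t], without copying the whole array.
std::uint32_t element_at(const MappedFile& f, std::size_t offset, std::size_t i) {
    std::uint64_t count = 0;
    assert(offset + sizeof(count) <= f.size());
    std::memcpy(&count, f.data() + offset, sizeof(count));
    assert(i < count);
    std::uint32_t value = 0;
    std::memcpy(&value, f.data() + offset + sizeof(count) + i * sizeof(value),
                sizeof(value));
    return value;
}
```

Only the pages actually touched by element_at are faulted in, which is
the point of mapping the file instead of loading it.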
This version makes no effort to ensure compatibility between
architectures: mapped data must be written and read on machines sharing
the same architecture. Data is not really serialized, only
reinterpret_cast'ed and written as is. In theory this can fail, for
instance when endianness, type sizes or alignment differ between writer
and reader. In practice it works just fine.
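A minimal sketch of this kind of raw dump, assuming a uint64_t element
count followed by the elements' in-memory bytes. write_array and
read_array are hypothetical helpers, not functions from this change, and
read_array copies the data back out only to keep the round trip easy to
check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <type_traits>
#include <vector>

// Appends [count][raw element bytes] to `out`. Nothing is converted:
// the bytes are whatever the writing machine holds in memory, so the
// result is only readable on a machine with the same representation.
template <typename T>
void write_array(std::string& out, const std::vector<T>& values) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw dumps only make sense for trivially copyable types");
    const std::uint64_t count = values.size();
    out.append(reinterpret_cast<const char*>(&count), sizeof(count));
    if (!values.empty()) {
        out.append(reinterpret_cast<const char*>(values.data()),
                   values.size() * sizeof(T));
    }
}

// Reads one array back from `data` starting at `offset`, then advances
// `offset` past it so several arrays can be read in sequence.
template <typename T>
std::vector<T> read_array(const char* data, std::size_t size, std::size_t& offset) {
    std::uint64_t count = 0;
    assert(offset + sizeof(count) <= size);
    std::memcpy(&count, data + offset, sizeof(count));
    offset += sizeof(count);
    assert(count <= (size - offset) / sizeof(T));
    std::vector<T> values(static_cast<std::size_t>(count));
    if (count != 0) {
        std::memcpy(values.data(), data + offset, values.size() * sizeof(T));
    }
    offset += values.size() * sizeof(T);
    return values;
}
```

Because the count and the elements are copied byte for byte, a file
produced this way carries the writer's endianness and type sizes with
it.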
That said, to avoid playing data alignment games, Node fields are
denormalized into as many packed arrays of similar size. The upside is
that it helps unify how they are parsed and loaded in memory; the
drawback is that scanning nodes touches more memory pages. Performance
seems fine.
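To make the trade-off concrete, here is a small sketch of splitting a
node struct into one array per field. The Node layout, NodeColumns,
denormalize and node_at are all hypothetical: the real Node fields are
not listed in this message. On common ABIs a struct like this one
carries padding between fields, while each per-field array is densely
packed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical node layout, chosen so that field sizes differ.
struct Node {
    std::uint8_t  label;
    std::uint32_t first_child;
    std::uint16_t child_count;
};

// One packed array per field; all arrays have one entry per node.
struct NodeColumns {
    std::vector<std::uint8_t>  labels;
    std::vector<std::uint32_t> first_children;
    std::vector<std::uint16_t> child_counts;
};

NodeColumns denormalize(const std::vector<Node>& nodes) {
    NodeColumns columns;
    columns.labels.reserve(nodes.size());
    columns.first_children.reserve(nodes.size());
    columns.child_counts.reserve(nodes.size());
    for (const Node& node : nodes) {
        columns.labels.push_back(node.label);
        columns.first_children.push_back(node.first_child);
        columns.child_counts.push_back(node.child_count);
    }
    return columns;
}

// Rebuilding one node reads from three different arrays, which is why
// scanning nodes touches more memory pages than with an array of structs.
Node node_at(const NodeColumns& columns, std::size_t i) {
    assert(i < columns.labels.size());
    return Node{columns.labels[i], columns.first_children[i],
                columns.child_counts[i]};
}
```

Each column can then be written and loaded with the same code path for
arrays of primitive types.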
The change is roughly split into:
- Implement serialization code in FrozenTrie. The basic structure is a
  primitive type array prefixed by the number of elements.
- Implement MappedTrie, which mmaps the input file and wraps the arrays
  in MappedArray objects to minimize pointer arithmetic.
- Introduce AbstractTrie to expose basic accessors on nodes, indices and
  payloads and rewrite find_anchored() using it. Share find_anchored()
  implementation between FrozenTrie and MappedTrie.
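
The sharing described in the last item can be sketched as follows. This
is not the actual interface: the accessor names, the node layout
(children stored contiguously, root at index 0, payload -1 meaning
"none") and both backends are assumptions, and a simplified exact-match
find_exact() stands in for find_anchored(), whose semantics are not
described here. VectorTrie plays the role of FrozenTrie and ViewTrie the
role of MappedTrie, with raw pointers where the real code uses
MappedArray.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical accessor interface shared by both backends.
class AbstractTrie {
public:
    virtual ~AbstractTrie() = default;
    virtual char label(std::uint32_t node) const = 0;
    virtual std::uint32_t first_child(std::uint32_t node) const = 0;
    virtual std::uint32_t child_count(std::uint32_t node) const = 0;
    virtual std::int32_t payload(std::uint32_t node) const = 0;  // -1 if none
};

// Written once against the interface, so every backend gets it for free.
std::int32_t find_exact(const AbstractTrie& trie, const std::string& key) {
    std::uint32_t node = 0;  // assumed root index
    for (char c : key) {
        const std::uint32_t begin = trie.first_child(node);
        const std::uint32_t end = begin + trie.child_count(node);
        std::uint32_t next = end;
        for (std::uint32_t child = begin; child < end; ++child) {
            if (trie.label(child) == c) { next = child; break; }
        }
        if (next == end) return -1;  // no child carries this character
        node = next;
    }
    return trie.payload(node);
}

// Stand-in for FrozenTrie: owns its arrays.
class VectorTrie : public AbstractTrie {
public:
    std::vector<char> labels;
    std::vector<std::uint32_t> first_children;
    std::vector<std::uint32_t> child_counts;
    std::vector<std::int32_t> payloads;
    char label(std::uint32_t n) const override { return labels[n]; }
    std::uint32_t first_child(std::uint32_t n) const override { return first_children[n]; }
    std::uint32_t child_count(std::uint32_t n) const override { return child_counts[n]; }
    std::int32_t payload(std::uint32_t n) const override { return payloads[n]; }
};

// Stand-in for MappedTrie: non-owning pointers that would normally
// point into the mapped file.
class ViewTrie : public AbstractTrie {
public:
    const char* labels = nullptr;
    const std::uint32_t* first_children = nullptr;
    const std::uint32_t* child_counts = nullptr;
    const std::int32_t* payloads = nullptr;
    char label(std::uint32_t n) const override { return labels[n]; }
    std::uint32_t first_child(std::uint32_t n) const override { return first_children[n]; }
    std::uint32_t child_count(std::uint32_t n) const override { return child_counts[n]; }
    std::int32_t payload(std::uint32_t n) const override { return payloads[n]; }
};
```

The lookup never learns which backend it is walking, which is what lets
a single implementation serve both the in-memory and the mapped trie.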