Commit 3e22217

committed
Sharing map documentation
1 parent e54f740 commit 3e22217

1 file changed, +235 -0 lines changed
src/util/sharing_map.h

Lines changed: 235 additions & 0 deletions
@@ -39,6 +39,72 @@ Author: Daniel Poetzl
   CV typename sharing_mapt<keyT, valueT, hashT, predT>::ST \
   sharing_mapt<keyT, valueT, hashT, predT>

+/// A map implemented as a tree where subtrees can be shared between different
+/// maps.
+///
+/// The map is implemented as a fixed-height n-ary trie. The height H and the
+/// maximum number of children per inner node S are determined by the two
+/// configuration parameters `bits` and `chunk` in sharing_map.h. It holds
+/// that H = `bits` / `chunk` and S = 2 ^ `chunk`.
+///
+/// When inserting a key-value pair into the map, first the hash of its key is
+/// computed. The `bits` lowest-order bits of the hash are deemed
+/// significant, and are grouped into `bits` / `chunk` chunks. The hash is then
+/// treated as a string (with each chunk representing a character) for the
+/// purposes of determining the position of the key-value pair in the trie. The
+/// actual key-value pairs are stored in the leaf nodes. Collisions (i.e., two
+/// different keys yield the same "string") are handled by chaining the
+/// corresponding key-value pairs in a `std::list`.
+///
+/// The use of a trie in combination with hashing has the advantage that the
+/// tree is unlikely to degenerate (if the number of hash collisions is low).
+/// This makes re-balancing operations, which do not interact well with
+/// sharing, unnecessary. A disadvantage is that the height of the tree is likely
+/// greater than if the elements had been stored in a balanced tree (with
+/// greater differences for sparser maps).
+///
+/// The nodes in the sharing map are objects of type `sharing_nodet`. Each
+/// sharing node has a `shared_ptr` to an object of type `dt` which can be shared
+/// between nodes.
+///
+/// Sharing is initially established when a map is assigned to another map or
+/// copied via the copy constructor. Then both maps contain a pointer to the
+/// root node of the tree that represents the maps. On subsequent modifications
+/// to one of the maps, nodes are copied and sharing is reduced as described
+/// below.
+///
+/// Retrieval, insertion, and removal operations interact with sharing as
+/// follows:
+/// - When a non-const reference to a value in the map that is contained in a
+///   shared subtree is retrieved, the nodes on the path from the root of the
+///   subtree to the corresponding key-value pair (and the key-value pair itself)
+///   are copied and integrated with the map.
+/// - When a key-value pair is inserted into the map and its position is in a
+///   shared subtree, already existing nodes from the root of the subtree to the
+///   position of the key-value pair are copied and integrated with the map, and
+///   new nodes are created as needed.
+/// - When a key-value pair is erased from the map that is in a shared subtree,
+///   nodes from the root of the subtree to the last node that will still exist on
+///   the path to the erased element after the element has been removed are
+///   copied and integrated with the map, and the remaining nodes are removed.
+///
+/// Several methods take a hint indicating whether the element is known not to
+/// be in the map (`false`), known to be in the map (`true`), or it is unknown
+/// whether the element is in the map (`unknown`). The value `unknown` is always
+/// valid. When `true` or `false` is given, it needs to be accurate; otherwise
+/// the behavior is undefined. A correct hint can prevent the need to follow a
+/// path from the root to a key-value pair twice (e.g., once for checking that
+/// it exists, and a second time for copying nodes).
+///
+/// In the descriptions of the methods of the sharing map we also give the
+/// complexity of the operations. We use the following symbols:
+/// - N: number of key-value pairs in the map
+/// - M: maximum number of key-value pairs that are chained in a leaf node
+/// - H: height of the tree
+/// - S: maximum number of children per internal node
+///
+/// The first two symbols denote dynamic properties of a given map, whereas the
+/// last two symbols are static configuration parameters of the map class.
 template <
   class keyT,
   class valueT,
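The copy-then-diverge behaviour described in the class comment can be illustrated with a minimal sketch. This is not the `sharing_mapt` implementation: `nodet`, `detach` and `cow_mapt` are hypothetical names, the trie is binary, and each node stores its children directly instead of holding a `shared_ptr` to a separate `dt` object. It only shows the idea that copying a map copies one root pointer, and that a later write copies just the nodes on the path to the modified element.

```cpp
#include <cassert>
#include <memory>

// Hypothetical binary trie node; children may be shared between maps.
struct nodet
{
  std::shared_ptr<nodet> child[2];
  int value = 0;
};

using node_ptr = std::shared_ptr<nodet>;

// Make `p` point to a node that is referenced from nowhere else.
// A shared node is copied shallowly, so its children stay shared.
nodet &detach(node_ptr &p)
{
  if(!p)
    p = std::make_shared<nodet>();
  else if(p.use_count() > 1)
    p = std::make_shared<nodet>(*p);
  return *p;
}

// Hypothetical map: copying it copies only the root pointer.
struct cow_mapt
{
  node_ptr root;

  // Follow `depth` bits of `path` (lowest bit first), copying shared nodes
  // on the way down, then store the value in the node that is reached.
  void set(unsigned path, unsigned depth, int value)
  {
    nodet *n = &detach(root);
    for(unsigned i = 0; i < depth; i++)
      n = &detach(n->child[(path >> i) & 1]);
    n->value = value;
  }

  // Read-only lookup; never copies nodes. Returns nullptr if absent.
  const int *get(unsigned path, unsigned depth) const
  {
    const nodet *n = root.get();
    for(unsigned i = 0; n != nullptr && i < depth; i++)
      n = n->child[(path >> i) & 1].get();
    return n != nullptr ? &n->value : nullptr;
  }
};
```

After `cow_mapt b = a;` both maps point to the same root. A subsequent `b.set(...)` copies only the nodes on the written path, so subtrees off that path remain shared between `a` and `b`.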
@@ -68,7 +134,16 @@ class sharing_mapt

   typedef size_t size_type;

+  /// Return type of methods that retrieve a const reference to a value. The
+  /// first component is a reference to the value (or a dummy value if the given
+  /// key does not exist), and the second component indicates if the value with
+  /// the given key was found.
   typedef const std::pair<const mapped_type &, const bool> const_find_type;
+
+  /// Return type of methods that retrieve a non-const reference to a value. The
+  /// first component is a reference to the value (or a dummy value if the given
+  /// key does not exist), and the second component indicates if the value with
+  /// the given key was found.
   typedef const std::pair<mapped_type &, const bool> find_type;

   typedef std::vector<key_type> keyst;
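The reference-plus-flag return type documented above can be sketched with a small stand-in. Everything here is hypothetical (`toy_lookupt`, `toy_const_find_type`, `int` values, a `std::map` as storage); it only shows why callers need to check the second component before using the first.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical analogue of const_find_type for int values.
using toy_const_find_type = const std::pair<const int &, const bool>;

struct toy_lookupt
{
  std::map<std::string, int> data;

  toy_const_find_type find(const std::string &k) const
  {
    // Returned by reference when the key is absent, so that the first
    // component always refers to a live object.
    static const int dummy = 0;
    const auto it = data.find(k);
    if(it == data.end())
      return toy_const_find_type(dummy, false);
    return toy_const_find_type(it->second, true);
  }
};
```

A caller would typically test `result.second` first and only then read `result.first`, since on a miss the first component refers to the dummy rather than to an element of the map.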
@@ -89,7 +164,10 @@ class sharing_mapt

   static const std::string not_found_msg;

+  /// Number of bits in the hash deemed significant
   static const size_t bits;
+
+  /// Size (in bits) of a chunk of the hash that represents a character
   static const size_t chunk;

   static const size_t mask;
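How `bits`, `chunk` and `mask` turn a hash into a position in the trie can be sketched as follows. The concrete values (18 and 3), the definition of `mask` as the low `chunk` bits, and the choice to consume the lowest-order chunk first are assumptions made for illustration; the diff does not show them. `trie_path` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed example values; the real constants are defined elsewhere.
constexpr std::size_t bits = 18; // significant low-order bits of the hash
constexpr std::size_t chunk = 3; // bits per chunk (one trie "character")
constexpr std::size_t mask = (std::size_t{1} << chunk) - 1;

constexpr std::size_t height = bits / chunk;                  // H
constexpr std::size_t max_children = std::size_t{1} << chunk; // S

// Split the significant bits of a hash into H child indices, one per
// level of the trie, starting with the lowest-order chunk.
std::vector<std::size_t> trie_path(std::size_t hash)
{
  std::vector<std::size_t> path;
  path.reserve(height);
  for(std::size_t i = 0; i < height; i++)
  {
    path.push_back(hash & mask); // child index in [0, S)
    hash >>= chunk;
  }
  assert(path.size() == height);
  return path;
}
```

With these values H is 6 and S is 8. Two hashes that differ only above the lowest 18 bits produce the same path, which is the collision case handled by chaining in the leaf.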
@@ -136,6 +214,9 @@ class sharing_mapt

   mapped_type &operator[](const key_type &k);

+  /// Swap with other map
+  ///
+  /// Complexity: O(1)
   void swap(self_type &other)
   {
     map.swap(other.map);
@@ -145,22 +226,32 @@ class sharing_mapt
     other.num=tmp;
   }

+  /// Get number of elements in map
+  ///
+  /// Complexity: O(1)
   size_type size() const
   {
     return num;
   }

+  /// Check if map is empty
   bool empty() const
   {
     return num==0;
   }

+  /// Clear map
   void clear()
   {
     map.clear();
     num=0;
   }

+  /// Check if key is in map
+  ///
+  /// Complexity:
+  /// - Worst case: O(H * log(S) + M)
+  /// - Best case: O(H)
   bool has_key(const key_type &k) const
   {
     return get_leaf_node(k)!=nullptr;
@@ -169,6 +260,9 @@ class sharing_mapt
   // views

   typedef std::pair<const key_type &, const mapped_type &> view_itemt;
+
+  /// View of the key-value pairs in the map. A view is a list of pairs with
+  /// the components being const references to the keys and values in the map.
   typedef std::vector<view_itemt> viewt;

   class delta_view_itemt
@@ -194,6 +288,9 @@ class sharing_mapt
     const mapped_type &other_m;
   };

+  /// Delta view of the key-value pairs in two maps. A delta view of two maps is
+  /// a view of the key-value pairs in the maps that are contained in subtrees
+  /// that are not shared between them (also see get_delta_view()).
   typedef std::vector<delta_view_itemt> delta_viewt;

   void get_view(viewt &view) const;
@@ -214,6 +311,15 @@ class sharing_mapt
   void gather_all(const node_type &n, delta_viewt &delta_view) const;
 };

+/// Get a view of the elements in the map
+/// A view is a list of pairs with the components being const references to the
+/// keys and values in the map.
+///
+/// Complexity:
+/// - Worst case: O(N * H * log(S))
+/// - Best case: O(N + H)
+///
+/// \param[out] view: View to fill; must be empty when passed in
 SHARING_MAPT(void)::get_view(viewt &view) const
 {
   assert(view.empty());
@@ -286,6 +392,39 @@ SHARING_MAPT(void)::gather_all(const node_type &n, delta_viewt &delta_view)
   while(!stack.empty());
 }

+/// Get a delta view of the elements in the map
+///
+/// Informally, a delta view of two maps is a view of the key-value pairs in the
+/// maps that are contained in subtrees that are not shared between them.
+///
+/// A delta view is represented as a list of structs, with each struct having
+/// four members (`in_both`, `key`, `value1`, `value2`). The elements `key`,
+/// `value1`, and `value2` are const references to the corresponding elements in
+/// the map. The first element indicates whether the key exists in both maps,
+/// the second element is the key, the third element is the mapped value of the
+/// first map, and the fourth element is the mapped value of the second map, or
+/// a dummy element if the key exists only in the first map (in which case
+/// `in_both` is false).
+///
+/// Calling `A.get_delta_view(B, ...)` yields a view such that for each element
+/// in the view one of two things holds:
+/// - the key is contained in both A and B, and in the maps the corresponding
+///   key-value pairs are not contained in a subtree that is shared between them
+/// - the key is only contained in A
+///
+/// When `only_common=true`, the first case above holds for every element in the
+/// view.
+///
+/// Complexity:
+/// - Worst case: O(max(N1, N2) * H * log(S) * M1 * M2) (no sharing)
+/// - Best case: O(1) (maximum sharing)
+///
+/// The symbols N1, M1 refer to map A, and symbols N2, M2 refer to map B.
+///
+/// \param other: other map
+/// \param[out] delta_view: Delta view to fill; expected to be empty when passed
+///   in
+/// \param only_common: Indicates if the returned delta view should only
+///   contain key-value pairs for keys that exist in both maps
 SHARING_MAPT(void)::get_delta_view(
   const self_type &other,
   delta_viewt &delta_view,
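The delta-view semantics documented above can be modelled with a deliberately flattened sketch. It is not the real algorithm: the "tree" here is a single level of buckets held through `shared_ptr`, both maps are assumed to have the same number of buckets, and all names (`leaft`, `toy_mapt`, `delta_itemt`, `toy_delta_view`) are hypothetical. The point it illustrates is that a subtree referenced by both maps is skipped by pointer comparison alone, and that keys in unshared subtrees are reported even when their values happen to be equal.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

using leaft = std::map<std::string, int>;
// One bucket per slot; a bucket may be shared between two maps.
using toy_mapt = std::vector<std::shared_ptr<leaft>>;

struct delta_itemt
{
  bool in_both;
  std::string key;
  int value1; // value in the first map
  int value2; // value in the second map, or dummy 0 if !in_both
};

std::vector<delta_itemt>
toy_delta_view(const toy_mapt &a, const toy_mapt &b, bool only_common)
{
  assert(a.size() == b.size()); // simplifying assumption of this sketch
  std::vector<delta_itemt> result;
  for(std::size_t i = 0; i < a.size(); i++)
  {
    if(!a[i] || a[i] == b[i])
      continue; // empty in A, or shared: skipped without inspection
    for(const auto &kv : *a[i])
    {
      const int *other = nullptr;
      if(b[i])
      {
        const auto it = b[i]->find(kv.first);
        if(it != b[i]->end())
          other = &it->second;
      }
      if(other != nullptr)
        result.push_back({true, kv.first, kv.second, *other});
      else if(!only_common)
        result.push_back({false, kv.first, kv.second, 0});
    }
  }
  return result;
}
```

With maximal sharing every bucket pointer compares equal and the loop body never inspects an element, which corresponds to the best case given in the comment; with no sharing every element of the first map is visited.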
@@ -439,6 +578,15 @@ SHARING_MAPT2(const, node_type *)::get_leaf_node(const key_type &k) const
   return p;
 }

+/// Erase element
+///
+/// Complexity:
+/// - Worst case: O(H * S + M)
+/// - Best case: O(H)
+///
+/// \param k: The key of the element to erase
+/// \param key_exists: Hint to indicate whether the element is known to exist
+///   (possible values `unknown` or `true`)
 SHARING_MAPT2(, size_type)::erase(
   const key_type &k,
   const tvt &key_exists)
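The effect of the `key_exists` hint can be illustrated with a toy model that counts traversals. The names `hintt` and `toy_hint_mapt` are hypothetical stand-ins (the real hint type is `tvt`, whose interface is not shown in this diff), and a `std::unordered_map` replaces the trie. The sketch only shows the control flow: with `unknown`, an existence check precedes the mutating pass, whereas an accurate `yes` hint skips the check.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical three-valued hint, standing in for tvt.
enum class hintt { no, yes, unknown };

struct toy_hint_mapt
{
  std::unordered_map<std::string, int> data;
  std::size_t traversals = 0; // number of root-to-leaf walks performed

  bool lookup(const std::string &k)
  {
    ++traversals;
    return data.count(k) != 0;
  }

  std::size_t erase(const std::string &k, hintt key_exists = hintt::unknown)
  {
    // As documented for erase, only `unknown` or `true` are valid hints.
    assert(key_exists != hintt::no);
    // Without a hint, check for existence first to avoid needless copying.
    if(key_exists == hintt::unknown && !lookup(k))
      return 0;
    ++traversals; // the mutating walk that would copy shared nodes
    return data.erase(k);
  }
};
```

In this model, erasing an existing key costs one traversal with the `yes` hint and two with `unknown`. An inaccurate hint would skip the check that protects the mutating pass, which is why the class comment declares that case undefined.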
@@ -488,6 +636,17 @@ SHARING_MAPT2(, size_type)::erase(
   return 1;
 }

+/// Erase all elements with the given keys
+///
+/// Complexity (K: number of keys in `ks`):
+/// - Worst case: O(K * (H * S + M))
+/// - Best case: O(K * H)
+///
+/// \param ks: The keys of the elements to erase
+/// \param key_exists: Hint to indicate whether the elements are known to exist
+///   (possible values `unknown` or `true`). Applies to all elements (i.e.,
+///   `unknown` has to be used if for at least one element it is not known
+///   whether it exists)
 SHARING_MAPT2(, size_type)::erase_all(
   const keyst &ks,
   const tvt &key_exists)
@@ -502,6 +661,18 @@ SHARING_MAPT2(, size_type)::erase_all(
   return cnt;
 }

+/// Insert element, return const reference
+///
+/// Complexity:
+/// - Worst case: O(H * S + M)
+/// - Best case: O(H)
+///
+/// \param k: The key of the element to insert
+/// \param m: The mapped value to insert
+/// \param key_exists: Hint to indicate whether the element is known to exist
+///   (possible values `false` or `unknown`)
+/// \return Pair of const reference to existing or newly inserted element, and
+///   boolean indicating if new element was inserted
 SHARING_MAPT2(, const_find_type)::insert(
   const key_type &k,
   const mapped_type &m,
@@ -525,13 +696,26 @@ SHARING_MAPT2(, const_find_type)::insert(
   return const_find_type(as_const(p)->get_value(), true);
 }

+/// Insert element, return const reference
 SHARING_MAPT2(, const_find_type)::insert(
   const value_type &p,
   const tvt &key_exists)
 {
   return insert(p.first, p.second, key_exists);
 }

+/// Insert element, return non-const reference
+///
+/// Complexity:
+/// - Worst case: O(H * S + M)
+/// - Best case: O(H)
+///
+/// \param k: The key of the element to insert
+/// \param m: The mapped value to insert
+/// \return Pair of reference to existing or newly inserted element, and boolean
+///   indicating if new element was inserted
 SHARING_MAPT2(, find_type)::place(
   const key_type &k,
   const mapped_type &m)
@@ -550,12 +734,24 @@ SHARING_MAPT2(, find_type)::place(
   return find_type(p->get_value(), true);
 }

+/// Insert element, return non-const reference
 SHARING_MAPT2(, find_type)::place(
   const value_type &p)
 {
   return place(p.first, p.second);
 }

+/// Find element
+///
+/// Complexity:
+/// - Worst case: O(H * S + M)
+/// - Best case: O(H)
+///
+/// \param k: The key of the element to search for
+/// \param key_exists: Hint to indicate whether the element is known to exist
+///   (possible values `unknown` or `true`)
+/// \return Pair of reference to found value (or dummy value if not found), and
+///   boolean indicating if element was found.
 SHARING_MAPT2(, find_type)::find(
   const key_type &k,
   const tvt &key_exists)
@@ -575,6 +771,17 @@ SHARING_MAPT2(, find_type)::find(

 }

+/// Find element
+///
+/// Complexity:
+/// - Worst case: O(H * log(S) + M)
+/// - Best case: O(H)
+///
+/// \param k: The key of the element to search for
+/// \return Pair of const reference to found value (or dummy value if not
+///   found), and boolean indicating if element was found.
 SHARING_MAPT2(, const_find_type)::find(const key_type &k) const
 {
   const node_type *p=get_leaf_node(k);
@@ -585,6 +792,17 @@ SHARING_MAPT2(, const_find_type)::find(const key_type &k) const
   return const_find_type(p->get_value(), true);
 }

+/// Get element at key
+///
+/// Complexity:
+/// - Worst case: O(H * S + M)
+/// - Best case: O(H)
+///
+/// \param k: The key of the element
+/// \param key_exists: Hint to indicate whether the element is known to exist
+///   (possible values `unknown` or `true`)
+/// \throw std::out_of_range if key not found
+/// \return The mapped value
 SHARING_MAPT2(, mapped_type &)::at(
   const key_type &k,
   const tvt &key_exists)
@@ -597,6 +815,15 @@ SHARING_MAPT2(, mapped_type &)::at(
   return r.first;
 }

+/// Get element at key
+///
+/// Complexity:
+/// - Worst case: O(H * log(S) + M)
+/// - Best case: O(H)
+///
+/// \param k: The key of the element
+/// \throw std::out_of_range if key not found
+/// \return The mapped value
 SHARING_MAPT2(const, mapped_type &)::at(const key_type &k) const
 {
   const_find_type r=find(k);
@@ -606,6 +833,14 @@ SHARING_MAPT2(const, mapped_type &)::at(const key_type &k) const
   return r.first;
 }

+/// Get element at key, insert new if non-existent
+///
+/// Complexity:
+/// - Worst case: O(H * S + M)
+/// - Best case: O(H)
+///
+/// \param k: The key of the element
+/// \return The mapped value
 SHARING_MAPT2(, mapped_type &)::operator[](const key_type &k)
 {
   return place(k, mapped_type()).first;
