Skip to content

Sharing map refactoring to reduce memory consumption #2126

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 24, 2018

Conversation

danpoe
Copy link
Contributor

@danpoe danpoe commented Apr 26, 2018

This changes the representation of nodes in sharing_mapt to reduce memory consumption.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a cursory review for now, we'll have a detailed look tomorrow. So far we've found a couple of places that could use more detailed explanation, and especially rationale.

unit/Makefile Outdated
@@ -28,6 +28,7 @@ SRC += unit_tests.cpp \
java_bytecode/inherited_static_fields/inherited_static_fields.cpp \
pointer-analysis/custom_value_set_analysis.cpp \
sharing_node.cpp \
sharing_map.cpp \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commit seems to be shared with #1899?

@@ -450,7 +450,7 @@ SHARING_MAPT2(, size_type)::erase(
return 0;

node_type *del=nullptr;
unsigned del_bit;
unsigned del_bit = 0;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this one too? (#1899)

@@ -0,0 +1,560 @@
/*******************************************************************\

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "small map" mean? Can you elaborate on this in the commit message itself?

/// Map from small integers to values
///
/// A data structure that maps small integers (in {0, ..., Num-1}) to values.
/// It is designed to be more memory-efficient than std::map for this use case.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually more memory efficient than arrays? The benchmark given in this comment for an empty small_mapt<uint64t, uint32_t, 8> - 32 bytes - is exactly as big as an std::array<uint32_t, 4>, with a full small_mapt being much worse. And arrays are also indexed by integers. Is the relation more favourable towards small_mapt with larger Num? If so, can you mention this here?

if(n == 0)
return nullptr;

T *mem = (T *)malloc(sizeof(T) * n);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why all this code with malloc/realloc instead of just using std::vector (which already does all of these things for you)?

Also, this leaves the result of the allocation uninitialised which is probably ok for POD types, but non-POD types need to be initialised with placement-new.

@@ -12,340 +12,499 @@ Author: Daniel Poetzl
#ifndef CPROVER_UTIL_SHARING_NODE_H
#define CPROVER_UTIL_SHARING_NODE_H

#include <list>
#define DEBUG

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is debug code this shouldn't be enabled by default. Also, "DEBUG" is too generic a name to have it defined and then include a bunch of headers. If you're going to keep this, you should give it a less generic name and immediately undef it after the ifdef check.

//#define _sn_assert(b)

#define _sn_type_par_decl \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do all these things start with an underscore? I don't think this matches our internal naming convention.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same questions about _sn_assert_use as in src/util/sharing_map.h

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now renamed all the macros to be all uppercase and have dropped the leading underscore.

bool no_sharing=false>
class sharing_nodet
template <typename baseT, typename dataT>
struct delete_through_base_ptrt

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a highly unusual thing. Please document why you need this.

}
else if(data.use_count() > 1)
{
auto p = std::static_pointer_cast<dataT>(data);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You assume that data is pointing to an instance of dataT here. Can you add an assertion for that?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, please leave a comment explaining why you're using static rather than dynamic cast for this.


typedef const std::pair<const self_type &, const bool> const_find_type;
typedef const std::pair<self_type &, const bool> find_type;
_sn_type_par_decl class d_to_internalt : public d_baset

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's d? And to internal what?

@tautschnig
Copy link
Collaborator

Now that the pre-dependencies are merged, could this please be rebased before doing any reviews?

@danpoe danpoe force-pushed the refactor/sharing-map-small-nodes branch from a3f3f3a to d76cd52 Compare May 17, 2018 17:26
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still have some questions, otherwise looks OK to me.


#ifdef SHARING_MAP_INTERNAL_CHECKS
#define _sm_assert(b) INVARIANT(b, "Sharing map internal invariant")
#define _sm_assert_use(v, b) _sm_assert(b)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this for? Is this supposed to be a sort of [[maybe_unused]] for v? If so, why is it not used if SHARING_MAP_INTERNAL_CHECKS is on, and why not just (void)v?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, this is to avoid a compiler warning when the internal assertions are disabled (if for example r is only used in assert(r)). Yes, (void)v would also work.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But ... is this actually used anywhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, it's not used anymore. (a similar define _sn_assert_use also exists in sharing_node.cpp where it is used)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can they be removed in that case? @danpoe

//#define _sn_assert(b)

#define _sn_type_par_decl \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same questions about _sn_assert_use as in src/util/sharing_map.h

@@ -43,13 +49,13 @@ void sharing_map_interface_test()
{
smt sm;

smt::const_find_type r1 = sm.insert(std::make_pair("i", "0"));
smt::const_find_type r1=sm.insert(std::make_pair("i", "0"));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are all these formatting changes? They seem to do the opposite of what clang-format wants?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, not sure why or how I did this. I've now run clang-format over all the commits.

//n = 5000000;

//sharing_mapt<int, int> m;
std::map<int, int> m;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this called sharing_map_memory_test when it inserts things into std::map?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test was obsolete and I've now removed it.

@@ -345,3 +402,8 @@ TEST_CASE("Sharing map views")
{
sharing_map_view_test();
}

TEST_CASE("Sharing map memory")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test case doesn't have any assertions in it? Should this really be a unit test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

// Internal nodes

class d_baset
class d_baset : public small_shared_two_way_pointeet<unsigned>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's "d"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This somewhat mimics the naming scheme in irept, where the shared pointer contained in an irept is named dt (I believe it stands for something like "data type").

@danpoe danpoe force-pushed the refactor/sharing-map-small-nodes branch 2 times, most recently from 3a08546 to 4662bdd Compare May 22, 2018 15:49
@@ -12,339 +12,455 @@ Author: Daniel Poetzl
#ifndef CPROVER_UTIL_SHARING_NODE_H
#define CPROVER_UTIL_SHARING_NODE_H

#include <list>
//#define DEBUG
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be removed instead of commented out? I think it can be passed as a CLI compiler supplied symbol during compilation with -DDEBUG. Other than that the name is too generic - is the symbol DEBUG reserved across the whole of CBMC for debugging support like this, or is this code debugging support just specific to this code? If so, can it be renamed to something else if it can't be deleted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll rename it to SN_DEBUG (the other macros in the sharing node file also start with SN).

#include "small_shared_ptr.h"
#include "small_shared_two_way_ptr.h"

//#define SN_INTERNAL_CHECKS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of the above comment also applies to this as well? Can this be documented somewhere that it can be passed during compilation with -DSN_INTERNAL_CHECKS and removed from here? Otherwise it's confusing to a new reader, I believe.


typedef sharing_nodet<key_type, mapped_type, key_equal> self_type;
SN_TYPE_PAR_DECL class d_internalt : public d_baset
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a question regarding the naming of this and similar classes. What does the d_ prefix symbolise? Is it just data? Or ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw now, that these are supposed to mimic the naming scheme of irept, with d symbolising data or data type or something along this lines. Understood, but I think this should be documented somewhere more prominently than an easily forgotten comment in a PR.


#ifdef SN_INTERNAL_CHECKS
#define SN_ASSERT(b) INVARIANT(b, "Sharing node internal invariant")
#define SN_ASSERT_USE(v, b) SN_ASSERT(b)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a previous comment it has been asserted that these utilities are not being used. Do you think they might be useful in the future, and if not, can they be removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were two similar macros, SM_ASSERT_USE and SN_ASSERT_USE (SM for sharing map and SN for sharing node). The former was unused and has now been removed, but the latter is still used.


#ifdef SHARING_MAP_INTERNAL_CHECKS
#define _sm_assert(b) INVARIANT(b, "Sharing map internal invariant")
#define _sm_assert_use(v, b) _sm_assert(b)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can they be removed in that case? @danpoe

Copy link
Contributor

@NlightNFotis NlightNFotis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor, non-blocking comments.

#define SHARING_MAPT(R) \
template <class keyT, class valueT, class hashT, class predT> \
R sharing_mapt<keyT, valueT, hashT, predT>
#define SHARING_MAPT(R) \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use // clang-format off to avoid the ugly reformatting

return nullptr;

p=p->find_leaf(k);
const leaft *lp;
lp = cp->find_leaf(k);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure that separating declaration and initialisation improves readability here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll pull this method up into the class.

std::size_t key = hash()(k);
innert *ip = &map;

std::size_t bit;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this declaration not inside the loop?

#include <stack>
#include <vector>
#include <stdexcept>
//#define SM_DEBUG
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer removing commented code


#include "irep.h"
#include "sharing_node.h"
#include "threeval.h"

#define _sm_assert(b) assert(b)
//#define _sm_assert(b)
//#define SM_INTERNAL_CHECKS
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer removing commented code

{
view.push_back(view_itemt(child.get_key(), child.get_value()));
const innert *i = &item.second;
stack.push({depth + 1, i});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You both push and pop on the stack within the do-while loop. Maybe comment on the property that ensures that the stack eventually becomes empty.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're essentially iterating over all the nodes in a tree with DFS. I'll add a comment to say that.

@danpoe danpoe force-pushed the refactor/sharing-map-small-nodes branch from d2ed841 to af2defd Compare May 23, 2018 15:59
@danpoe danpoe merged commit e5e0897 into diffblue:develop May 24, 2018
@danpoe danpoe deleted the refactor/sharing-map-small-nodes branch May 24, 2018 09:42
NathanJPhillips pushed a commit to NathanJPhillips/cbmc that referenced this pull request Aug 22, 2018
779fa71 Merge pull request diffblue#2253 from peterschrammel/documentation/override2
40ecff8 Merge pull request diffblue#2250 from tautschnig/expr-iterator-deque
050b344 Re-enable enforcement of override without virtual
b5dec9c Get legalistic about use of override without virtual
b51e2a8 Merge pull request diffblue#2196 from peterschrammel/check-module-includes
d5eabdf expr_iterator: use a std::deque to implement the stack
fada0af Add module dependency definition files
a90ea44 Add module dependency check to CPP-LINT
88f8cfc Remove unnecessary includes
d6986d8 Fix relative include paths
0f9c202 Merge pull request diffblue#2242 from diffblue/section-name-warning
b7f5886 Merge pull request diffblue#2241 from diffblue/ld_mode
5c11eb7 Merge pull request diffblue#2245 from mgudemann/fix/warning/clang_self_assign
f7e5fb5 Merge pull request diffblue#2229 from diffblue/ssize_t
1a504c9 Merge pull request diffblue#2244 from diffblue/solver-Makefile-fix
4bb1bf0 Fix clang++'s warning about self-assign
9a0aa9c Merge pull request diffblue#2235 from thomasspriggs/test-pl-colour
4c2cb3a remove linker mode from gcc_mode
303908f add separate path for ld
524091f factor out creation of hybrid binaries
b9127f3 linker_script_merget now takes exactly one ELF + goto binary
cd967db update year + add Michael
0d95cc5 missing const method qualifiers
6f04d98 fix ordering problem in solvers/Makefile
8f6bae0 remove a warning about section names
8befd02 Merge pull request diffblue#2238 from owen-jones-diffblue/owen-jones-diffblue/doc/irep_id
34b0ac6 Merge pull request diffblue#2236 from diffblue/show-class-hierarchy
8e8e450 Merge pull request diffblue#2232 from owen-jones-diffblue/owen-jones-diffblue/generic-bounded-types
01dc76b Add section on irep_idt and dstringt
2f4c6ad Add and unify --show-class-hierarchy command line option
56256f1 Minor typos in irept documentation
3cf4e3a Merge pull request diffblue#2178 from thomasspriggs/remove_java_bytecode_parse_treet_swap
1a7235d use __CPROVER_size_t and __CPROVER_ssize_t for __CPROVER_POINTER_OBJECT/OFFSET
a018e2f Add JSON output for class hierarchy
68c45ed Improve class hierarchy output
eeb732f Switch `push_back` to `emplace_back` when constructing `parse_trees`.
f154840 Delete copy constructor of `class java_bytecode_parse_treet`.
c5cbcec Fix instances where copying was being used instead of moving.
52a669f Remove `java_bytecode::swap` and return using `optionalt` instead.
fabbd04 Give up parsing generic method signature with bound
77f8162 Colour code tests passing vs failing.
e5e0897 Merge pull request diffblue#2126 from danpoe/refactor/sharing-map-small-nodes
f55bd96 Merge pull request diffblue#2231 from smowton/smowton/fix/jbmc-tests
af02973 Merge pull request diffblue#2202 from smowton/smowton/fix/java-lang-class-fields
42a78af JBMC tests: suffix logfiles when using symex-driven loading
af2defd Removed obsolete sharing map unit test
1d7fbd3 Refactor sharing map nodes to reduce memory consumption
5235938 Restore testing of jbmc
8a59f6f Object factory: permit null pointers within java.lang.Class
8412eb0 Merge pull request diffblue#2228 from peterschrammel/move-remaining-java-tests
369577a Move remaining java tests to jbmc/regression/
bfe3d3d Merge pull request diffblue#2226 from tautschnig/inline-get-str-cont
2b00973 Merge pull request diffblue#2227 from tautschnig/fptr-removal
3f7685f Merge pull request diffblue#2223 from diffblue/fp-builtins
3b3dc71 Distinct names of return-value symbols
4f7fade Cleanup: use symbolt::symbol_expr
8372862 function-pointer removal: Set the mode of a return symbol
272cde0 Inline get_string_container
72a0379 test __builtin_isinf, __builtin_isinf_sign, __builtin_isnormal
f156ef0 Merge pull request diffblue#2222 from tautschnig/attributes
a69c603 add __builtin_isnormal
83aeddd added __builtin_isinf_sign
87d467e fix return types of various __builtin_is* functions
61af061 added typecast_exprt::conditional_cast
e1b906a Support GCC's fallthrough attribute
d6d0a49 C front-end: support alias attributes on variables
376beab Merge pull request diffblue#2218 from diffblue/builtin_fpclassify
c3603e3 added a test for raw __builtin_fpclassify
52595bd add support for __builtin_fpclassify
50d1c79 Merge pull request diffblue#2214 from tautschnig/tg-only
3c59312 Remove unused substitute.{h,cpp}
d3e131c Revert "Set memory limit utility"
a4389fe Merge pull request diffblue#2210 from tautschnig/verbosity-cleanup
c250880 Merge pull request diffblue#2211 from tautschnig/travis-osx-cleanup
c8597a4 Merge pull request diffblue#2174 from romainbrenguier/bugfix/not_contains#TG-3150
b08ef94 Merge pull request diffblue#2216 from peterschrammel/update-codeowners
471ab0f Merge pull request diffblue#2207 from diffblue/remove-solvers-cvc
215cd69 Use enum entries instead of numeric values when comparing verbosity
6344b4f Warn if user-specified verbosity is out of range
bf04bcb Use a single implementation of eval_verbosity
b4731eb Do not redundantly set the message handler
42ec63a Clean up .gitignore
19200bf Update CODEOWNERS for /jbmc
0487376 Merge pull request diffblue#2173 from svorenova/gs_tg1121
6af2270 Update regression test that no longer throws an exception
bc17328 Enable previously failing regression tests
146bb29 Adding debug information to dereference type comparison
7b9a20a Allow pointers to be dereferenced to void types
108129c Merge pull request diffblue#2118 from diffblue/remove-jbmc
11411e4 Travis/OSX follow-up cleanup: remove unnecessary environment variables
386faa8 Test for String.contains and very large strings
9e73699 Refactor negation of not contains constraints
29a8818 Build jbmc on CI
f196e74 Update compilation instructions
1b7c84a Add JBMC README
03d6f5b Shorten goto-analyzer-taint-ansi-c tests to goto-analyzer-taint
8dc0d74 Remove obsolete jbmc-cover tests
f36da08 Move Java regression tests
b6742ca Move Java unit tests
e247458 Add JANALYZER tool
4588753 Add JDIFF tool
a20f2c1 Move java_bytecode, jbmc and miniz to jbmc/src
987106f Make unit test independent of java_bytecode
d945452 Adapt cpplint header guard check
28907b2 remove (pre-SMT-LIB) CVC interface

git-subtree-dir: cbmc
git-subtree-split: 779fa71
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants