-
Notifications
You must be signed in to change notification settings - Fork 274
Move instead of merge symex L2 renaming map when possible #4199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move instead of merge symex L2 renaming map when possible #4199
Conversation
473ed43
to
483435a
Compare
@tautschnig parts of this are pretty similar to #3980 and its separation of unique name allocation from current-name tracking -- just a heads up in case there's a danger of duplicating work, and @JohnDumbell you might want to read that PR to compare and try to align your naming and ownership of data structures with @tautschnig's choices. |
src/goto-symex/symex_goto.cpp
Outdated
INVARIANT( | ||
!goto_state.guard.is_false(), | ||
"an unreachable state should never have been queued for merging"); | ||
|
||
if(dest_state.guard.is_false()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should go away considering the invariants
src/goto-symex/renaming_level.h
Outdated
|
||
/// In the case of a state split, we want to keep track of the overall | ||
/// name generation across all states for identifier creation. | ||
optionalt<current_namest> local_generations; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like you're using the existing current_names
to track global-highest-generation and this new local_generations
to track the latest version on a particular path? If so, see #3980 -- that creates unique_index_map
to store a global highest L1 generation, and that map belongs to path_storaget
, not symex_level2t
which is path-local. (as a rule: varies per path -> goto_statet
, varies per path but doesn't need to be copied on split -> goto_symex_statet
, doesn't vary per path -> path_storaget
). So you should have a map unique_l2_index_map
or something on path_storaget
used to allocate fresh L2 names, and current_names
should continue to describe what L2 generation is observed by a read on a particular path.
src/goto-symex/renaming_level.h
Outdated
/// If we have a local counter, use that instead of our global. | ||
unsigned current_count(const irep_idt &identifier) const override | ||
{ | ||
if(local_generations) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be simpler -- the current count (i.e. the generation observed by a read) should always just be path-local-current-L2-names.at(identifier)
src/goto-symex/renaming_level.h
Outdated
|
||
void erase(const irep_idt &identifier) | ||
{ | ||
current_names.erase(identifier); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given current_names
is the L2 global unique generation, and the erase
method is path-local, I think this is dubious.
483435a
to
8791e6d
Compare
8791e6d
to
e1f19dd
Compare
Sorry for the mass review auto-add. Feel free to remove yourself (as I can't), a push accidentally hoovered up loads of commits I didn't intend. |
I can't remove them either I'm afraid. Can kill the PR and remake, or just rely on them filtering the thread out as most reviewers do 99% of review requests ;) |
There is certainly overlap with #3980, but also #4169. While those are awaiting reviews I'd be happy to factor out any changes that may independently be useful. Otherwise we should agree on a sequence of which PRs are suppose to go in on top of which other and then just start stacking up the commits appropriately in the PRs. |
@tautschnig Thanks, though as the other two PR's are further along the easiest solution is probably for me to just build off those two and/or wait for them to go in first. I'll cherry-pick anything directly useful, otherwise I think my changes will sit alongside them. |
@JohnDumbell Sounds good - just let me know when this PR is ready for reviews/you'd like comments. |
e1f19dd
to
f0a9991
Compare
Given that #3980 has been merged, is there any chance to actually re-use the |
Sure, I'll have a look and see how it can fit in |
@tautschnig candidate branch that styles this more like your L1-naming patch here: develop...smowton:jd/enhancement/symex_state_moving Note I added one more commit to protect |
@tautschnig behold: data exhibiting the branch above doing its thing! Test program: public class Test {
public static void main(int unknown) {
// Step 1: allocate thousands of objects, so the level2
// name map / constant propagator are quite large
for(int i = 0; i < 1000; i++) {
Object o = new Object();
}
// Step 2: Make a few branches in a context which isn't
// certain to run (i.e. where the state guard is non-trivial)
if(unknown % 5 == 0) {
int x = 0;
for(int j = 0; j < 1000; j++) {
if(x == 0)
x = 0;
else
x = 1;
}
}
}
} With develop: 5.3s The reason: the Conclusion: the 2000x copies of an L2 name map containing around 1000 objects takes around 4s longer than 2000x moves. I expect this to provide a handsome speedup when symexing code which is largely constant (so branches are statically decided) in a context where there are thousands of values in the L2 name map / constant propagator. |
d145bda
to
fe30231
Compare
@JohnDumbell @smowton Nice results! Now we just need to get this PR in a shape where CI passes :-) |
94327d0
to
b09cd64
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
Passed Diffblue compatibility checks (cbmc commit: b09cd64).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/104080211
b09cd64
to
440eb97
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
Passed Diffblue compatibility checks (cbmc commit: 440eb97).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/104131701
Unless you have decent access to a Windows development environment, my (rather simplistic) strategy to debug the segmentation fault on Windows would be to push reverts of subsets of the commits to identify which of the commits is causing the problem. |
if(new_guard.is_true()) | ||
{ | ||
goto_state_list.emplace_back(state.source, std::move(state)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely comment here that that std::move
is moving our goto_statet
superclass, rather than the whole of state
src/goto-symex/symex_goto.cpp
Outdated
{ | ||
if(state.guard.is_false()) | ||
{ | ||
state.move_from(std::move(goto_state)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is now replacable by the default move-assign op? (Again comment that this is our superclass being moved)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kinda, see f5ae411 for one way to do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks good to me -- it'd be cleaner if it was a member not a superclass, but otherwise as hoped for.
9320d1a
to
f5ae411
Compare
const ssa_exprt &lhs) | ||
{ | ||
auto current_emplace_res = | ||
level2.current_names.emplace(l1_identifier, std::make_pair(lhs, 0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this method belongs to symex_level2t. :leaves:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was considered but not done because we didn't want symex_level2t
concerned with the global name allocator (only with the path-local map)
{ | ||
// If we can't find the name in the local scope, this is a no-op. | ||
auto current_names_iter = level2.current_names.find(identifier); | ||
if(current_names_iter == level2.current_names.end()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:leaves:
private: | ||
std::function<std::size_t(const irep_idt &)> fresh_l2_name_provider; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could maybe be part of symex_level2t
. I'm a bit worried when I see goto_symex_statet
grow, as this class already contains too many thing and does too much to my taste.
name_index_mapt &unique_index_map, | ||
const irep_idt &id, | ||
std::size_t minimum_index) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
definition should go to the cpp file. Also I would suggest declaring a class for name_index_mapt
(instead of a typedef) and make this the only public method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems short and hot enough to be worth inlining to me
src/goto-symex/symex_goto.cpp
Outdated
goto_state_list.emplace_back(state.source, state); | ||
|
||
// If the guard is true (and the alternative branch unreachable) we can move | ||
// the state in this case as it'll never be accessed on the alternate branch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"in this case" seems redundant in this sentence
@@ -28,8 +28,15 @@ class goto_statet | |||
/// Distance from entry | |||
unsigned depth = 0; | |||
|
|||
protected: | |||
symex_level2t level2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about making current_names private in symex levels instead? (or both)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both sounds reasonable to me, but suggest doing further refactoring elsewhere, this PR has already dragged on a fair while
96284f4
to
2c1baee
Compare
This global name map will be used to check what generation is currently available for each level1 name, and shared across all states. This will only be used when a particular state tries to find out what the next free generation is. The main draw of this is that all branches will now assign unique generations that won't clash with assignments done across other branches. One minor downside is that in VCC's the generation number might jump sporadically (say from diffblue#4 to diffblue#40).
In the situation of merge on a state that hasn't been traversed (with a false guard) try and move the names into the destination state instead of a merge or copy operation
This means that the symex_transition call is duplicated, but better that one call than multiple if statements.
We really should protect the level 2 name map to prevent "allocating" a new generation by just making one up in the local name map, cf. applying for one properly using increase_generation.
2c1baee
to
939a169
Compare
This PR splits the level 2 renaming map into two: a global map whose state is kept across all symex runs, and a local one that only holds the generation counts of something along a particular branch (or specifically, a statet).
It works like this: on reading the current generation, the local map will always be used. When a generation is increased, we query the global map for the currently-taken generation number, increment that by one, then assign the local maps generation number as that. With this, we don't have to merge maps at each branch if for some reason a particular branch can't be taken, as generation numbers will always be unique. Downside is that generation numbers might look strange if symex has many paths to take.
Example of this: path symex, previously, would have taken the next free number on that particular path. With the global name map this is actually now the next free number across all paths. So you could see:
name#1
name#2
name#54
Which might throw people off. You fix this by making the global name map only 'global' to an individual run of symex, but for now there shouldn't be any problems, and the change is minuscule anyway.