Skip to content

Move instead of merge symex L2 renaming map when possible #4199

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

JohnDumbell
Copy link
Contributor

@JohnDumbell JohnDumbell commented Feb 15, 2019

This PR splits the level 2 renaming map into two: a global map whose state is kept across all symex runs, and a local one that only holds the generation counts of something along a particular branch (or specifically, a statet).

It works like this: on reading the current generation, the local map will always be used. When a generation is increased, we query the global map for the currently-taken generation number, increment that by one, then assign the local maps generation number as that. With this, we don't have to merge maps at each branch if for some reason a particular branch can't be taken, as generation numbers will always be unique. Downside is that generation numbers might look strange if symex has many paths to take.

Example of this: path symex, previously, would have taken the next free number on that particular path. With the global name map this is actually now the next free number across all paths. So you could see:

name#1
name#2
name#54

Which might throw people off. You fix this by making the global name map only 'global' to an individual run of symex, but for now there shouldn't be any problems, and the change is minuscule anyway.

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@smowton
Copy link
Contributor

smowton commented Feb 18, 2019

@tautschnig parts of this are pretty similar to #3980 and its separation of unique name allocation from current-name tracking -- just a heads up in case there's a danger of duplicating work, and @JohnDumbell you might want to read that PR to compare and try to align your naming and ownership of data structures with @tautschnig's choices.

INVARIANT(
!goto_state.guard.is_false(),
"an unreachable state should never have been queued for merging");

if(dest_state.guard.is_false())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should go away considering the invariants


/// In the case of a state split, we want to keep track of the overall
/// name generation across all states for identifier creation.
optionalt<current_namest> local_generations;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you're using the existing current_names to track global-highest-generation and this new local_generations to track the latest version on a particular path? If so, see #3980 -- that creates unique_index_map to store a global highest L1 generation, and that map belongs to path_storaget, not symex_level2t which is path-local. (as a rule: varies per path -> goto_statet, varies per path but doesn't need to be copied on split -> goto_symex_statet, doesn't vary per path -> path_storaget). So you should have a map unique_l2_index_map or something on path_storaget used to allocate fresh L2 names, and current_names should continue to describe what L2 generation is observed by a read on a particular path.

/// If we have a local counter, use that instead of our global.
unsigned current_count(const irep_idt &identifier) const override
{
if(local_generations)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be simpler -- the current count (i.e. the generation observed by a read) should always just be path-local-current-L2-names.at(identifier)


void erase(const irep_idt &identifier)
{
current_names.erase(identifier);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given current_names is the L2 global unique generation, and the erase method is path-local, I think this is dubious.

@JohnDumbell
Copy link
Contributor Author

Sorry for the mass review auto-add. Feel free to remove yourself (as I can't), a push accidentally hoovered up loads of commits I didn't intend.

@smowton
Copy link
Contributor

smowton commented Feb 18, 2019

I can't remove them either I'm afraid. Can kill the PR and remake, or just rely on them filtering the thread out as most reviewers do 99% of review requests ;)

@tautschnig
Copy link
Collaborator

There is certainly overlap with #3980, but also #4169. While those are awaiting reviews I'd be happy to factor out any changes that may independently be useful. Otherwise we should agree on a sequence of which PRs are suppose to go in on top of which other and then just start stacking up the commits appropriately in the PRs.

@JohnDumbell
Copy link
Contributor Author

@tautschnig Thanks, though as the other two PR's are further along the easiest solution is probably for me to just build off those two and/or wait for them to go in first. I'll cherry-pick anything directly useful, otherwise I think my changes will sit alongside them.

@tautschnig
Copy link
Collaborator

@JohnDumbell Sounds good - just let me know when this PR is ready for reviews/you'd like comments.

@JohnDumbell JohnDumbell force-pushed the jd/enhancement/symex_state_moving branch from e1f19dd to f0a9991 Compare February 21, 2019 17:23
@tautschnig
Copy link
Collaborator

Given that #3980 has been merged, is there any chance to actually re-use the path_storaget::get_unique_index that it introduced?

@JohnDumbell
Copy link
Contributor Author

Sure, I'll have a look and see how it can fit in

@smowton
Copy link
Contributor

smowton commented Mar 7, 2019

@tautschnig candidate branch that styles this more like your L1-naming patch here: develop...smowton:jd/enhancement/symex_state_moving

Note I added one more commit to protect level2, on grounds that making up a new name by just directly writing to it is now dangerous (you must use the proper allocator). You might want to do something similar to the L1 map for similar reasons.

@smowton
Copy link
Contributor

smowton commented Mar 8, 2019

@tautschnig behold: data exhibiting the branch above doing its thing!

Test program:

public class Test {

    public static void main(int unknown) {

	// Step 1: allocate thousands of objects, so the level2
	// name map / constant propagator are quite large
	for(int i = 0; i < 1000; i++) {
	    Object o = new Object();
	}

	// Step 2: Make a few branches in a context which isn't
	// certain to run (i.e. where the state guard is non-trivial)
	if(unknown % 5 == 0) {
	    int x = 0;
	    for(int j = 0; j < 1000; j++) {
		if(x == 0)
		    x = 0;
		else
		    x = 1;
	    }
	}
       
    }

}

With develop: 5.3s
With this change: 0.8s

The reason: the if(x == 0) in the step 2 loop is decided statically, and at the end of the if block we unconditionally jump to the end of the loop body. Because the state guard isn't true we suppose there might be a route around this code (there is, but it's outside the loop). Therefore we (move, with this PR | copy, without this PR) the state to the end-of-loop instruction (the unconditional back-edge, in fact) and walk over each intermediate instruction checking there are no incoming edges. Similarly when we get to that back-edge we (copy | move) the state from the goto_state_map.

Conclusion: the 2000x copies of an L2 name map containing around 1000 objects takes around 4s longer than 2000x moves. I expect this to provide a handsome speedup when symexing code which is largely constant (so branches are statically decided) in a context where there are thousands of values in the L2 name map / constant propagator.

@JohnDumbell JohnDumbell force-pushed the jd/enhancement/symex_state_moving branch from d145bda to fe30231 Compare March 8, 2019 19:10
@tautschnig
Copy link
Collaborator

@JohnDumbell @smowton Nice results! Now we just need to get this PR in a shape where CI passes :-)

@JohnDumbell JohnDumbell force-pushed the jd/enhancement/symex_state_moving branch 4 times, most recently from 94327d0 to b09cd64 Compare March 12, 2019 13:24
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: b09cd64).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/104080211

@JohnDumbell JohnDumbell force-pushed the jd/enhancement/symex_state_moving branch from b09cd64 to 440eb97 Compare March 12, 2019 17:50
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: 440eb97).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/104131701

@tautschnig
Copy link
Collaborator

Unless you have decent access to a Windows development environment, my (rather simplistic) strategy to debug the segmentation fault on Windows would be to push reverts of subsets of the commits to identify which of the commits is causing the problem.

if(new_guard.is_true())
{
goto_state_list.emplace_back(state.source, std::move(state));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely comment here that that std::move is moving our goto_statet superclass, rather than the whole of state

{
if(state.guard.is_false())
{
state.move_from(std::move(goto_state));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is now replacable by the default move-assign op? (Again comment that this is our superclass being moved)

Copy link
Contributor Author

@JohnDumbell JohnDumbell Mar 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kinda, see f5ae411 for one way to do it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks good to me -- it'd be cleaner if it was a member not a superclass, but otherwise as hoped for.

@JohnDumbell JohnDumbell force-pushed the jd/enhancement/symex_state_moving branch from 9320d1a to f5ae411 Compare March 13, 2019 13:09
const ssa_exprt &lhs)
{
auto current_emplace_res =
level2.current_names.emplace(l1_identifier, std::make_pair(lhs, 0));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this method belongs to symex_level2t. :leaves:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was considered but not done because we didn't want symex_level2t concerned with the global name allocator (only with the path-local map)

{
// If we can't find the name in the local scope, this is a no-op.
auto current_names_iter = level2.current_names.find(identifier);
if(current_names_iter == level2.current_names.end())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:leaves:

private:
std::function<std::size_t(const irep_idt &)> fresh_l2_name_provider;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could maybe be part of symex_level2t. I'm a bit worried when I see goto_symex_statet grow, as this class already contains too many thing and does too much to my taste.

name_index_mapt &unique_index_map,
const irep_idt &id,
std::size_t minimum_index)
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

definition should go to the cpp file. Also I would suggest declaring a class for name_index_mapt (instead of a typedef) and make this the only public method.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems short and hot enough to be worth inlining to me

goto_state_list.emplace_back(state.source, state);

// If the guard is true (and the alternative branch unreachable) we can move
// the state in this case as it'll never be accessed on the alternate branch.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"in this case" seems redundant in this sentence

@@ -28,8 +28,15 @@ class goto_statet
/// Distance from entry
unsigned depth = 0;

protected:
symex_level2t level2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about making current_names private in symex levels instead? (or both)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both sounds reasonable to me, but suggest doing further refactoring elsewhere, this PR has already dragged on a fair while

@JohnDumbell JohnDumbell force-pushed the jd/enhancement/symex_state_moving branch 2 times, most recently from 96284f4 to 2c1baee Compare March 15, 2019 13:06
JohnDumbell and others added 4 commits March 15, 2019 13:21
This global name map will be used to check what generation is currently
available for each level1 name, and shared across all states. This will
only be used when a particular state tries to find out what the next
free generation is.

The main draw of this is that all branches will now assign unique
generations that won't clash with assignments done across other
branches. One minor downside is that in VCC's the generation number
might jump sporadically (say from diffblue#4 to diffblue#40).
In the situation of merge on a state that hasn't been traversed (with a
false guard) try and move the names into the destination state instead
of a merge or copy operation
This means that the symex_transition call is duplicated, but better
that one call than multiple if statements.
We really should protect the level 2 name map to prevent "allocating" a new generation
by just making one up in the local name map, cf. applying for one properly using
increase_generation.
@JohnDumbell JohnDumbell force-pushed the jd/enhancement/symex_state_moving branch from 2c1baee to 939a169 Compare March 15, 2019 13:22
@smowton smowton merged commit 7dadeee into diffblue:develop Mar 15, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants