Skip to content

Enable HASH_CODE by default to avoid repeated hash computation [blocks: #3486] #1992

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 8, 2019

Conversation

tautschnig
Copy link
Collaborator

On SV-COMP benchmarks, hashing accounts for >22% of CPU time (with
profiling enabled).

Do not merge: profiling data to be supplied and further improvements to SUB_IS_LIST are in the queue.

@smowton
Copy link
Contributor

smowton commented Apr 9, 2018

@tautschnig could you explain why using std::list is related to hash caching? Or is that the reduce-dt-size bit (I guess from <start, end, capacity> for vector to <start, end> for list?)

@tautschnig
Copy link
Collaborator Author

Yes, it's the reduce-dt-size bit. But it's not about going from a vector to a list (which as of C++ is three pointers as well), but instead about going from a std::map (which in GNU STL is six pointers) to a list.

@smowton
Copy link
Contributor

smowton commented Apr 10, 2018

Ah, got you, I thought SUB_IS_LIST only affected sub, not named_sub / comments. If we're looking for space saving btw, I note comments is really quite rarely distinguished from named_sub, and by convention their names cannot clash. Should we combine them and filter when we actually want the distinction (during operator== only I think?)

@smowton
Copy link
Contributor

smowton commented Apr 10, 2018

In fact it appears SUB_IS_LIST doesn't affect sub at all! Please rename it.

@tautschnig tautschnig requested a review from pkesseli as a code owner November 24, 2018 19:09
@tautschnig tautschnig force-pushed the irep-speedup branch 2 times, most recently from ca5192a to e42bd66 Compare March 29, 2019 21:37
@tautschnig tautschnig added Needs data This PR claims improvements that require further data to substantiate the claims. and removed do not merge labels Mar 29, 2019
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️
This PR failed Diffblue compatibility checks (cbmc commit: e42bd66).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/106400003
Status will be re-evaluated on next push.
Common spurious failures include: the cbmc commit has disappeared in the mean time (e.g. in a force-push); the author is not in the list of contributors (e.g. first-time contributors); compatibility was already broken by an earlier merge.

@tautschnig tautschnig changed the title [SV-COMP'18 3/19] Avoid repeated hash computation, reduce irept::dt size Enable HASH_CODE by default to avoid repeated hash computation Mar 30, 2019
@tautschnig
Copy link
Collaborator Author

@peterschrammel @allredj Any idea why this is failing TG builds?

@tautschnig
Copy link
Collaborator Author

@peterschrammel @allredj Any idea why this is failing TG builds?

Never mind, @smowton confirmed that TG already builds with -DHASH_CODE. Will discuss on Monday whether to use #ifndef or just ask TG to remove that option from the build command line.

@smowton
Copy link
Contributor

smowton commented Mar 30, 2019

I favour #ifndef for all such options as that allows -DHASH_CODE=0 on the command-line to make a build without it

@tautschnig
Copy link
Collaborator Author

I favour #ifndef for all such options as that allows -DHASH_CODE=0 on the command-line to make a build without it

Fair point, will include this change in the next push to this branch.

@tautschnig tautschnig changed the title Enable HASH_CODE by default to avoid repeated hash computation Enable HASH_CODE by default to avoid repeated hash computation [blocks: #3486] Mar 30, 2019
@tautschnig tautschnig removed the Needs data This PR claims improvements that require further data to substantiate the claims. label Apr 1, 2019
@tautschnig
Copy link
Collaborator Author

Commit message updated to include data points:

On some SV-COMP benchmark categories, hashing accounts for >20% of CPU
time (with profiling enabled) - top five:

  • ReachSafety-BitVectors: 29.29% (470.54 seconds, which reduces to 4.39
    seconds; for benchmarks not timing out we save 170 seconds (25%) in
    non-profiling mode)
  • Systems_BusyBox_NoOverflows: 27.98% (284.15 seconds, which reduces to
    1.74 seconds; for the 1 benchmark not timing out we save 23 seconds
    (6%) in non-profiling mode)
  • Systems_BusyBox_MemSafety: 24.24% (194.74 seconds, which reduces to
    0.93 seconds; no measurable difference on the 2 benchmarks not
    failing/timing out)
  • NoOverflows-Other: 18.84% (1127.61 seconds, which reduces to
    23.57 seconds; for benchmarks not timing out we save 5 seconds (7%) in
    non-profiling mode)
  • ReachSafety-ControlFlow: 17.75% (1194.04 seconds, which reduces to
    29.17 seconds; for benchmarks not timing out we save 200 seconds (25%)
    in non-profiling mode)

For ReachSafety-ECA it's only 4.7%, which still amounts to 3006.7
seconds. With this change this reduces to 323.07 seconds. On
ReachSafety-ECA, this enables 3055.35 symex_step calls per second over
2752.28 calls per second without this change.

@tautschnig tautschnig assigned kroening and unassigned tautschnig Apr 1, 2019
Copy link
Collaborator

@martin-cs martin-cs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HASH_CODE is one of those things we probably should have been doing for some time. If it doesn't work / has performance regressions then we should fix it and I think it should be fixable.

Semi-related : given all of the work on tidying up the constructors for exprts and the like. How far are we from making dts immutable?

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: 63081a5).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/106523249

Copy link
Contributor

@smowton smowton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your #ifndef actually doesn't permit manual override -- you need to do like

#ifndef HASH_CODE
 #  define HASH_CODE 1
 #endif

...

#if HASH_CODE == 1

At the moment since all the tests just use #ifdef there is in practice no way to override with -D

@tautschnig
Copy link
Collaborator Author

Your #ifndef actually doesn't permit manual override [...]

Good catch - that's now fixed!

@tautschnig
Copy link
Collaborator Author

Marking do-not-merge until we have bisected a possible performance regression and also gathered feedback on this change from customers.

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️
This PR failed Diffblue compatibility checks (cbmc commit: abcb3c3).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/108705935
Status will be re-evaluated on next push.
Common spurious failures include: the cbmc commit has disappeared in the mean time (e.g. in a force-push); the author is not in the list of contributors (e.g. first-time contributors); compatibility was already broken by an earlier merge.

@tautschnig
Copy link
Collaborator Author

Customer feedback indicates that the degree of performance improvement may vary, but there are no red flags. Will merge as once CI completes its work.

On some SV-COMP benchmark categories, hashing accounts for >20% of CPU
time (with profiling enabled) - top five:

* ReachSafety-BitVectors: 29.29% (470.54 seconds, which reduces to 4.39
  seconds; for benchmarks not timing out we save 170 seconds (25%) in
  non-profiling mode)
* Systems_BusyBox_NoOverflows: 27.98% (284.15 seconds, which reduces to
  1.74 seconds; for the 1 benchmark not timing out we save 23 seconds
  (6%) in non-profiling mode)
* Systems_BusyBox_MemSafety: 24.24% (194.74 seconds, which reduces to
  0.93 seconds; no measurable difference on the 2 benchmarks not
  failing/timing out)
* NoOverflows-Other: 18.84% (1127.61 seconds, which reduces to
  23.57 seconds; for benchmarks not timing out we save 5 seconds (7%) in
  non-profiling mode)
* ReachSafety-ControlFlow: 17.75% (1194.04 seconds, which reduces to
  29.17 seconds; for benchmarks not timing out we save 200 seconds (25%)
  in non-profiling mode)

For ReachSafety-ECA it's only 4.7%, which still amounts to 3006.7
seconds. With this change this reduces to 323.07 seconds. On
ReachSafety-ECA, this enables 3055.35 symex_step calls per second over
2752.28 calls per second without this change.
@tautschnig tautschnig merged commit fd79fe0 into diffblue:develop May 8, 2019
@tautschnig tautschnig deleted the irep-speedup branch May 8, 2019 17:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants