Restore named_sub map entries ordering when reading from a goto binary. #7534

remi-delmas-3000 · 2023-02-09T20:08:41Z

The named_subt is a map from irep_idt keys to irept values implemented using a forward list. Entries must be stored in increasing key order in the list, otherwise lookups will fail. The irep_idt ordering is defined by how string interning numbers strings in CBMC.

When a goto binary is produced by a different tool the named_sub ordering in the binary file may not necessarily match the one expected by CBMC. We use irept::named_subt::add instead of irept::named_subt::emplace_after to build the named_sub when deserialising so that so that proper ordering is guaranteed.

If the performance impact is considered acceptable, this PR would make the binary format independent of the internal numbering of strings used in CBMC, which in turn will allow us to produce goto binaries directly from Kani instead of using the JSON + symtab2gb path.

I measured the performance impact on read_bin_goto_object on a collection of goto binaries in ranging in size from a hundreds of kb to 8Mb. Overall reordering adds a measurable 2-5% overhead on read_bin_goto_object.

For kb-sized goto binaries the overhead is ~1 millisecond

- baseline Runtime read_bin_goto_object: 0.025…s
- reordering Runtime read_bin_goto_object: 0.026…s

- baseline Runtime read_bin_goto_object: 0.031…s
- reordering Runtime read_bin_goto_object: 0.032…s

For the 8Mb binary the baseline is:

ubuntu@ip-172-31-23-10:~/tmp$ time goto-cc a.out -o b.out
Runtime read_bin_goto_object: 0.316...s

real    0m1.427s
user    0m1.337s
sys     0m0.077s

with reordering we see an extra ~8ms spent on loading the binary.

ubuntu@ip-172-31-23-10:~/tmp$ time goto-cc a.out -o b.out
Runtime read_bin_goto_object: 0.322...s

real    0m1.433s
user    0m1.332s
sys     0m0.091s

Each commit message has a non-empty body, explaining why the change was made.
Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
My commit message includes data points confirming performance improvements (if claimed).
My PR is restricted to a single feature or bugfix.
White-space or formatting changes outside the feature-related changed lines are in commits of their own.

codecov · 2023-02-09T20:57:18Z

Codecov Report

Base: 78.49% // Head: 78.49% // Decreases project coverage by -0.01% ⚠️

Coverage data is based on head (c52c0f6) compared to base (fbd2b88).
Patch coverage: 90.00% of modified lines in pull request are covered.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #7534      +/-   ##
===========================================
- Coverage    78.49%   78.49%   -0.01%     
===========================================
  Files         1663     1667       +4     
  Lines       191336   191437     +101     
===========================================
+ Hits        150182   150261      +79     
- Misses       41154    41176      +22

Impacted Files	Coverage Δ
...-sensitivity/value_set_pointer_abstract_object.cpp	`92.18% <ø> (ø)`
src/cprover/generalization.cpp	`84.21% <ø> (ø)`
src/cprover/inductiveness.cpp	`78.35% <ø> (ø)`
src/cprover/state_encoding.cpp	`80.57% <ø> (ø)`
...oto-instrument/contracts/instrument_spec_assigns.h	`94.73% <ø> (ø)`
src/goto-instrument/unwindset.cpp	`85.43% <ø> (ø)`
src/solvers/flattening/boolbv_bitreverse.cpp	`100.00% <ø> (ø)`
src/solvers/flattening/boolbv_get.cpp	`83.25% <ø> (ø)`
src/solvers/smt2_incremental/ast/smt_commands.cpp	`88.03% <ø> (ø)`
...c/solvers/smt2_incremental/convert_expr_to_smt.cpp	`89.66% <ø> (ø)`
... and 33 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

tautschnig · 2023-02-10T09:41:55Z

The performance impact seems negligible.

We need to collect some numbers, perhaps on the GOTO binaries produced by Kani for those tend to be pretty large. I'd suggest to measure the time that goto-cc binary -o binary.2 takes before and after this patch.

kroening · 2023-02-10T11:50:04Z

We do not wish to guarantee an ordering of the named_sub_map entries.

thomasspriggs · 2023-02-10T13:07:15Z

We do not wish to guarantee an ordering of the named_sub_map entries.

The current implementation of forward_list_as_mapt::find requires the underlying list to be ordered. This results in a hidden requirement for the serialised form to be ordered. The requirement stems from the lookup being implemented using std::lower_bound, which has ordering/partitioning requirements.

The alternatives to sorting on file read, I can think of would include -

Generate an error on reading an unordered file.
Update forward_list_as_map so that the lookup functions for an unordered underlying collection.

Option 1 is just moving the requirement for the collection to be ordered around. Option 2 potentially moves the performance costs around and doesn't deal with any further hidden requirements on order in code using irept::get_named_sub().

tautschnig · 2023-02-13T14:38:01Z

Could you please add an emplace method to forward_list_as_mapt so that we don't even need to have this #if NAMED_SUB_IS_FORWARD_LIST in there anymore? This emplace should be the same as the existing add, except both parameters should be rvalue references. This would then fully hide the need for sortedness, and make the ordering on named_subt an implementation detail.

thomasspriggs · 2023-02-13T15:45:00Z

Which ever solution we come to for this issue, it would be good to have a unit test on reading unordered named_subs. That would mitigate the risk of any future optimisation efforts reintroducing the same issue.

remi-delmas-3000 · 2023-02-13T16:00:08Z

Which ever solution we come to for this issue, it would be good to have a unit test on reading unordered named_subs. That would mitigate the risk of any future optimisation efforts reintroducing the same issue.

I can provide a goto binary that does not satisfy the implicit ordering requirement. Would that be ok ?

remi-delmas-3000 · 2023-02-13T16:02:01Z

Could you please add an emplace method to forward_list_as_mapt so that we don't even need to have this #if NAMED_SUB_IS_FORWARD_LIST in there anymore? This emplace should be the same as the existing add, except both parameters should be rvalue references. This would then fully hide the need for sortedness, and make the ordering on named_subt an implementation detail.

This is indeed better

tautschnig · 2023-02-13T16:20:59Z

Which ever solution we come to for this issue, it would be good to have a unit test on reading unordered named_subs. That would mitigate the risk of any future optimisation efforts reintroducing the same issue.

I can provide a goto binary that does not satisfy the implicit ordering requirement. Would that be ok ?

If so we’d need a way to build such a binary: as GOTO binary versions evolve we’ll still need to be able to run this test.

thomasspriggs · 2023-02-13T17:00:50Z

If so we’d need a way to build such a binary: as GOTO binary versions evolve we’ll still need to be able to run this test.

A unit test could potentially be written to cover irep_serializationt only, by using arbitrary ireps rather than a full GOTO binary. It can also be done in-memory rather than using a pre-prepared file, by using std::stringstream instead of actual files.

…ORWARD_LIST == 0`

…sible

remi-delmas-3000 · 2023-02-14T18:53:39Z

Update: I implemented @tautschnig suggestion by adding an forward_list_as_mapt::emplace method to that does an order-preserving insertion, getting rid of the #ifdef conditions that distinguished between std::map and forward_list_as_map where possible.

I also renamed forward_list_as_mapt::remove to forward_list_as_mapt::erase in order to match std::map::remove and got rid of more #ifdefs.

Last I changed to using forward_list_as_mapt::size where possible to clean up some #ifdefs.

The only place that still assumes an ordering for the named_sub map is in irep_merge.cpp, but since this is an internal function (as opposed to a function that loads irep coming from outside of CBMC), I think we can keep it as is.

The named_sub ordering assumption is essentially removed for GOTO binaries.

thomasspriggs

I am approving as I believe the main files affected by this PR, for which I am code owner are the SMT backend and that adding in the missing include of <algorithm> is a perfectly reasonable change to make for these.

I would prefer to see a test of some kind included before this PR is merged. I have the feeling that better performance characteristics for loading may be achieved using an emplace_hint style interface. But without the test we can't be certain that potential optimisations do not break your fix.

martin-cs

A belated approve of adding the headers. I'm not quite sure I understand why they are needed but they are not offensive.

thomasspriggs · 2023-02-15T15:23:03Z

A belated approve of adding the headers. I'm not quite sure I understand why they are needed but they are not offensive.

forward_list_as_map.h has #include <algorithm> in it. This adds a transitive include of algorithm to most of the code base via irep.h. However if any file relies on this transitive include rather than including algorithm where it is used, then the build breaks when building using std::map as the implementation of named_subt. This is because this build option does not include forward_list_as_map.h and so in turn does not have the transitive include of algorithm.

martin-cs · 2023-02-15T15:41:03Z

Thanks of the explanation @thomasspriggs .

remi-delmas-3000 requested review from kroening, tautschnig and peterschrammel as code owners February 9, 2023 20:08

remi-delmas-3000 self-assigned this Feb 9, 2023

remi-delmas-3000 added aws Bugs or features of importance to AWS CBMC users aws-high labels Feb 9, 2023

remi-delmas-3000 mentioned this pull request Feb 9, 2023

Goto binary serialization model-checking/kani#2205

Merged

4 tasks

remi-delmas-3000 added the Kani Bugs or features of importance to Kani Rust Verifier label Feb 10, 2023

remi-delmas-3000 force-pushed the read-irep-ensure-named-sub-ordering branch from d08274e to b905c4f Compare February 10, 2023 16:31

Remi Delmas added 4 commits February 14, 2023 15:55

Include <algorithm> where needed when building with `NAMED_SUB_IS_F…

7523caf

…ORWARD_LIST == 0`

Add emplace method to forward_list_as_mapt and simplify where pos…

487c28c

…sible

Simplify use the size method instead of distance

7cbee89

Rename remove to erase to allow simplifications

c52c0f6

remi-delmas-3000 force-pushed the read-irep-ensure-named-sub-ordering branch from b905c4f to c52c0f6 Compare February 14, 2023 16:01

remi-delmas-3000 requested review from thomasspriggs, NlightNFotis, TGWDB, chris-ryder, feliperodri and martin-cs as code owners February 14, 2023 16:01

tautschnig approved these changes Feb 14, 2023

View reviewed changes

tautschnig assigned NlightNFotis, peterschrammel, martin-cs, esteffin, thomasspriggs and TGWDB and unassigned remi-delmas-3000 Feb 14, 2023

feliperodri approved these changes Feb 14, 2023

View reviewed changes

thomasspriggs approved these changes Feb 15, 2023

View reviewed changes

peterschrammel approved these changes Feb 15, 2023

View reviewed changes

peterschrammel removed their assignment Feb 15, 2023

tautschnig merged commit 5678ed6 into diffblue:develop Feb 15, 2023

martin-cs reviewed Feb 15, 2023

View reviewed changes

remi-delmas-3000 deleted the read-irep-ensure-named-sub-ordering branch February 15, 2023 18:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Restore named_sub map entries ordering when reading from a goto binary. #7534

Restore named_sub map entries ordering when reading from a goto binary. #7534

Uh oh!

remi-delmas-3000 commented Feb 9, 2023 •

edited

Loading

Uh oh!

codecov bot commented Feb 9, 2023 •

edited

Loading

Uh oh!

tautschnig commented Feb 10, 2023

Uh oh!

kroening commented Feb 10, 2023

Uh oh!

thomasspriggs commented Feb 10, 2023

Uh oh!

tautschnig commented Feb 13, 2023

Uh oh!

thomasspriggs commented Feb 13, 2023

Uh oh!

remi-delmas-3000 commented Feb 13, 2023

Uh oh!

remi-delmas-3000 commented Feb 13, 2023

Uh oh!

tautschnig commented Feb 13, 2023

Uh oh!

thomasspriggs commented Feb 13, 2023

Uh oh!

remi-delmas-3000 commented Feb 14, 2023

Uh oh!

thomasspriggs left a comment

Uh oh!

martin-cs left a comment

Uh oh!

thomasspriggs commented Feb 15, 2023

Uh oh!

martin-cs commented Feb 15, 2023

Uh oh!

Uh oh!

Restore named_sub map entries ordering when reading from a goto binary. #7534

Restore named_sub map entries ordering when reading from a goto binary. #7534

Uh oh!

Conversation

remi-delmas-3000 commented Feb 9, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov bot commented Feb 9, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

tautschnig commented Feb 10, 2023

Uh oh!

kroening commented Feb 10, 2023

Uh oh!

thomasspriggs commented Feb 10, 2023

Uh oh!

tautschnig commented Feb 13, 2023

Uh oh!

thomasspriggs commented Feb 13, 2023

Uh oh!

remi-delmas-3000 commented Feb 13, 2023

Uh oh!

remi-delmas-3000 commented Feb 13, 2023

Uh oh!

tautschnig commented Feb 13, 2023

Uh oh!

thomasspriggs commented Feb 13, 2023

Uh oh!

remi-delmas-3000 commented Feb 14, 2023

Uh oh!

thomasspriggs left a comment

Choose a reason for hiding this comment

Uh oh!

martin-cs left a comment

Choose a reason for hiding this comment

Uh oh!

thomasspriggs commented Feb 15, 2023

Uh oh!

martin-cs commented Feb 15, 2023

Uh oh!

Uh oh!

remi-delmas-3000 commented Feb 9, 2023 •

edited

Loading

codecov bot commented Feb 9, 2023 •

edited

Loading