Skip to content

Restore named_sub map entries ordering when reading from a goto binary. #7534

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

remi-delmas-3000
Copy link
Collaborator

@remi-delmas-3000 remi-delmas-3000 commented Feb 9, 2023

The named_subt is a map from irep_idt keys to irept values implemented using a forward list. Entries must be stored in increasing key order in the list, otherwise lookups will fail. The irep_idt ordering is defined by how string interning numbers strings in CBMC.

When a goto binary is produced by a different tool the named_sub ordering in the binary file may not necessarily match the one expected by CBMC. We use irept::named_subt::add instead of irept::named_subt::emplace_after to build the named_sub when deserialising so that so that proper ordering is guaranteed.

If the performance impact is considered acceptable, this PR would make the binary format independent of the internal numbering of strings used in CBMC, which in turn will allow us to produce goto binaries directly from Kani instead of using the JSON + symtab2gb path.

I measured the performance impact on read_bin_goto_object on a collection of goto binaries in ranging in size from a hundreds of kb to 8Mb. Overall reordering adds a measurable 2-5% overhead on read_bin_goto_object.

For kb-sized goto binaries the overhead is ~1 millisecond

- baseline Runtime read_bin_goto_object: 0.025s
- reordering Runtime read_bin_goto_object: 0.026s
- baseline Runtime read_bin_goto_object: 0.031s
- reordering Runtime read_bin_goto_object: 0.032s

For the 8Mb binary the baseline is:

ubuntu@ip-172-31-23-10:~/tmp$ time goto-cc a.out -o b.out
Runtime read_bin_goto_object: 0.316...s

real    0m1.427s
user    0m1.337s
sys     0m0.077s

with reordering we see an extra ~8ms spent on loading the binary.

ubuntu@ip-172-31-23-10:~/tmp$ time goto-cc a.out -o b.out
Runtime read_bin_goto_object: 0.322...s

real    0m1.433s
user    0m1.332s
sys     0m0.091s
  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@remi-delmas-3000 remi-delmas-3000 self-assigned this Feb 9, 2023
@remi-delmas-3000 remi-delmas-3000 added aws Bugs or features of importance to AWS CBMC users aws-high labels Feb 9, 2023
@codecov
Copy link

codecov bot commented Feb 9, 2023

Codecov Report

Base: 78.49% // Head: 78.49% // Decreases project coverage by -0.01% ⚠️

Coverage data is based on head (c52c0f6) compared to base (fbd2b88).
Patch coverage: 90.00% of modified lines in pull request are covered.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #7534      +/-   ##
===========================================
- Coverage    78.49%   78.49%   -0.01%     
===========================================
  Files         1663     1667       +4     
  Lines       191336   191437     +101     
===========================================
+ Hits        150182   150261      +79     
- Misses       41154    41176      +22     
Impacted Files Coverage Δ
...-sensitivity/value_set_pointer_abstract_object.cpp 92.18% <ø> (ø)
src/cprover/generalization.cpp 84.21% <ø> (ø)
src/cprover/inductiveness.cpp 78.35% <ø> (ø)
src/cprover/state_encoding.cpp 80.57% <ø> (ø)
...oto-instrument/contracts/instrument_spec_assigns.h 94.73% <ø> (ø)
src/goto-instrument/unwindset.cpp 85.43% <ø> (ø)
src/solvers/flattening/boolbv_bitreverse.cpp 100.00% <ø> (ø)
src/solvers/flattening/boolbv_get.cpp 83.25% <ø> (ø)
src/solvers/smt2_incremental/ast/smt_commands.cpp 88.03% <ø> (ø)
...c/solvers/smt2_incremental/convert_expr_to_smt.cpp 89.66% <ø> (ø)
... and 33 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@tautschnig
Copy link
Collaborator

The performance impact seems negligible.

We need to collect some numbers, perhaps on the GOTO binaries produced by Kani for those tend to be pretty large. I'd suggest to measure the time that goto-cc binary -o binary.2 takes before and after this patch.

@kroening
Copy link
Member

We do not wish to guarantee an ordering of the named_sub_map entries.

@thomasspriggs
Copy link
Contributor

We do not wish to guarantee an ordering of the named_sub_map entries.

The current implementation of forward_list_as_mapt::find requires the underlying list to be ordered. This results in a hidden requirement for the serialised form to be ordered. The requirement stems from the lookup being implemented using std::lower_bound, which has ordering/partitioning requirements.

The alternatives to sorting on file read, I can think of would include -

  1. Generate an error on reading an unordered file.
  2. Update forward_list_as_map so that the lookup functions for an unordered underlying collection.

Option 1 is just moving the requirement for the collection to be ordered around. Option 2 potentially moves the performance costs around and doesn't deal with any further hidden requirements on order in code using irept::get_named_sub().

@remi-delmas-3000 remi-delmas-3000 added the Kani Bugs or features of importance to Kani Rust Verifier label Feb 10, 2023
@remi-delmas-3000 remi-delmas-3000 force-pushed the read-irep-ensure-named-sub-ordering branch from d08274e to b905c4f Compare February 10, 2023 16:31
@tautschnig
Copy link
Collaborator

Could you please add an emplace method to forward_list_as_mapt so that we don't even need to have this #if NAMED_SUB_IS_FORWARD_LIST in there anymore? This emplace should be the same as the existing add, except both parameters should be rvalue references. This would then fully hide the need for sortedness, and make the ordering on named_subt an implementation detail.

@thomasspriggs
Copy link
Contributor

Which ever solution we come to for this issue, it would be good to have a unit test on reading unordered named_subs. That would mitigate the risk of any future optimisation efforts reintroducing the same issue.

@remi-delmas-3000
Copy link
Collaborator Author

Which ever solution we come to for this issue, it would be good to have a unit test on reading unordered named_subs. That would mitigate the risk of any future optimisation efforts reintroducing the same issue.

I can provide a goto binary that does not satisfy the implicit ordering requirement. Would that be ok ?

@remi-delmas-3000
Copy link
Collaborator Author

Could you please add an emplace method to forward_list_as_mapt so that we don't even need to have this #if NAMED_SUB_IS_FORWARD_LIST in there anymore? This emplace should be the same as the existing add, except both parameters should be rvalue references. This would then fully hide the need for sortedness, and make the ordering on named_subt an implementation detail.

This is indeed better

@tautschnig
Copy link
Collaborator

Which ever solution we come to for this issue, it would be good to have a unit test on reading unordered named_subs. That would mitigate the risk of any future optimisation efforts reintroducing the same issue.

I can provide a goto binary that does not satisfy the implicit ordering requirement. Would that be ok ?

If so we’d need a way to build such a binary: as GOTO binary versions evolve we’ll still need to be able to run this test.

@thomasspriggs
Copy link
Contributor

If so we’d need a way to build such a binary: as GOTO binary versions evolve we’ll still need to be able to run this test.

A unit test could potentially be written to cover irep_serializationt only, by using arbitrary ireps rather than a full GOTO binary. It can also be done in-memory rather than using a pre-prepared file, by using std::stringstream instead of actual files.

@remi-delmas-3000
Copy link
Collaborator Author

Update: I implemented @tautschnig suggestion by adding an forward_list_as_mapt::emplace method to that does an order-preserving insertion, getting rid of the #ifdef conditions that distinguished between std::map and forward_list_as_map where possible.

I also renamed forward_list_as_mapt::remove to forward_list_as_mapt::erase in order to match std::map::remove and got rid of more #ifdefs.

Last I changed to using forward_list_as_mapt::size where possible to clean up some #ifdefs.

The only place that still assumes an ordering for the named_sub map is in irep_merge.cpp, but since this is an internal function (as opposed to a function that loads irep coming from outside of CBMC), I think we can keep it as is.

The named_sub ordering assumption is essentially removed for GOTO binaries.

Copy link
Contributor

@thomasspriggs thomasspriggs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am approving as I believe the main files affected by this PR, for which I am code owner are the SMT backend and that adding in the missing include of <algorithm> is a perfectly reasonable change to make for these.

I would prefer to see a test of some kind included before this PR is merged. I have the feeling that better performance characteristics for loading may be achieved using an emplace_hint style interface. But without the test we can't be certain that potential optimisations do not break your fix.

@peterschrammel peterschrammel removed their assignment Feb 15, 2023
@tautschnig tautschnig merged commit 5678ed6 into diffblue:develop Feb 15, 2023
Copy link
Collaborator

@martin-cs martin-cs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A belated approve of adding the headers. I'm not quite sure I understand why they are needed but they are not offensive.

@thomasspriggs
Copy link
Contributor

A belated approve of adding the headers. I'm not quite sure I understand why they are needed but they are not offensive.

forward_list_as_map.h has #include <algorithm> in it. This adds a transitive include of algorithm to most of the code base via irep.h. However if any file relies on this transitive include rather than including algorithm where it is used, then the build breaks when building using std::map as the implementation of named_subt. This is because this build option does not include forward_list_as_map.h and so in turn does not have the transitive include of algorithm.

@martin-cs
Copy link
Collaborator

Thanks of the explanation @thomasspriggs .

@remi-delmas-3000 remi-delmas-3000 deleted the read-irep-ensure-named-sub-ordering branch February 15, 2023 18:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aws Bugs or features of importance to AWS CBMC users aws-high Kani Bugs or features of importance to Kani Rust Verifier
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants