-
Notifications
You must be signed in to change notification settings - Fork 277
Restore named_sub map entries ordering when reading from a goto binary. #7534
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Restore named_sub map entries ordering when reading from a goto binary. #7534
Conversation
Codecov ReportBase: 78.49% // Head: 78.49% // Decreases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## develop #7534 +/- ##
===========================================
- Coverage 78.49% 78.49% -0.01%
===========================================
Files 1663 1667 +4
Lines 191336 191437 +101
===========================================
+ Hits 150182 150261 +79
- Misses 41154 41176 +22
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
We need to collect some numbers, perhaps on the GOTO binaries produced by Kani for those tend to be pretty large. I'd suggest to measure the time that |
We do not wish to guarantee an ordering of the named_sub_map entries. |
The current implementation of The alternatives to sorting on file read, I can think of would include -
Option 1 is just moving the requirement for the collection to be ordered around. Option 2 potentially moves the performance costs around and doesn't deal with any further hidden requirements on order in code using |
d08274e
to
b905c4f
Compare
Could you please add an |
Which ever solution we come to for this issue, it would be good to have a unit test on reading unordered named_subs. That would mitigate the risk of any future optimisation efforts reintroducing the same issue. |
I can provide a goto binary that does not satisfy the implicit ordering requirement. Would that be ok ? |
This is indeed better |
If so we’d need a way to build such a binary: as GOTO binary versions evolve we’ll still need to be able to run this test. |
A unit test could potentially be written to cover |
b905c4f
to
c52c0f6
Compare
Update: I implemented @tautschnig suggestion by adding an I also renamed Last I changed to using The only place that still assumes an ordering for the The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am approving as I believe the main files affected by this PR, for which I am code owner are the SMT backend and that adding in the missing include of <algorithm>
is a perfectly reasonable change to make for these.
I would prefer to see a test of some kind included before this PR is merged. I have the feeling that better performance characteristics for loading may be achieved using an emplace_hint
style interface. But without the test we can't be certain that potential optimisations do not break your fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A belated approve of adding the headers. I'm not quite sure I understand why they are needed but they are not offensive.
|
Thanks of the explanation @thomasspriggs . |
The
named_subt
is a map fromirep_idt
keys toirept
values implemented using a forward list. Entries must be stored in increasing key order in the list, otherwise lookups will fail. Theirep_idt
ordering is defined by how string interning numbers strings in CBMC.When a goto binary is produced by a different tool the
named_sub
ordering in the binary file may not necessarily match the one expected by CBMC. We useirept::named_subt::add
instead ofirept::named_subt::emplace_after
to build thenamed_sub
when deserialising so that so that proper ordering is guaranteed.If the performance impact is considered acceptable, this PR would make the binary format independent of the internal numbering of strings used in CBMC, which in turn will allow us to produce goto binaries directly from Kani instead of using the JSON + symtab2gb path.
I measured the performance impact on
read_bin_goto_object
on a collection of goto binaries in ranging in size from a hundreds of kb to 8Mb. Overall reordering adds a measurable 2-5% overhead onread_bin_goto_object
.For kb-sized goto binaries the overhead is ~1 millisecond
For the 8Mb binary the baseline is:
with reordering we see an extra ~8ms spent on loading the binary.