Skip to content

Add code_typet::get_parameter_indices #1688

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

smowton
Copy link
Contributor

@smowton smowton commented Jan 2, 2018

This is simply the opposite of the existing get_parameter_indentifiers, useful for mapping from parameter identifier to its argument or parameter index.

Another simple utility that the sec product currently has but is generally handy.

@tautschnig
Copy link
Collaborator

I have restarted the failing Travis jobs.

Copy link
Collaborator

@tautschnig tautschnig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current code will work for Java (I think), but not necessarily for C. Thus any user (beyond the provided unit-test) needs to be revisited in addition to the proposed implementation.

parameter_indices.reserve(p.size());
std::size_t i = 0;
for(const auto &parameter : parameters())
parameter_indices.insert({ parameter.get_identifier(), i++ });
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parameters need not have an identifier, in which case the results will at best cause confusion. Thus parameter_identifiers exists in goto_function_templatet.

@smowton smowton force-pushed the smowton/feature/parameter_indices branch from 796a4a9 to 1045722 Compare January 3, 2018 12:07
@smowton
Copy link
Contributor Author

smowton commented Jan 3, 2018

@tautschnig have amended this to check for empty identifiers as suggested.

@tautschnig
Copy link
Collaborator

I still have a design problem here: we have parameter_identifiers in goto_function_templatet, but now introduce a different way of seemingly obtaining the same. The one in goto_function_templatet will always work, while this new one will sometimes work. Can we please only have one of these?

@smowton
Copy link
Contributor Author

smowton commented Jan 3, 2018

That appears to clash with the function I'm not adding. code_typet::parameter_identifiers() already exists, and I suppose that goto_functiont::parameter_identifiers and goto_functiont::type.parameter_identifiers() should always produce the same information. However I note that in-tree, there do not appear to be any users of either parameter_identifiers version, and nobody seems to initialise the goto-function version.

@smowton
Copy link
Contributor Author

smowton commented Jan 3, 2018

Please disregard the weird "infinite wisdom" stuff, was being trolled by @owen-jones-diffblue (or possibly @forejtv?) while my screen was left unlocked ;)

@tautschnig tautschnig dismissed their stale review January 3, 2018 18:04

I confused two different mappings.

@tautschnig
Copy link
Collaborator

My apologies for the confusion! I have created #1695 to eventually do some cleanup.

const parameterst &p = parameters();
parameter_indices.reserve(p.size());
std::size_t i = 0;
for(const auto &parameter : parameters())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might use p here.

parameter_indicest parameter_indices;
const parameterst &p = parameters();
parameter_indices.reserve(p.size());
std::size_t i = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use index here. (Related to the second half of my suggestion below.)

for(const auto &parameter : parameters())
{
const irep_idt &id = parameter.get_identifier();
std::size_t index = i++;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just do ++index; at the end of the loop body.

{
parameter_indicest parameter_indices;
const parameterst &p = parameters();
parameter_indices.reserve(p.size());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid this might not work as nicely as intended: hash values of dstringt (which irep_idt by default is) aren't uniformly distributed across 2^32 or 2^64. Given that the number of parameters is typically fairly small, chances are that all your elements end up in one bucket despite all the effort.

If you have some time to spare, it would be really interesting if you could do some debug runs where you log the distribution across buckets that you get to see in this bit of code (at the end of the function).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't run the test, but I note that the hash value of dstringt is just its number, and the std::hash specialisations shipped with libstdc++ for all primitive types of size <= sizeof(size_t) simply cast to size_t. The authors of the standard library will surely have anticipated std::unordered_map keyed by densely packed integers, so dstring seems likely to perform just as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The specific relevant excerpt from bits/functional_hash.h:

#define _Cxx_hashtable_define_trivial_hash(_Tp)         \
  template<>                                            \
    struct hash<_Tp> : public __hash_base<size_t, _Tp>  \
    {                                                   \
      size_t                                            \
      operator()(_Tp __val) const noexcept              \
      { return static_cast<size_t>(__val); }            \
    };

  ...

  _Cxx_hashtable_define_trivial_hash(long)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The authors of the standard library will surely have anticipated std::unordered_map keyed by densely packed integers, so dstring seems likely to perform just as well.

I'd love to believe this, had I not done the painful experiments behind the write-up at

// This is because Visual Studio's STL masks away anything

So maybe you are right, maybe you aren't. The only thing I'd truly trust is data across different platforms.

Note that all this isn't a big deal here for we are likely not dealing with big data sets and probably aren't on a hot path. Just saying this as data must prevail over instincts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks like the opposite problem to me -- lots of the irep's hash is in high bits, which get ignored when picking a bucket, whereas for dense integers there's lots of useful information in low bits. I believe https://tessil.github.io/2016/08/29/benchmark-hopscotch-map.html for example is testing this exact case with unordered_map, though tests with libc++ and MSVC's STL implementation would be good to see too.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be very clear: this is a purely academic discussion here, given the anticipated small data and rare use.

I do agree that you've got the opposite case, but that would just imply that you observe the opposite performance penalty between GNU STL vs. MSVC. The comparison you pointed to is a good read, but is of course on completely artificial data. It is important to contrast this with the profile of actual data to be observed at the point of use. In your specific case, I'd expect many uses might have just 2 or 3 buckets. The parameter identifiers will likely have been created in immediate succession. Thus, for the 2-parameter case we'd have an even and an odd number, and you'll be doing great with MSVC. But really you'll not even be able to observe a difference over using a map that does not use hashing, or using whatever the worst performing hash map for the given hash function.

My view is: keep things simple where possible, else optimise as far as possible and back that up by data.

This is simply the opposite of the existing get_parameter_indentifiers,
useful for mapping from parameter identifier to its argument or parameter
index.
@smowton smowton force-pushed the smowton/feature/parameter_indices branch from 1045722 to 3f1fd64 Compare January 4, 2018 11:37
@smowton
Copy link
Contributor Author

smowton commented Jan 4, 2018

@tautschnig hash function is fine I think; other changes implemented

Copy link
Collaborator

@tautschnig tautschnig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving as the remaining discussion item isn't a blocker for this piece of code.

@smowton smowton merged commit fd2bf6a into diffblue:develop Jan 5, 2018
smowton added a commit to smowton/cbmc that referenced this pull request May 9, 2018
d190fd8 Merge remote-tracking branch 'upstream/develop' into pull-support-20180112
5bd5962 Merge pull request diffblue#1667 from romainbrenguier/feature/type_cast
8ecb55a Merge pull request diffblue#1717 from smowton/smowton/feature/remove_virtual_functions_per_function
0d7310a Merge pull request diffblue#1691 from romainbrenguier/bugfix/getClass#TG-1245
2064849 Use type_checked_cast in boolbv_width
c5fc351 Validate data in pointer_typet in to_pointer_typet
35905c6 Add can_cast_type, validate_type for pointer_typet
b72bacf Define type_try_dynamic_cast and type_checked_cast
23c3561 Unit test for string symbol resolution
c13e602 Adding only needed equations in symbol resolution
ae4deff Debug information for string equations
912828d JBMC: Run remove-virtual-functions as each function is converted
a711c64 Introduce mechanism for renumbering an individual GOTO program
e308a32 Merge pull request diffblue#1679 from NlightNFotis/nondet_extra_test
e6ceb91 Merge pull request diffblue#1724 from tautschnig/fix-visitor
ea74bed Add extra test for nondet-static flag and arrays
4f74896 Use irept API, not implementation-level API
c55b4a5 Merge pull request diffblue#1682 from martin-cs/fix/dependence-graph-namespace-lifespan
1a2c14b Merge pull request diffblue#1722 from diffblue/unsafe_type_constructors
957a568 Merge pull request diffblue#1677 from NlightNFotis/pb4_develop
9c5add4 remove deprecated constructors for three bitvector types
c96e02a no longer use deprecated constructors for some bitvector types
954060e Add unit test for has_subtype
3dd3877 Refactor has_char_pointer_subtype with has_subtype
4699c13 Extend symbol resolution to string_typet
74144fc Handle if_exprt in add_axioms_for_string_literal
c6c1b3f Add an optional guard to add_axioms_for_constant
933d635 Merge pull request diffblue#1716 from mgudemann/fix/null_check_for_java_instanceof
1659314 Merge pull request diffblue#1715 from smowton/smowton/cleanup/jbmc_unused_passes
9c457b7 Add regression test for null instanceof.
2080cd3 Complete instanceof for Java.
d4300d0 Merge pull request diffblue#1697 from diffblue/nondet_symbol_exprt
1c68dd4 Merge pull request diffblue#1714 from tautschnig/c-library-strcat
44b5bae Merge pull request diffblue#1698 from thomasspriggs/tg1633
c4304ba JBMC: Remove C-only passes
bb8cfaa C library: Check upper bounds in memset, memcpy, memmove
7d4984f C library: Implement strcat, strncat
2a5cea2 This introduces nondet_symbol_exprt, which is generated by symbolic execution in response to side_effect_expr_nondett
85193a0 Merge pull request diffblue#1694 from NathanJPhillips/feature/add-raw-lhs-to-trace
d9122dc Merge pull request diffblue#1710 from NathanJPhillips/feature/remove_instanceof_per_function
092df69 Switch from custom file / path routines to Boost-filesystem
c8821b2 Allow to remove instanceof when remove exceptions
94b7658 Don't pass iterators into function calls
a9c4e4f Added regression tests
76318ce Protect extended trace behind a command line option
69b0ff1 Added base_name in comments for all symbols
e86080a Add raw LHS irep field to trace output
ddd1b7a Add remove_instanceof overload to remove from a particular instruction
1c227b7 Merge pull request diffblue#1660 from smowton/smowton/fix/lazy_methods_array_parameters
ae89c94 Lazy loading: assume concrete generic parameter types are needed
80eb6a6 TG-1877: Include array pointer types in needed classes
1053e5f Fix for [TG-1633] Inner generic types have incorrect type identifier
e2cda1a Merge pull request diffblue#1704 from tautschnig/fix-copy-paste
ef4a65e Fix op1/op0 copy&paste typo
21ea31f Merge pull request diffblue#1702 from peterschrammel/goto-diff-java
c4bc953 Merge pull request diffblue#1701 from peterschrammel/allow-instrument-jdk
2811363 Java regression test for goto-diff
43d2e09 Also reset fresh temporary symbol counter
9ef28f4 Compare relative goto target offsets
eaf3a7d Get source location from symbol table
ab59659 Allow instrumentation of java.* and org.cprover.*
6fbd59c Merge pull request diffblue#1631 from tautschnig/fix-pointer-minus
7c04b5c Merge pull request diffblue#1699 from NathanJPhillips/feature/reset-main-in-tests
5e0f186 Pointer difference over void* is difference over char*
faf8f00 Merge commit 'a83b52cddbed22304372c276512c63701eb3aedb' into pull-support-20180104
8236db4 Merge pull request diffblue#1419 from peterschrammel/refactor/cover-instrument
a580e27 Merge pull request diffblue#1689 from smowton/smowton/feature/get_this
591511a Allow callers of load_java_class to pass the name of the main function
1b86b27 Merge pull request diffblue#1687 from smowton/smowton/feature/class-hierarchy-dot
fd2bf6a Merge pull request diffblue#1688 from smowton/smowton/feature/parameter_indices
f570ce5 Merge pull request diffblue#1696 from smowton/smowton/fix/identical_struct_equality
61b0d6d Merge pull request diffblue#1666 from mgudemann/bugfix/removed_required_virtual_calls
3365054 Add regression test
2b6dc8b Resolve concrete function call if no implementation is found
3f1fd64 Add code_typet::get_parameter_indices
42cf61a Fix testing for empty line in test desc file
2090000 Fix missing newline at end of desc file
e448d5f Fix unsatisfiable test line
f7f033d String smoke tests: ensure no type mismatches are seen
b627c3d Replace unsound struct-cast simplification
8fa42b3 Class hierarchy: add DOT output, unit tests
04f2faf Mark GOTO instructions with unresolved virtual calls
aac181f Pass command line options via optionst
b6fa3e8 Factorize source location initialization
8da5395 Document cover functions
a7f0c3d Introduce cover instrumenter
873627a Split cover into several files
0fc08f3 Replace cover-function-only by cover-include-pattern
1f2102c Add code_typet::get_this
2801f0f Avoid crashing when --dependence-graph is used by correcting namespace scoping.
acac776 Add a test for the same-named static functions crashing dependence graph in the goto-analyser
05f46a9 Fix the problem where two static functions with the same name would cause the dependency graph to fail.

git-subtree-dir: cbmc
git-subtree-split: d190fd8
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants