Skip to content

Add code_typet::get_parameter_indices #1688

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 20 additions & 0 deletions src/util/std_types.h
Original file line number Diff line number Diff line change
Expand Up @@ -890,6 +890,26 @@ class code_typet:public typet
result.push_back(it->get_identifier());
return result;
}

typedef
std::unordered_map<irep_idt, std::size_t, irep_id_hash> parameter_indicest;

/// Get a map from parameter name to its index
parameter_indicest parameter_indices() const
{
parameter_indicest parameter_indices;
const parameterst &p = parameters();
parameter_indices.reserve(p.size());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid this might not work as nicely as intended: hash values of dstringt (which irep_idt by default is) aren't uniformly distributed across 2^32 or 2^64. Given that the number of parameters is typically fairly small, chances are that all your elements end up in one bucket despite all the effort.

If you have some time to spare, it would be really interesting if you could do some debug runs where you log the distribution across buckets that you get to see in this bit of code (at the end of the function).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't run the test, but I note that the hash value of dstringt is just its number, and the std::hash specialisations shipped with libstdc++ for all primitive types of size <= sizeof(size_t) simply cast to size_t. The authors of the standard library will surely have anticipated std::unordered_map keyed by densely packed integers, so dstring seems likely to perform just as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The specific relevant excerpt from bits/functional_hash.h:

#define _Cxx_hashtable_define_trivial_hash(_Tp)         \
  template<>                                            \
    struct hash<_Tp> : public __hash_base<size_t, _Tp>  \
    {                                                   \
      size_t                                            \
      operator()(_Tp __val) const noexcept              \
      { return static_cast<size_t>(__val); }            \
    };

  ...

  _Cxx_hashtable_define_trivial_hash(long)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The authors of the standard library will surely have anticipated std::unordered_map keyed by densely packed integers, so dstring seems likely to perform just as well.

I'd love to believe this, had I not done the painful experiments behind the write-up at

// This is because Visual Studio's STL masks away anything

So maybe you are right, maybe you aren't. The only thing I'd truly trust is data across different platforms.

Note that all this isn't a big deal here for we are likely not dealing with big data sets and probably aren't on a hot path. Just saying this as data must prevail over instincts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks like the opposite problem to me -- lots of the irep's hash is in high bits, which get ignored when picking a bucket, whereas for dense integers there's lots of useful information in low bits. I believe https://tessil.github.io/2016/08/29/benchmark-hopscotch-map.html for example is testing this exact case with unordered_map, though tests with libc++ and MSVC's STL implementation would be good to see too.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be very clear: this is a purely academic discussion here, given the anticipated small data and rare use.

I do agree that you've got the opposite case, but that would just imply that you observe the opposite performance penalty between GNU STL vs. MSVC. The comparison you pointed to is a good read, but is of course on completely artificial data. It is important to contrast this with the profile of actual data to be observed at the point of use. In your specific case, I'd expect many uses might have just 2 or 3 buckets. The parameter identifiers will likely have been created in immediate succession. Thus, for the 2-parameter case we'd have an even and an odd number, and you'll be doing great with MSVC. But really you'll not even be able to observe a difference over using a map that does not use hashing, or using whatever the worst performing hash map for the given hash function.

My view is: keep things simple where possible, else optimise as far as possible and back that up by data.

std::size_t index = 0;
for(const auto &p : parameters())
{
const irep_idt &id = p.get_identifier();
if(!id.empty())
parameter_indices.insert({ id, index });
++index;
}
return parameter_indices;
}
};

/*! \brief Cast a generic typet to a \ref code_typet
Expand Down
1 change: 1 addition & 0 deletions unit/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,7 @@ SRC += unit_tests.cpp \
util/expr_cast/expr_cast.cpp \
util/expr_iterator.cpp \
util/message.cpp \
util/parameter_indices.cpp \
util/simplify_expr.cpp \
util/symbol_table.cpp \
catch_example.cpp \
Expand Down
Binary file added unit/util/ParameterIndicesTest.class
Binary file not shown.
8 changes: 8 additions & 0 deletions unit/util/ParameterIndicesTest.java
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@

public class ParameterIndicesTest {

public void f(ParameterIndicesTest param1, int param2) {}

public static void g(float param1, ParameterIndicesTest param2) {}

}
33 changes: 33 additions & 0 deletions unit/util/parameter_indices.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
/*******************************************************************\

Module: Parameter indices test

Author: Diffblue Limited. All rights reserved.

\*******************************************************************/

#include <testing-utils/catch.hpp>
#include <testing-utils/load_java_class.h>
#include <util/std_types.h>

void check_consistency(const symbolt &symbol)
{
const auto &code_type = to_code_type(symbol.type);
auto parameter_ids = code_type.parameter_identifiers();
auto parameter_indices = code_type.parameter_indices();

REQUIRE(parameter_ids.size() == parameter_indices.size());
for(std::size_t i = 0; i < parameter_ids.size(); ++i)
REQUIRE(parameter_indices.at(parameter_ids.at(i)) == i);
}

TEST_CASE("Parmeter indices consistency", "[core][util][parameter_indices]")
{
symbol_tablet symbol_table = load_java_class("ParameterIndicesTest", "util/");
check_consistency(
symbol_table.lookup_ref(
"java::ParameterIndicesTest.f:(LParameterIndicesTest;I)V"));
check_consistency(
symbol_table.lookup_ref(
"java::ParameterIndicesTest.g:(FLParameterIndicesTest;)V"));
}