Skip to content

Force operator< to be inlined to reduce place time #1265

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 13, 2020

Conversation

xuqinziyue
Copy link
Contributor

Description

The change forces operator< to be inlined properly.

Related Issue

#1225

Motivation and Context

Original operator< is not inlined by compiler. Since the function is timing critical, force the function to be inlined can reduce place time by ~10% with xc7 tests and ~5% with QoR and Titan benchmark tests

How Has This Been Tested?

Run through xc7 tests like picosoc and murax and passed QoR and Titan Benchmarks.
Comparison shows a place time reduction.
Comparison also proves there is no impact on quality of results.

Types of changes

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • All new and existing tests passed

@probot-autolabeler probot-autolabeler bot added lang-cpp C/C++ code VPR VPR FPGA Placement & Routing Tool labels Apr 5, 2020
Copy link
Contributor

@kmurray kmurray left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting this together @xuqinziyue!

One additional thought. Have you tried the std::tie based comparison operator in combination with always_inline? If that produces similar performance, I think it would be preferable since the std::tie approach would be less brittle to future code changes (since it makes fewer assumptions).

If that isn't the case I've described how we can have the compiler check that the assumptions the proposed method depends are valid. I've also provided more detailed comments below.

Comment on lines +9 to +22
#ifndef __has_attribute
# define __has_attribute(x) 0 // Compatibility with non-clang compilers.
#endif

#if defined(COMPILER_GCC) && defined(NDEBUG)
# define ALWAYS_INLINE inline __attribute__((__always_inline__))
#elif defined(COMPILER_MSVC) && defined(NDEBUG)
# define ALWAYS_INLINE __forceinline
#elif __has_attribute(always_inline)
# define ALWAYS_INLINE __attribute__((always_inline)) // clang
#else
# define ALWAYS_INLINE inline
#endif

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it would be better to use C++ attributes (available since C++11) as a cleaner way to specify this.

For example:

[[gnu::always_inline]] friend bool operator<(const t_override& lhs, const t_override& rhs) {

That would avoid having to define the ALWAYS_INLINE macro in a cleaner cross-compiler way.

@mithro Any thoughts on this?

friend ALWAYS_INLINE bool operator<(const t_override& lhs, const t_override& rhs) {
const short* left = reinterpret_cast<const short*>(&lhs);
const short* right = reinterpret_cast<const short*>(&rhs);
return std::lexicographical_compare(left, left + 6, right, right + 6);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we also need to explicitly state what assumptions this code is making. In particular, I think it is assuming:

  • All data members of t_override are shorts
  • There is no padding in t_override
  • Based on the above, treating t_override's memory layout as an array of shorts is the same as a member-wise comparison

We should also encode these assumptions into some compile-time checks, otherwise we are setting ourselves up for some very challenging to find bugs in the future (when t_override changes).

I'd suggest something like:

static_assert(sizeof(t_override) = sizeof(t_override::from_type)
                                   + sizeof(t_override::to_type)
                                    + sizeof(t_override::from_class)
                                    + sizeof(t_override::to_class)
                                    + sizeof(t_override::delta_x)
                                    + sizeof(t_override::delta_y),
              "Expect t_override to have a memory layout equivalent to an array of short (no padding)");
static_assert(sizeof(t_override::from_type) == sizeof(short), "Expect all t_override data members to be shorts");
static_assert(sizeof(t_override::to_type) == sizeof(short), "Expect all t_override data members to be shorts");
static_assert(sizeof(t_override::from_class) == sizeof(short), "Expect all t_override data members to be shorts");
static_assert(sizeof(t_override::to_class) == sizeof(short), "Expect all t_override data members to be shorts");
static_assert(sizeof(t_override::delta_x) == sizeof(short), "Expect all t_override data members to be shorts");
static_assert(sizeof(t_override::delta_y) == sizeof(short), "Expect all t_override data members to be shorts");

which would verify there is no padding in the t_override memory layout and that each data member is a short. This would catch potential bugs caused by changing the number of data members in t_override, or the data type of members in t_override.

We could then also modify the comparison to not be explicitly hard coded for 6 data members:

constexpr size_t NUM_T_OVERRIDE_MEMBERS = sizeof(t_override) / sizeof(short);
return std::lexicographical_compare(left, left + NUM_T_OVERRIDE_MEMBERS, right, right + NUM_T_OVERRIDE_MEMBERS);

@xuqinziyue
Copy link
Contributor Author

Thanks for putting this together @xuqinziyue!

One additional thought. Have you tried the std::tie based comparison operator in combination with always_inline? If that produces similar performance, I think it would be preferable since the std::tie approach would be less brittle to future code changes (since it makes fewer assumptions).

If that isn't the case I've described how we can have the compiler check that the assumptions the proposed method depends are valid. I've also provided more detailed comments below.

I have tried the combination of std::tie and always_inline but based on my experiment, operator < is not inlined in the case. Thanks for the comments below, and I will update the pull request with the revisions you suggest shortly.

@kmurray
Copy link
Contributor

kmurray commented Apr 9, 2020

@xuqinziyue The updates look good. Can you fix the code formatting (make format) and update so it passes the CI tests? Once that is done it should be good to merge.

@xuqinziyue
Copy link
Contributor Author

@kmurray I have fixed the formatting issue and have done a rebase. Let me know if there is anything else I should change

@kmurray
Copy link
Contributor

kmurray commented Apr 13, 2020

Thanks!

@kmurray kmurray merged commit cb1dcb8 into verilog-to-routing:master Apr 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lang-cpp C/C++ code VPR VPR FPGA Placement & Routing Tool
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants