Force operator< to be inlined to reduce place time #1265

xuqinziyue · 2020-04-05T15:10:06Z

Description

The change forces operator< to be inlined properly.

Related Issue

Motivation and Context

Original operator< is not inlined by compiler. Since the function is timing critical, force the function to be inlined can reduce place time by ~10% with xc7 tests and ~5% with QoR and Titan benchmark tests

How Has This Been Tested?

Run through xc7 tests like picosoc and murax and passed QoR and Titan Benchmarks.
Comparison shows a place time reduction.
Comparison also proves there is no impact on quality of results.

Types of changes

Bug fix (change which fixes an issue)
New feature (change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

My change requires a change to the documentation
I have updated the documentation accordingly
I have added tests to cover my changes
All new and existing tests passed

kmurray

Thanks for putting this together @xuqinziyue!

One additional thought. Have you tried the std::tie based comparison operator in combination with always_inline? If that produces similar performance, I think it would be preferable since the std::tie approach would be less brittle to future code changes (since it makes fewer assumptions).

If that isn't the case I've described how we can have the compiler check that the assumptions the proposed method depends are valid. I've also provided more detailed comments below.

kmurray · 2020-04-06T13:21:39Z

vpr/src/place/place_delay_model.h

+#ifndef __has_attribute
+#    define __has_attribute(x) 0 // Compatibility with non-clang compilers.
+#endif
+
+#if defined(COMPILER_GCC) && defined(NDEBUG)
+#    define ALWAYS_INLINE inline __attribute__((__always_inline__))
+#elif defined(COMPILER_MSVC) && defined(NDEBUG)
+#    define ALWAYS_INLINE __forceinline
+#elif __has_attribute(always_inline)
+#    define ALWAYS_INLINE __attribute__((always_inline)) // clang
+#else
+#    define ALWAYS_INLINE inline
+#endif
+


I wonder if it would be better to use C++ attributes (available since C++11) as a cleaner way to specify this.

For example:

[[gnu::always_inline]] friend bool operator<(const t_override& lhs, const t_override& rhs) {

That would avoid having to define the ALWAYS_INLINE macro in a cleaner cross-compiler way.

@mithro Any thoughts on this?

kmurray · 2020-04-06T13:50:57Z

vpr/src/place/place_delay_model.h

+        friend ALWAYS_INLINE bool operator<(const t_override& lhs, const t_override& rhs) {
+            const short* left = reinterpret_cast<const short*>(&lhs);
+            const short* right = reinterpret_cast<const short*>(&rhs);
+            return std::lexicographical_compare(left, left + 6, right, right + 6);


I think we also need to explicitly state what assumptions this code is making. In particular, I think it is assuming:

All data members of t_override are shorts

There is no padding in t_override

Based on the above, treating t_override's memory layout as an array of shorts is the same as a member-wise comparison

We should also encode these assumptions into some compile-time checks, otherwise we are setting ourselves up for some very challenging to find bugs in the future (when t_override changes).

I'd suggest something like:

static_assert(sizeof(t_override) = sizeof(t_override::from_type) + sizeof(t_override::to_type) + sizeof(t_override::from_class) + sizeof(t_override::to_class) + sizeof(t_override::delta_x) + sizeof(t_override::delta_y), "Expect t_override to have a memory layout equivalent to an array of short (no padding)"); static_assert(sizeof(t_override::from_type) == sizeof(short), "Expect all t_override data members to be shorts"); static_assert(sizeof(t_override::to_type) == sizeof(short), "Expect all t_override data members to be shorts"); static_assert(sizeof(t_override::from_class) == sizeof(short), "Expect all t_override data members to be shorts"); static_assert(sizeof(t_override::to_class) == sizeof(short), "Expect all t_override data members to be shorts"); static_assert(sizeof(t_override::delta_x) == sizeof(short), "Expect all t_override data members to be shorts"); static_assert(sizeof(t_override::delta_y) == sizeof(short), "Expect all t_override data members to be shorts");

which would verify there is no padding in the t_override memory layout and that each data member is a short. This would catch potential bugs caused by changing the number of data members in t_override, or the data type of members in t_override.

We could then also modify the comparison to not be explicitly hard coded for 6 data members:

constexpr size_t NUM_T_OVERRIDE_MEMBERS = sizeof(t_override) / sizeof(short); return std::lexicographical_compare(left, left + NUM_T_OVERRIDE_MEMBERS, right, right + NUM_T_OVERRIDE_MEMBERS);

xuqinziyue · 2020-04-06T16:10:33Z

Thanks for putting this together @xuqinziyue!

One additional thought. Have you tried the std::tie based comparison operator in combination with always_inline? If that produces similar performance, I think it would be preferable since the std::tie approach would be less brittle to future code changes (since it makes fewer assumptions).

If that isn't the case I've described how we can have the compiler check that the assumptions the proposed method depends are valid. I've also provided more detailed comments below.

I have tried the combination of std::tie and always_inline but based on my experiment, operator < is not inlined in the case. Thanks for the comments below, and I will update the pull request with the revisions you suggest shortly.

kmurray · 2020-04-09T19:50:10Z

@xuqinziyue The updates look good. Can you fix the code formatting (make format) and update so it passes the CI tests? Once that is done it should be good to merge.

xuqinziyue · 2020-04-10T21:32:46Z

@kmurray I have fixed the formatting issue and have done a rebase. Let me know if there is anything else I should change

kmurray · 2020-04-13T13:36:02Z

Thanks!

probot-autolabeler bot added lang-cpp C/C++ code VPR VPR FPGA Placement & Routing Tool labels Apr 5, 2020

mithro added the kokoro:force-run label Apr 6, 2020

symbiflow-robot removed the kokoro:force-run label Apr 6, 2020

kmurray reviewed Apr 6, 2020

View reviewed changes

Force operator< to be inlined to reduce place time

5333188

xuqinziyue force-pushed the master branch from 2db9694 to 5333188 Compare April 10, 2020 21:30

kmurray merged commit cb1dcb8 into verilog-to-routing:master Apr 13, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Force operator< to be inlined to reduce place time #1265

Force operator< to be inlined to reduce place time #1265

xuqinziyue commented Apr 5, 2020

kmurray left a comment

kmurray Apr 6, 2020

kmurray Apr 6, 2020

xuqinziyue commented Apr 6, 2020

kmurray commented Apr 9, 2020

xuqinziyue commented Apr 10, 2020

kmurray commented Apr 13, 2020

Force operator< to be inlined to reduce place time #1265

Force operator< to be inlined to reduce place time #1265

Conversation

xuqinziyue commented Apr 5, 2020

Description

Related Issue

Motivation and Context

How Has This Been Tested?

Types of changes

Checklist:

kmurray left a comment

Choose a reason for hiding this comment

kmurray Apr 6, 2020

Choose a reason for hiding this comment

kmurray Apr 6, 2020

Choose a reason for hiding this comment

xuqinziyue commented Apr 6, 2020

kmurray commented Apr 9, 2020

xuqinziyue commented Apr 10, 2020

kmurray commented Apr 13, 2020