Skip to content

Parallel router: baseline #2337

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 1, 2023
Merged

Parallel router: baseline #2337

merged 1 commit into from
Aug 1, 2023

Conversation

duck2
Copy link
Contributor

@duck2 duck2 commented Jun 12, 2023

Description

Add a spatial partitioning-based parallel router. This version is very simple, it builds a PartitionTree which recursively partitions the netlist into groups (PartitionTreeNodes) of non-overlapping nets, then uses a thread pool to route the groups.

Related Issue

Motivation and Context

More CPUs, runtime go down

How Has This Been Tested?

Ran vtr_reg_strong and titan_quick_qor with --router_algorithm parallel.

Types of changes

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • All new and existing tests passed

@duck2
Copy link
Contributor Author

duck2 commented Jun 22, 2023

Now this PR contains a number of cleanups and API changes:

  • RRNodeId-ified many files
  • Removed dead code left from breadth first router
  • Moved common route_timing.cpp functions into the header (they used to be static)
  • ConnectionRouter fns now take a can_grow_bb parameter and return an additional value retry_with_full_bb
    • Reason: Parallel router threads cannot simply grow the bounding box and retry routing (data race), they need a flag bubbled up from the connection router to try doing that on the next iteration
  • timing_driven_route_net and callees return NetResultFlags instead of bool

Should format, squash and rebase. I'll run vtr_reg_strong with --router_algorithm parallel to check for bugs.

Oops: there's no TBB in the runners

@duck2 duck2 force-pushed the par_router branch 2 times, most recently from 574bcbf to 87b2b50 Compare June 22, 2023 05:19
@github-actions github-actions bot added VPR VPR FPGA Placement & Routing Tool libvtrutil external_libs infra Project Infrastructure labels Jun 22, 2023
@duck2
Copy link
Contributor Author

duck2 commented Jun 22, 2023

I think the tatum nondeterminism still exists with VPR_USE_TBB=1 and TATUM_USE_TBB=0.

@duck2
Copy link
Contributor Author

duck2 commented Jun 24, 2023

I think the tatum nondeterminism was not due to a data race (spent a lot of time on it with Intel Inspector) but the placer behavior depending on the ordering of modified_pins.

Parallel execution in SetupSlackCrit::update_slacks_and_criticalities gives a different ordering each time. Now the modified pins are in a std::set which is a sorted container. This seems to fix it at the first glance? Let's see if it also works with TATUM_USE_TBB=1

@duck2
Copy link
Contributor Author

duck2 commented Jun 25, 2023

a lot of failures due to memory and runtime increase?

@duck2
Copy link
Contributor Author

duck2 commented Jul 7, 2023

Turns out NetPinTimingInvalidator doesn't do anything when incremental STA is not enabled, but it does keep a cache to avoid doing nothing several times. It's a lot of trouble to keep that cache in a multithreaded and generic way and keep the performance. Added a NoopInvalidator for the general case of non-incremental timing analysis, this should help with the place time failures?

@duck2
Copy link
Contributor Author

duck2 commented Jul 8, 2023

I would add script_params_add=--router_algorithm parallel to some tests in vtr_reg_strong, but this PR already touched too many files. Better to update golden results for the problematic cases (place_time 0.88 in titan_quick_qor but fails for some FIR circuits in nightly1_odin?), upload QoR results and then ask for reviews.

@duck2 duck2 marked this pull request as ready for review July 10, 2023 16:00
@duck2 duck2 requested review from amin1377 and vaughnbetz July 10, 2023 16:00
@duck2 duck2 changed the title [WIP] Parallel router: baseline Parallel router: baseline Jul 10, 2023
@duck2
Copy link
Contributor Author

duck2 commented Jul 10, 2023

The strong_odin failure doesn't make a lot of sense: the input cmdline has --route_chan_width 26 but the test fails because min_chan_width was 18 in the golden results?

@vaughnbetz
Copy link
Contributor

Maybe it's a two phase test where it runs the router in find minimum channel width mode, and then re-runs at a channel width of 26?

@duck2
Copy link
Contributor Author

duck2 commented Jul 10, 2023

Indeed it was a bug I accidentally introduced while rebasing.

@duck2
Copy link
Contributor Author

duck2 commented Jul 11, 2023

I can't reproduce the place time failures on wintermute. Probably better to disable tbb for CI and figure out another way to integrate tbb tests.

@duck2 duck2 force-pushed the par_router branch 2 times, most recently from ca0d135 to d151a28 Compare July 11, 2023 13:53
@github-actions github-actions bot removed the infra Project Infrastructure label Jul 11, 2023
@duck2 duck2 force-pushed the par_router branch 2 times, most recently from f662632 to ae98395 Compare July 11, 2023 17:31
* may be called concurrently */
#ifdef TATUM_USE_TBB
tbb::concurrent_vector<EdgeId> invalidated_edges_;
tatum::util::linear_map<EdgeId, size_t> edge_invalidated_;
Copy link
Contributor Author

@duck2 duck2 Jul 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be a char (to save memory) (explain in the comments why that's thread safe)

Copy link
Contributor

@vaughnbetz vaughnbetz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Biggest feedback: comment functions (in header, or at static declaration) as much as you can.
I agree making a unit test for the partition tree is good, but that can wait for another PR.

@vaughnbetz
Copy link
Contributor

Please summarize and link to your QoR data. QoR results master vs. this branch with parallel off, and again with parallel on.

@duck2
Copy link
Contributor Author

duck2 commented Jul 31, 2023

Looks like there are a bunch of build failures in the test CI too

@github-actions github-actions bot added the ABC ABC Logic Optimization & Technology Mapping Tool label Jul 31, 2023
@duck2
Copy link
Contributor Author

duck2 commented Jul 31, 2023

This commit has a bunch of cleanups piggybacked onto the
baseline parallel router:

  * Use RRNodeId where possible
  * Remove dead code and docs left from breadth-first router
  * Fix nondeterminism in parallel tatum
  * Fix performance penalty from NetPinTimingInvalidator
@vaughnbetz
Copy link
Contributor

Seems like this is working (CI already fixed by you @duck2)?

@duck2
Copy link
Contributor Author

duck2 commented Aug 1, 2023

Yes, the ducktape seems to make it work.

Yosys didn't get built in Test / R: Basic with NO_GRAPHICS (push) but it worked in Test / R: Basic with NO_GRAPHICS (pull_request)?? retrying that one (edit: worked after a retry)

@duck2 duck2 merged commit f2783e2 into master Aug 1, 2023
@duck2 duck2 deleted the par_router branch August 1, 2023 02:35
@vaughnbetz
Copy link
Contributor

Thanks @duck2 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ABC ABC Logic Optimization & Technology Mapping Tool external_libs libvtrutil VPR VPR FPGA Placement & Routing Tool
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants