Skip to content

[Parallel Router] Random strong test failures #3029

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
AmirhosseinPoolad opened this issue May 9, 2025 · 6 comments
Open

[Parallel Router] Random strong test failures #3029

AmirhosseinPoolad opened this issue May 9, 2025 · 6 comments
Assignees

Comments

@AmirhosseinPoolad
Copy link
Contributor

AmirhosseinPoolad commented May 9, 2025

Strong tests for parallel routing can randomly fail. Here's the failed CI run for a PR that did not touch VTR's code in any way and only changed an unrelated workflow file:
https://github.com/verilog-to-routing/vtr-verilog-to-routing/actions/runs/14929980706/job/41943464845#step:8:3197

There was a run failure and some QoR failures in some of the parallel router tests. When I ran the strong tests for the branch locally it successfully worked without any run or QoR failures, and VTR master also doesn't have any strong test failures.

@AmirhosseinPoolad
Copy link
Contributor Author

@AlexandreSinger @ueqri Any ideas why this might have happened? Thank a lot!

@AmirhosseinPoolad
Copy link
Contributor Author

AmirhosseinPoolad commented May 9, 2025

Update: Just re-ran the test for the PR and it was successful.
https://github.com/verilog-to-routing/vtr-verilog-to-routing/actions/runs/14929980706/job/41947736930

Update 2: Same tests failed in master branch of my fork (Same as VTR master).
https://github.com/AmirhosseinPoolad/vtr-verilog-to-routing/actions/runs/14933274445/job/41954656119

@ueqri
Copy link
Contributor

ueqri commented May 12, 2025

@AmirhosseinPoolad Hi Amir, I am investigating the test failures you mentioned. I have checked the stdout log files of the failed tests in the two action runs:

  • Log file of the test that failed at strong_multiclock when using two threads + four queues in the action run
  • Log file of the test that failed at strong_multiclock when using two threads + eight queues + queue draining (deterministic for Dijkstra mode) in the action run

(Updated) It turned out to be segmentation fault issues (program crashed with signal 11) in CI logs. The log files seems to be incomplete (only partial routing logs were printed and no post-routing log is there), which might be because the corresponding VPR executions (specifically the routing stage) were interrupted. I was wondering if the GitHub runner could have killed the program due to some restrictions (?) on parallel execution.

Additionally, I ran these specific tests locally on wintermute 70 times with no errors or failures occurring. I highly doubt that the issue might be related to the GitHub runner environment.

If there are any GitHub runner docs I should refer to (for the limitation/restrictions) or more context other than the test log files, please let me know and I will try to fix those tests based on that info. Also, could you please pin me the next time this happens? Hopefully that would provide more helpful context for debugging.

@vaughnbetz
Copy link
Contributor

From looking at the log, there's nothing very strange about the command line or this circuit.
One possible clue: it appears the seg fault occurs either in a routing attempt in a binary search routing or right after an attempt where we have a disconnected rr-graph -- no path exists between some source and dest. That triggers an immediate routing failure (in the first routing iteration) and we'll return through a slightly different code path before trying the next routing. The failure is an empty heap -- maybe something bad happens right now with an empty heap and the multi-queue? Some race condition in that case?

@ueqri : if this issue can't be resolved quickly, we should temporarily comment out this test in CI, and can reactivate it once it is stable.

@AlexandreSinger
Copy link
Contributor

@ueqri What is the status on this? Should we temporarily comment this test out? It is causing random failures.

ueqri added a commit to ueqri/vtr-verilog-to-routing that referenced this issue May 15, 2025
…outer

Temporarily disabled the `strong_multiclock` test in `vtr_reg_strong` CI
regression tests for the parallel connection router, due to some random
failures as mentioned in Issue verilog-to-routing#3029.

After fixing the problem with the `strong_multiclock` test, this will be
reactivated.
@ueqri
Copy link
Contributor

ueqri commented May 15, 2025

Still investigating the issue. I tried ThreadSanitizer but nothing particularly interesting was found. Only a few data races were detected (e.g., prune_node, should_not_explore_neighbors), which were written on purpose (as mentioned in the quote) and ensured safety. The sanitizer log can be found here if you are interested.

Quote from the paper: To further reduce lock contention, we added a cheap read-only check before acquiring the lock (Line 10), motivated by Shun et al.

I am currently following the clue provided by Vaughn to see if anything interesting can be found. We can comment out this test for now in the meanwhile (PR #3047 created).

Also, since the CI reproduces these random failures so frequently (I cannot reproduce it even once locally after running ~100 times), it might be worth building an identical CI environment myself and run tests inside that environment to catch the seg faults with gdb hopefully.

tangxifan pushed a commit that referenced this issue May 20, 2025
* [Bison] Raised Minimum Bison Version from 3.0 to 3.3

Raised the minimum Bison version to 3.3 since deprecation warnings were
showing up in libblifparse and libsdcparse which could not be resolved
unless the Bison version was 3.3.

* more fixes for bitstream generation with flat router

* [Router] Upstream Fine-Grained Parallel Router (FPT'24)

Upstreamed the fine-grained parallel router implementation into the VTR
master. The original branch is https://github.com/verilog-to-routing/vtr-verilog-to-routing/tree/mq-parallel-router.

Modified the MultiQueue (SPAA'24) implementation and integrated it into
the VTR codebase.

* [ParallelRouter] Removed Boost from FG Parallel Router

The original FG parallel router used to use boost. VTR does not install
boost by default. Moved to STL instead.

* [Router] Fix Code Formatting Issues

* [Router] Added ConnectionRouter Abstraction and Reduced Code Duplication

Added a partial abstract class for ConnectionRouter, derived from the
pure abstract ConnectionRouterInterface.

The SerialConnectionRouter and ParallelConnectionRouter classes are now
derived from the ConnectionRouter class, utilizing the common class
members and helper functions to reduce code duplication.

* [Router] Added Code Comments and Documentation for Connection Routers

Added Doxygen-style code comments and documentation for connection
routers, including the ConnectionRouter abstract class, the Parallel-
ConnectionRouter concrete class, and the SerialConnectionRouter concrete
class.

Updated the helper messages for command-line options added for parallel
connection router.

* [Router] Fixed Interface Issues in NestedNetlistRouter and Code Formats

Fixed the interface issues of ConnectionRouter in NestedNetlistRouter.

Fixed code formats.

Fixed typo in read_options.cpp.

* [Router] Updated Command-Line Usage for Parallel Connection Router

Updated the command-line usage for parallel connection router in both
Read the Docs and read_options.cpp.

* [Router] Added Regression Tests for Parallel Connection Router

Added regression tests for parallel connection router by appending extra
sets of configurations to those VTR flow regression tests previously
selected by Fahri for testing coarse-grained parallel router.

Removed VPR connection router test (vpr/test/test_connection_router.cpp),
since it has been out-dated for a very long time and has caused lots of
trouble for running VPR C++ tests locally.

* Fixed Code Formatting Issue

Fixed a weird code formatting issue in libs/librtlnumber/src/include/
internal_bits.hpp. GitHub CI said the file failed dev/check-format.sh,
however, the same script runs perfectly in my local environment. Double
checked the version of clang-format, which seemed to be the same as CI.

Directly copied the file from the GitHub repo to resolve this issue.

* [Router] Fixed `No source in route tree` in ParallelConnectionRouter

The `No source in route tree` bug in ParallelConnectionRouter (since
commit 875b98e) has been fixed. It turns out that putting another member
variable `MultiQueueDAryHeap<HeapImplementation::arg_D> heap_` in the
derived class ParallelConnectionRouter together with the existing
`HeapImplementation heap_` in the base class ConnectionRouter causes the
issue. The solution is to keep `heap_` only in the base class and use
`ConnectionRouter<MultiQueueDAryHeap<HeapImplementation::arg_D>>` rather
than `ConnectionRouter<HeapImplementation>` for deriving the parallel
connection router.

Please note that ParallelConnectionRouter still has some bugs (i.e.,
getting stuck in the MultiQueue pop). This commit is not fully working.
Please do not use it for any experiments.

Updated the previously incorrect command-line options for the parallel
connection router in the regression tests.

* [AP][MassLegalizer] Revistited Mass Legalizer

Found that the mass legalizer was not spreading out the blocks well
enough according to the mass.

Revistied the spatial partitioning in the mass legalizer. Before, we
just cut the window in half in the larger dimension. This was fine,
however it may create an inbalanced cut which can cause things to not
spread well. Instead, we now search for the best partition by trying
different partition lines and computing how balanced the partition is.
Although this is more expensive than before, by creating more balanced
partitions, it should allow the mass legalizer to converge faster. Time
in the mass legalizer is also dominated by partitioning the blocks, so
increasing the time to choose the partition line should not have that
large of an effect anyways.

Found an oversight with how blocks were partitioned when one of the
partitions become overfilled. Fixed this issue.

* Inverse use of macro_can_be_placed argument check_all_legality to align with meaning

* [vpr][pack] fix merge issues w/ flat sync list

* make format

* [packages] add clang-format

* make format 2

* Invalid C++ fix

* [docker] set ubuntu version to 24.04

* [dockerfile] enable system-wide python package installation for pip

* [dockerfile] add comment

* [package] check whehter clang-format-18 package exist

* [package] remove deprecated names-only option

* [package] remove if condition

* [doc] update quick start on installing packages

* make format
enum class e_rr_type
a few remaining t_rr_type vals
CHANY ---> t_rr_type::CHANY
CHANX ---> t_rr_type::CHANX
OPIN ---> t_rr_type::OPIN
IPIN ---> t_rr_type::IPIN
SINK ---> t_rr_type::SINK
SOURCE ---> t_rr_type::SOURCE

* [Router] Finally fixed the weird bug in parallel connection router

Fixed the weird bug in parallel connection router as mentioned in commit
f73212c. The bug occurred because two function parameters 'num_threads'
and 'num_queues' have been misplaced when instantiating the MQ_IO. This
took two weeks to figure out exactly.

The VTR benchmark (`vtr_reg_qor_chain` task) has been tested/passed for
different cases (1) 'serial mode' 1T+2Q (1 thread, 2 queues), (2) 2T+4Q,
and (3) 4T+2Q.

The determinism has also been verified for the VTR benchmark.

* [Router] Fixed Code Review Comments and Cleanup Codebase

Added more explanation to the command-line options messages and code
comments.

Cleaned up ParallelConnectionRouter-related codebase.

* [doc] clarify that clang-format is not required to build VPR

* remove typedef t_rr_type

* doxygen comment for Direction

* add vtr::array class

* make rr_node_typename of type vtr::array to index it only with e_rr_type

* add default constructor to vtr::array

* access rr_node_indices_ with e_rr_type instead of casting to size_t

* add single argument constructor to vtr::array

* [Router] Updated Golden Results for Parallel Connection Router CI Tests

Updated the golden results for CI tests for parallel connection router:
 - `vtr_reg_strong/koios_test`
 - `vtr_reg_strong/strong_flat_router`
 - `vtr_reg_strong/strong_multiclock`
 - `vtr_reg_strong/strong_timing`

* use vtr::array to index some arrays using e_rr_type

* make format

* avoid using e_rr_type and casting it in place_macro

* [vpr][base] fix assigned pb_graph_pin when graph node is not primitive

* [vpr][pack] pass logical type to alloc_and_laod_pb_route

* [vpr][pack] update alloc_and_load_pb_route header file

* [vpr][pack] fix pb_graph_pin assignment in load_trace_to_pb_route

* [test] keep 3d sb and cb tests

* [vpr][pack] add intra_lb_pb_pin_lookup_ to cluster legalizer

* [vpr][pack] initializer intra_lb_pb_pin_lookup and pass it to alloc_and_load_pb_route

* [vpr][pack] use intra_lb_pb_pin_lookup to get pb_pin from pin number

* make format

* add vtr::array to docs

* [vpr][pack] remove casting net id

* [vpr][pack] add doxygen comment for alloc_and_load_pb_route

* [vpr][pack] remove redundant parameters

* [vpr][pack] polish load_trace_to_pb_route

* make format

* [vpr][pack] fix parameter shadowing

* [AP][HotFix] Fixed Bug With Solver Putting Blocks Off-Device

After moving fixed blocks to the center of tiles, there is a very small
chance that blocks go off the device due to rounding. This is such a
small effect that it does not show up locally on my machine, but it
shows up on CI. Clamping the positions of blocks after solving to be
just within the device region.

* Increase the daily stale issue action API call limit

* [vpr][pack] add a method to get root_ipin

* [vpr][pack] remove unused var

* [Router] Added Assert for MQ_IO numQueues and Updated Golden Results

Added assert for MultiQueueIO numQueues to ensure it must be greater
than two.

Updated CI test tasks to ensure the parallel connection router runs in
Dijkstra mode to ensure determinism and avoid hanging in CI runs.

* [AP][HotFix] Placed Fixed Blocks First During IP

The cost terms in the AP initial placer were not placing fixed blocks
early enough, causing other blocks to take their place and causing the
initial placer to not return a solution.

Blocks which have region constraints are now placed first based on how
constrained they are. More constrained blocks (can only be placed in a
smaller region) will be placed first.

Also found that macros that contained fixed blocks were not observing
these constraints when calculating the centroid position of the macro.
For constrained macros, projected the centroid position onto the
partition region to get the closest point in the partition region to the
calculated centroid. This new centroid is used to then perform the
placement.

* [STA] Added Option to Remove Parameters from Post-Implementation Netlist

When performing post-implementation timing analysis using OpenSTA, the
generated netlist cannot use parameters since each module needs to
correspond with a cell in a liberty file.

Added a command-line option which tells the netlist writer to not use
parameters when generating the netlist. If a primitive cannot be
generated without using parameters, it will error out.

* [Tatum][Parse] Fixed Extraneous Warning With get_clocks

The get_clocks command is used in an SDC file to reference a set of
clocks by name using a regex string. The code to do this tries to
produce a warning if get_clocks is used on a regex string and no clocks
could be found. The issue is that the code to do this was mistakenly
producing this warning for each clock in the circuit. For example, if we
had {clk1, clk2, clk3} and we wanted to do "get_clocks {clk3}", we will
get two warnings since clk1 and clk2 did not match.

Fixed this by moving the warning out of one loop nest.

* Remove PR staling
This commit sets the number of days before marking
issues or PRs as stale to 100 years. This number is
overriden for issues to be 1 years but stays 100 years
for PRs. This means that PR effectively do not get
marked as stale.

* [LibArchFPGA] Updating Model Data Structures

The logical models (the technology-mapped logical blocks) for an
architecture were stored using two independent linked lists. One for the
library models (the models that all architectures have, such as luts and
ffs) and one of the user models. This linked lists were hard to traverse
and were injecting pointers all across VPR.

Created a new class to store and manage the logical models. This class
maintains a unique ID for each logical model (similar to the netlist
data structures in VPR). It also contains helper methods to make working
with the logical models easier.

* fix comments from alex

* revert prepacker changes

* [vpr][pack] add get_pattern_blocks

* [vpr][pack] add blocks in get_all_connected_primitive_pins if they are a part of the pattern

* make format

* Bump libs/EXTERNAL/libcatch2 from `76f70b1` to `5abfc0a`

Bumps [libs/EXTERNAL/libcatch2](https://github.com/catchorg/Catch2) from `76f70b1` to `5abfc0a`.
- [Release notes](https://github.com/catchorg/Catch2/releases)
- [Commits](catchorg/Catch2@76f70b1...5abfc0a)

---
updated-dependencies:
- dependency-name: libs/EXTERNAL/libcatch2
  dependency-version: 5abfc0aa9c1ef4cb40c9f387495134dab02e1af2
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

* [vpr][pack] add more comments

* Add helper functions to t_pb_type

* Change t_pb_type users to use helper functions

* Add documentation for t_pb_type::is_root and is_primitive

* Fix formatting in libarchfpga/physical_types.h

* [vpr][pack] change count method to find

* [Router] Updated the Regression Tests and Corresponding Golden Results

Changed `multi_queue_num_threads` and `multi_queue_num_queues` settings
in the CI strong regression tests to avoid QoR failure in the CI runs.

The coverage of the regression tests for parallel connection router
after this change is still fair.

* [vpr][CLI] add generate_net_timing_report

* [vpr][route] remove debugging msg

* [vpr][analysis] add generate_net_timing_report

* [vpr][pack] apply formatting comments

* make format

* [vpr][analysis] add comments

* make format

* [vpr][CLI] remove generate net timing from CLI parameters and generate the report by default

* Unused Packer Options Cleanup (#2976)

* Standardized and renamed packer alpha and beta variable. They are now referred to as timing_gain_weight and connection_gain_weight, used as a weight parameter during timing and connection driven clustering respectively. Removed global_clocks, use_attraction_groups, pack_num_moves, pack_move_type from packer.

* [APPack] Updated Max Candidate Distance Interface

The max candidate distance is used by APPack to decide which molecules
to ignore when packing, based on their distance from the cluster being
formed.

Cleaned up the interface of this by pre-computing the max candidate
distance of all logical blocks ahead of time and reading from these
pre-computed values during packing.

Added a command-line option to allow the user to override some or all of
these max distance thresholds. By default, VPR will select values based
on the type of logical block and the primitives it contains.

Fixed issue with APPack creating too many IO blocks for some circuits
due to the max candidate distance thresholds for IO blocks being too
low.

More tuning should be done on these values once the mass legalizer has
been cleaned up a bit more.

* [vtr][parse] fix pattern for init place wl

* [vpr][analysis] add header for net timing report

* [vpr][analysis] add timing format to comments

* formatting fix

* Revert "[vpr][CLI] remove generate net timing from CLI parameters and generate the report by default"

This reverts commit b8289db.

* make format

* [STA] Generating SDC Commands Post-Implementation

Added an option to have VPR generate an SDC file containing the timing
commands required for an external timing analysis of the post-
implementation netlist to match VPR's timing analysis.

* [STA] Added Tutorial for Post-Implementation Timing Analysis

Created a tutorial demonstrating how OpenSTA can be used after VPR to
perform static timing analysis.

* Add artifact upload to nightly test workflow

* t_det_routing_arch* --> const t_det_routing_arch&

* t_chan_width_dist ---> const t_chan_width_dist&

* make format

* fix compilation error in route_diag by passing det_routing_arch argument by reference instead of pointer

* [task] add generate_net_timing_report to timing report strong test

* [doc] add doc for generating _net_timing_report command line option

* [vpr][timing] update generate_net_timing_report comment

* [vpr][timing] add get_net_bounding_box

* [vpr][timing] add net bounding box to the report

* [test] add test for net timing report

* [doc] update doc with new format to net timing report

* [vpr][analysis] fix net timing report bugs + including layer min/max of bb

* make format

* [vpr][analysis] capture vars by reference in lambda

* [packer] Changing the vector of candidate molecules into LazyPopUniquePriorityQueue.

    The class LazyPopUniquePriorityQueue is a priority queue that allows for lazy deletion of elements.
    It is implemented using a vector and 2 sets, one set keeps track of the elements in the queue, and the other set keeps track of the elements that are pending deletion.
    The queue is sorted by the sort-value(SV) of the elements, and the elements are stored in a vector.
    The set is used to keep track of the elements that are pending deletion, so that they can be removed from the queue when they are popped.
    The class definiation can be found in vpr/src/util/lazy_pop_unique_priority_queue.h

    Currently, the class supports the following functions:
        LazyPopUniquePriorityQueue::push(): Pushes a key-sort-value (K-SV) pair into the priority queue and adds the key to the tracking set.
        LazyPopUniquePriorityQueue::pop(): Returns the K-SV pair with the highest SV whose key is not pending deletion.
        LazyPopUniquePriorityQueue::remove(): Removes an element from the priority queue immediately.
        LazyPopUniquePriorityQueue::remove_at_pop_time(): Removes an element from the priority queue when it is popped.
        LazyPopUniquePriorityQueue::empty(): Returns whether the queue is empty.
        LazyPopUniquePriorityQueue::clear(): Clears the priority queue vector and the tracking sets.
        LazyPopUniquePriorityQueue::size(): Returns the number of elements in the queue.
        LazyPopUniquePriorityQueue::contains(): Returns true if the key is in the queue, false otherwise.

* [packer] recollected golden results for regression basic, basic_odin, strong, strong_odin

* [packer] recollected golden results for Nightly

* add pointer to VTR9 paper in the readme

* Add documentation to explain which parts of VPR are parellel

* pass t_chan_width by reference

* doxygen comment for alloc_and_load_rr_node_indices

* add doxygen comments for load_block_rr_indices()

* [AP][Solver] Enabled Parallel Eigen

The Eigen solver has the ability to use OpenMP to run the solver
computations in parallel. Made the AP flow use the num_workers option to
set the number of threads that Eigen can use.

VPR did not have the ability to build with OpenMP in its CMAKE. Added an
option to the CMAKE to allow the user to enable OpenMP.

* remove unused is_flat argument from alloc_and_load_rr_node_indices() and load_block_rr_indices()

* use (x, y) convention for CHANX instead of (y, x)

* make format

* cast x/y to size_t

* get rid of warnings in RRSpatialLookup::find_nodes()

* Add references to the main VTR papers in the documentation.

* Add link to the VTR 9 paper in the documentation

* Add link to the VTR 9 paper in the README

* add a closing ) to the text printed by node_coordinate_to_string()

* fix the x/y mismatch for CHANX nodes in rr_nodes and rr_node_indices

* reserve nodes using x/y instead of chan/seg

* fix a typo

* add rr_graph_genearion directory

* resize node lookup for CHANX nodes in RR graph serializer

* add rr_node_indices.cpp/.h

* add doxygen comment for load_chan_rr_indices()

* [Infra] Updated Install Packages Script For Backwards Compatibility

The install_apt_packages.sh script is no longer backward compatible with
older versions of Ubuntu due to the dependency on clang-format-18.

Added an if statement to check if the distribution can support
clang-format-18 and only installing it if it can.

Added this script to the CI build process so it can always be tested
within the CI to prevent future regression.

* [RegTest] Disabled `strong_multiclock` test for parallel connection router

Temporarily disabled the `strong_multiclock` test in `vtr_reg_strong` CI
regression tests for the parallel connection router, due to some random
failures as mentioned in Issue #3029.

After fixing the problem with the `strong_multiclock` test, this will be
reactivated.

* [doc] update the doc with new report format

* [RegTest] Updated golden results for `strong_multiclock` regression test

Removed the golden results of parallel connection router test cases for
`strong_multiclock` regression test.

* [vpr][analysis] use std::min/max instead of if condition

* Add documentation for include sanitization

* [vpr][analysis] change report_net_timing format to csv

* [vpr][analysis] update comments

* [vpr][analysis] print constant nets in  the net timing report

* [vpr][analysis] apply comments

* [vpr][analysis] fix function name

* [doc] add net timing report use case

* fix a typo

* [Infra] Cleaned Up Include Files in VPR Base Directory

Many include files in the base directory contained includes to other
headers which they do not use. This causes many CPP files to include way
more header files than they need, increasing the incremental build time.

This process needs to be done on the entire VTR repo, but I found that
the base directory was one of the biggest culprits of this and the
hardest to untangle.

* [Infra] Cleaned Up Header Files in Pack Folder

Went through the header files in the pack folder and resolved any unused
header files.

* [AP] Removed Old Cluster-Level AP Flow

Prior to the flat AP flow, a cluster-level AP flow existed in VPR which
performed a SimPL-style algorithm on the clusters created during packing
before performing a placement quench.

Although well-written, this flow was not shown to outperform the SA
placer in VPR. It has also been becoming confusing to keep in VPR since
the new flat AP flow supercedes it. It is unclear if a cluster-level AP
flow will work well with the flat AP flow; however in that case the
cluster-level AP flow would be made using the new AP APIs written.

Removed the old cluster-level AP flow to reduce confusion.

* [Infra] Cleaned Up Header Files in Place Folder

* [lib][rr_graph] replace t_rr_type with e_rr_type

* [vpr][tileable] remove t_rr_type usage

* make is_io_type() a member function of t_physical_tile_type

* replace calls to is_io_type() with t_physical_tile_type::is_io()

* make format

* fix compiler bugs

* make format

* [lib][libutil] fix size_t issue

* inline  t_physical_tile_type::is_io()

* add doxygen comments for alloc_and_load_tile_rr_node_indices()

* [libs][vtrutil] use generate instead of fill to avoid getting potential null pointer dereference

* document alloc_and_load_rr_node_indices() arguments

* made a few function operating on t_pb_type its member functions

* add router_lookahead directory

* [STA] Added Multiclock Incremental STA Consistency Check

The incremental STA consistency coverage was very good, but was just
missing a multiclock circuit with an SDC file.

Added a quick test.

* [libs][rr_graph] don't reverse xy when calling node lookup

* [vpr][util] consider medium node type as inter cluster node

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: AlexandreSinger <[email protected]>
Co-authored-by: Fahrican Kosar <[email protected]>
Co-authored-by: AlexandreSinger <[email protected]>
Co-authored-by: Hang Yan <[email protected]>
Co-authored-by: Fred Tombs <[email protected]>
Co-authored-by: soheilshahrouz <[email protected]>
Co-authored-by: Soheil Shahrouz <[email protected]>
Co-authored-by: Amir Poolad <[email protected]>
Co-authored-by: Amir Poolad <[email protected]>
Co-authored-by: vaughnbetz <[email protected]>
Co-authored-by: Fred Tombs <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: James Yen <[email protected]>
Co-authored-by: Rongbo Zhang <[email protected]>
Co-authored-by: Rongbo Zhang <[email protected]>
Co-authored-by: Mohamed Elgammal <[email protected]>
tangxifan pushed a commit that referenced this issue May 21, 2025
* [Router] Added ConnectionRouter Abstraction and Reduced Code Duplication

Added a partial abstract class for ConnectionRouter, derived from the
pure abstract ConnectionRouterInterface.

The SerialConnectionRouter and ParallelConnectionRouter classes are now
derived from the ConnectionRouter class, utilizing the common class
members and helper functions to reduce code duplication.

* [Router] Added Code Comments and Documentation for Connection Routers

Added Doxygen-style code comments and documentation for connection
routers, including the ConnectionRouter abstract class, the Parallel-
ConnectionRouter concrete class, and the SerialConnectionRouter concrete
class.

Updated the helper messages for command-line options added for parallel
connection router.

* [Router] Fixed Interface Issues in NestedNetlistRouter and Code Formats

Fixed the interface issues of ConnectionRouter in NestedNetlistRouter.

Fixed code formats.

Fixed typo in read_options.cpp.

* [Router] Updated Command-Line Usage for Parallel Connection Router

Updated the command-line usage for parallel connection router in both
Read the Docs and read_options.cpp.

* [Router] Added Regression Tests for Parallel Connection Router

Added regression tests for parallel connection router by appending extra
sets of configurations to those VTR flow regression tests previously
selected by Fahri for testing coarse-grained parallel router.

Removed VPR connection router test (vpr/test/test_connection_router.cpp),
since it has been out-dated for a very long time and has caused lots of
trouble for running VPR C++ tests locally.

* Fixed Code Formatting Issue

Fixed a weird code formatting issue in libs/librtlnumber/src/include/
internal_bits.hpp. GitHub CI said the file failed dev/check-format.sh,
however, the same script runs perfectly in my local environment. Double
checked the version of clang-format, which seemed to be the same as CI.

Directly copied the file from the GitHub repo to resolve this issue.

* [Router] Fixed `No source in route tree` in ParallelConnectionRouter

The `No source in route tree` bug in ParallelConnectionRouter (since
commit 875b98e) has been fixed. It turns out that putting another member
variable `MultiQueueDAryHeap<HeapImplementation::arg_D> heap_` in the
derived class ParallelConnectionRouter together with the existing
`HeapImplementation heap_` in the base class ConnectionRouter causes the
issue. The solution is to keep `heap_` only in the base class and use
`ConnectionRouter<MultiQueueDAryHeap<HeapImplementation::arg_D>>` rather
than `ConnectionRouter<HeapImplementation>` for deriving the parallel
connection router.

Please note that ParallelConnectionRouter still has some bugs (i.e.,
getting stuck in the MultiQueue pop). This commit is not fully working.
Please do not use it for any experiments.

Updated the previously incorrect command-line options for the parallel
connection router in the regression tests.

* [AP][MassLegalizer] Revistited Mass Legalizer

Found that the mass legalizer was not spreading out the blocks well
enough according to the mass.

Revistied the spatial partitioning in the mass legalizer. Before, we
just cut the window in half in the larger dimension. This was fine,
however it may create an inbalanced cut which can cause things to not
spread well. Instead, we now search for the best partition by trying
different partition lines and computing how balanced the partition is.
Although this is more expensive than before, by creating more balanced
partitions, it should allow the mass legalizer to converge faster. Time
in the mass legalizer is also dominated by partitioning the blocks, so
increasing the time to choose the partition line should not have that
large of an effect anyways.

Found an oversight with how blocks were partitioned when one of the
partitions become overfilled. Fixed this issue.

* Inverse use of macro_can_be_placed argument check_all_legality to align with meaning

* [vpr][pack] fix merge issues w/ flat sync list

* make format

* [packages] add clang-format

* make format 2

* Invalid C++ fix

* [docker] set ubuntu version to 24.04

* [dockerfile] enable system-wide python package installation for pip

* [dockerfile] add comment

* [package] check whehter clang-format-18 package exist

* [package] remove deprecated names-only option

* [package] remove if condition

* [doc] update quick start on installing packages

* make format
enum class e_rr_type
a few remaining t_rr_type vals
CHANY ---> t_rr_type::CHANY
CHANX ---> t_rr_type::CHANX
OPIN ---> t_rr_type::OPIN
IPIN ---> t_rr_type::IPIN
SINK ---> t_rr_type::SINK
SOURCE ---> t_rr_type::SOURCE

* [Router] Finally fixed the weird bug in parallel connection router

Fixed the weird bug in parallel connection router as mentioned in commit
f73212c. The bug occurred because two function parameters 'num_threads'
and 'num_queues' have been misplaced when instantiating the MQ_IO. This
took two weeks to figure out exactly.

The VTR benchmark (`vtr_reg_qor_chain` task) has been tested/passed for
different cases (1) 'serial mode' 1T+2Q (1 thread, 2 queues), (2) 2T+4Q,
and (3) 4T+2Q.

The determinism has also been verified for the VTR benchmark.

* [Router] Fixed Code Review Comments and Cleanup Codebase

Added more explanation to the command-line options messages and code
comments.

Cleaned up ParallelConnectionRouter-related codebase.

* [doc] clarify that clang-format is not required to build VPR

* remove typedef t_rr_type

* doxygen comment for Direction

* add vtr::array class

* make rr_node_typename of type vtr::array to index it only with e_rr_type

* add default constructor to vtr::array

* access rr_node_indices_ with e_rr_type instead of casting to size_t

* add single argument constructor to vtr::array

* [Router] Updated Golden Results for Parallel Connection Router CI Tests

Updated the golden results for CI tests for parallel connection router:
 - `vtr_reg_strong/koios_test`
 - `vtr_reg_strong/strong_flat_router`
 - `vtr_reg_strong/strong_multiclock`
 - `vtr_reg_strong/strong_timing`

* use vtr::array to index some arrays using e_rr_type

* make format

* avoid using e_rr_type and casting it in place_macro

* [vpr][base] fix assigned pb_graph_pin when graph node is not primitive

* [vpr][pack] pass logical type to alloc_and_laod_pb_route

* [vpr][pack] update alloc_and_load_pb_route header file

* [vpr][pack] fix pb_graph_pin assignment in load_trace_to_pb_route

* [test] keep 3d sb and cb tests

* [vpr][pack] add intra_lb_pb_pin_lookup_ to cluster legalizer

* [vpr][pack] initializer intra_lb_pb_pin_lookup and pass it to alloc_and_load_pb_route

* [vpr][pack] use intra_lb_pb_pin_lookup to get pb_pin from pin number

* make format

* add vtr::array to docs

* [vpr][pack] remove casting net id

* [vpr][pack] add doxygen comment for alloc_and_load_pb_route

* [vpr][pack] remove redundant parameters

* [vpr][pack] polish load_trace_to_pb_route

* make format

* [vpr][pack] fix parameter shadowing

* [AP][HotFix] Fixed Bug With Solver Putting Blocks Off-Device

After moving fixed blocks to the center of tiles, there is a very small
chance that blocks go off the device due to rounding. This is such a
small effect that it does not show up locally on my machine, but it
shows up on CI. Clamping the positions of blocks after solving to be
just within the device region.

* Increase the daily stale issue action API call limit

* [vpr][pack] add a method to get root_ipin

* [vpr][pack] remove unused var

* [Router] Added Assert for MQ_IO numQueues and Updated Golden Results

Added assert for MultiQueueIO numQueues to ensure it must be greater
than two.

Updated CI test tasks to ensure the parallel connection router runs in
Dijkstra mode to ensure determinism and avoid hanging in CI runs.

* [AP][HotFix] Placed Fixed Blocks First During IP

The cost terms in the AP initial placer were not placing fixed blocks
early enough, causing other blocks to take their place and causing the
initial placer to not return a solution.

Blocks which have region constraints are now placed first based on how
constrained they are. More constrained blocks (can only be placed in a
smaller region) will be placed first.

Also found that macros that contained fixed blocks were not observing
these constraints when calculating the centroid position of the macro.
For constrained macros, projected the centroid position onto the
partition region to get the closest point in the partition region to the
calculated centroid. This new centroid is used to then perform the
placement.

* [STA] Added Option to Remove Parameters from Post-Implementation Netlist

When performing post-implementation timing analysis using OpenSTA, the
generated netlist cannot use parameters since each module needs to
correspond with a cell in a liberty file.

Added a command-line option which tells the netlist writer to not use
parameters when generating the netlist. If a primitive cannot be
generated without using parameters, it will error out.

* [Tatum][Parse] Fixed Extraneous Warning With get_clocks

The get_clocks command is used in an SDC file to reference a set of
clocks by name using a regex string. The code to do this tries to
produce a warning if get_clocks is used on a regex string and no clocks
could be found. The issue is that the code to do this was mistakenly
producing this warning for each clock in the circuit. For example, if we
had {clk1, clk2, clk3} and we wanted to do "get_clocks {clk3}", we will
get two warnings since clk1 and clk2 did not match.

Fixed this by moving the warning out of one loop nest.

* Remove PR staling
This commit sets the number of days before marking
issues or PRs as stale to 100 years. This number is
overriden for issues to be 1 years but stays 100 years
for PRs. This means that PR effectively do not get
marked as stale.

* [LibArchFPGA] Updating Model Data Structures

The logical models (the technology-mapped logical blocks) for an
architecture were stored using two independent linked lists. One for the
library models (the models that all architectures have, such as luts and
ffs) and one of the user models. This linked lists were hard to traverse
and were injecting pointers all across VPR.

Created a new class to store and manage the logical models. This class
maintains a unique ID for each logical model (similar to the netlist
data structures in VPR). It also contains helper methods to make working
with the logical models easier.

* fix comments from alex

* revert prepacker changes

* [vpr][pack] add get_pattern_blocks

* [vpr][pack] add blocks in get_all_connected_primitive_pins if they are a part of the pattern

* make format

* Bump libs/EXTERNAL/libcatch2 from `76f70b1` to `5abfc0a`

Bumps [libs/EXTERNAL/libcatch2](https://github.com/catchorg/Catch2) from `76f70b1` to `5abfc0a`.
- [Release notes](https://github.com/catchorg/Catch2/releases)
- [Commits](catchorg/Catch2@76f70b1...5abfc0a)

---
updated-dependencies:
- dependency-name: libs/EXTERNAL/libcatch2
  dependency-version: 5abfc0aa9c1ef4cb40c9f387495134dab02e1af2
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

* [vpr][pack] add more comments

* Add helper functions to t_pb_type

* Change t_pb_type users to use helper functions

* Add documentation for t_pb_type::is_root and is_primitive

* Fix formatting in libarchfpga/physical_types.h

* [vpr][pack] change count method to find

* [Router] Updated the Regression Tests and Corresponding Golden Results

Changed `multi_queue_num_threads` and `multi_queue_num_queues` settings
in the CI strong regression tests to avoid QoR failure in the CI runs.

The coverage of the regression tests for parallel connection router
after this change is still fair.

* [vpr][CLI] add generate_net_timing_report

* [vpr][route] remove debugging msg

* [vpr][analysis] add generate_net_timing_report

* [vpr][pack] apply formatting comments

* make format

* [vpr][analysis] add comments

* make format

* [vpr][CLI] remove generate net timing from CLI parameters and generate the report by default

* Unused Packer Options Cleanup (#2976)

* Standardized and renamed packer alpha and beta variable. They are now referred to as timing_gain_weight and connection_gain_weight, used as a weight parameter during timing and connection driven clustering respectively. Removed global_clocks, use_attraction_groups, pack_num_moves, pack_move_type from packer.

* [APPack] Updated Max Candidate Distance Interface

The max candidate distance is used by APPack to decide which molecules
to ignore when packing, based on their distance from the cluster being
formed.

Cleaned up the interface of this by pre-computing the max candidate
distance of all logical blocks ahead of time and reading from these
pre-computed values during packing.

Added a command-line option to allow the user to override some or all of
these max distance thresholds. By default, VPR will select values based
on the type of logical block and the primitives it contains.

Fixed issue with APPack creating too many IO blocks for some circuits
due to the max candidate distance thresholds for IO blocks being too
low.

More tuning should be done on these values once the mass legalizer has
been cleaned up a bit more.

* [vtr][parse] fix pattern for init place wl

* [vpr][analysis] add header for net timing report

* [vpr][analysis] add timing format to comments

* formatting fix

* Revert "[vpr][CLI] remove generate net timing from CLI parameters and generate the report by default"

This reverts commit b8289db.

* make format

* [STA] Generating SDC Commands Post-Implementation

Added an option to have VPR generate an SDC file containing the timing
commands required for an external timing analysis of the post-
implementation netlist to match VPR's timing analysis.

* [STA] Added Tutorial for Post-Implementation Timing Analysis

Created a tutorial demonstrating how OpenSTA can be used after VPR to
perform static timing analysis.

* Add artifact upload to nightly test workflow

* t_det_routing_arch* --> const t_det_routing_arch&

* t_chan_width_dist ---> const t_chan_width_dist&

* make format

* fix compilation error in route_diag by passing det_routing_arch argument by reference instead of pointer

* [task] add generate_net_timing_report to timing report strong test

* [doc] add doc for generating _net_timing_report command line option

* [vpr][timing] update generate_net_timing_report comment

* [vpr][timing] add get_net_bounding_box

* [vpr][timing] add net bounding box to the report

* [test] add test for net timing report

* [doc] update doc with new format to net timing report

* [vpr][analysis] fix net timing report bugs + including layer min/max of bb

* make format

* [vpr][analysis] capture vars by reference in lambda

* [packer] Changing the vector of candidate molecules into LazyPopUniquePriorityQueue.

    The class LazyPopUniquePriorityQueue is a priority queue that allows for lazy deletion of elements.
    It is implemented using a vector and 2 sets, one set keeps track of the elements in the queue, and the other set keeps track of the elements that are pending deletion.
    The queue is sorted by the sort-value(SV) of the elements, and the elements are stored in a vector.
    The set is used to keep track of the elements that are pending deletion, so that they can be removed from the queue when they are popped.
    The class definiation can be found in vpr/src/util/lazy_pop_unique_priority_queue.h

    Currently, the class supports the following functions:
        LazyPopUniquePriorityQueue::push(): Pushes a key-sort-value (K-SV) pair into the priority queue and adds the key to the tracking set.
        LazyPopUniquePriorityQueue::pop(): Returns the K-SV pair with the highest SV whose key is not pending deletion.
        LazyPopUniquePriorityQueue::remove(): Removes an element from the priority queue immediately.
        LazyPopUniquePriorityQueue::remove_at_pop_time(): Removes an element from the priority queue when it is popped.
        LazyPopUniquePriorityQueue::empty(): Returns whether the queue is empty.
        LazyPopUniquePriorityQueue::clear(): Clears the priority queue vector and the tracking sets.
        LazyPopUniquePriorityQueue::size(): Returns the number of elements in the queue.
        LazyPopUniquePriorityQueue::contains(): Returns true if the key is in the queue, false otherwise.

* [packer] recollected golden results for regression basic, basic_odin, strong, strong_odin

* [packer] recollected golden results for Nightly

* add pointer to VTR9 paper in the readme

* Add documentation to explain which parts of VPR are parellel

* pass t_chan_width by reference

* doxygen comment for alloc_and_load_rr_node_indices

* add doxygen comments for load_block_rr_indices()

* [AP][Solver] Enabled Parallel Eigen

The Eigen solver has the ability to use OpenMP to run the solver
computations in parallel. Made the AP flow use the num_workers option to
set the number of threads that Eigen can use.

VPR did not have the ability to build with OpenMP in its CMAKE. Added an
option to the CMAKE to allow the user to enable OpenMP.

* remove unused is_flat argument from alloc_and_load_rr_node_indices() and load_block_rr_indices()

* use (x, y) convention for CHANX instead of (y, x)

* make format

* cast x/y to size_t

* get rid of warnings in RRSpatialLookup::find_nodes()

* Add references to the main VTR papers in the documentation.

* Add link to the VTR 9 paper in the documentation

* Add link to the VTR 9 paper in the README

* add a closing ) to the text printed by node_coordinate_to_string()

* fix the x/y mismatch for CHANX nodes in rr_nodes and rr_node_indices

* reserve nodes using x/y instead of chan/seg

* fix a typo

* add rr_graph_genearion directory

* resize node lookup for CHANX nodes in RR graph serializer

* add rr_node_indices.cpp/.h

* add doxygen comment for load_chan_rr_indices()

* [Infra] Updated Install Packages Script For Backwards Compatibility

The install_apt_packages.sh script is no longer backward compatible with
older versions of Ubuntu due to the dependency on clang-format-18.

Added an if statement to check if the distribution can support
clang-format-18 and only installing it if it can.

Added this script to the CI build process so it can always be tested
within the CI to prevent future regression.

* [RegTest] Disabled `strong_multiclock` test for parallel connection router

Temporarily disabled the `strong_multiclock` test in `vtr_reg_strong` CI
regression tests for the parallel connection router, due to some random
failures as mentioned in Issue #3029.

After fixing the problem with the `strong_multiclock` test, this will be
reactivated.

* [doc] update the doc with new report format

* [RegTest] Updated golden results for `strong_multiclock` regression test

Removed the golden results of parallel connection router test cases for
`strong_multiclock` regression test.

* [vpr][analysis] use std::min/max instead of if condition

* Add documentation for include sanitization

* [vpr][analysis] change report_net_timing format to csv

* [vpr][analysis] update comments

* [vpr][analysis] print constant nets in  the net timing report

* [vpr][analysis] apply comments

* [vpr][analysis] fix function name

* [doc] add net timing report use case

* fix a typo

* [Infra] Cleaned Up Include Files in VPR Base Directory

Many include files in the base directory contained includes to other
headers which they do not use. This causes many CPP files to include way
more header files than they need, increasing the incremental build time.

This process needs to be done on the entire VTR repo, but I found that
the base directory was one of the biggest culprits of this and the
hardest to untangle.

* [Infra] Cleaned Up Header Files in Pack Folder

Went through the header files in the pack folder and resolved any unused
header files.

* [AP] Removed Old Cluster-Level AP Flow

Prior to the flat AP flow, a cluster-level AP flow existed in VPR which
performed a SimPL-style algorithm on the clusters created during packing
before performing a placement quench.

Although well-written, this flow was not shown to outperform the SA
placer in VPR. It has also been becoming confusing to keep in VPR since
the new flat AP flow supercedes it. It is unclear if a cluster-level AP
flow will work well with the flat AP flow; however in that case the
cluster-level AP flow would be made using the new AP APIs written.

Removed the old cluster-level AP flow to reduce confusion.

* [Infra] Cleaned Up Header Files in Place Folder

* [lib][rr_graph] replace t_rr_type with e_rr_type

* [vpr][tileable] remove t_rr_type usage

* make is_io_type() a member function of t_physical_tile_type

* replace calls to is_io_type() with t_physical_tile_type::is_io()

* make format

* fix compiler bugs

* make format

* [lib][libutil] fix size_t issue

* inline  t_physical_tile_type::is_io()

* add doxygen comments for alloc_and_load_tile_rr_node_indices()

* [libs][vtrutil] use generate instead of fill to avoid getting potential null pointer dereference

* document alloc_and_load_rr_node_indices() arguments

* made a few function operating on t_pb_type its member functions

* add router_lookahead directory

* [STA] Added Multiclock Incremental STA Consistency Check

The incremental STA consistency coverage was very good, but was just
missing a multiclock circuit with an SDC file.

Added a quick test.

* [libs][rr_graph] don't reverse xy when calling node lookup

* [vpr][util] consider medium node type as inter cluster node

* [Infra] Cleaned Up Header Files in Route Folder

Continued the header file cleanup effort in the route folder.

Some of these files may need to be revisited in more detail, but got
some of the major header include issues.

Found that some definitions were in the wrong place, so moved them to
the correct implementation file.

* [Infra] Updated Header Files Based on Comments

Moved to pragma once symantics and cleaned up some less than ideal
code.

* [vpr][tileable] use is_io in t_physcial_tile

* [vpr][route] update rr node indices to include medium type

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Hang Yan <[email protected]>
Co-authored-by: AlexandreSinger <[email protected]>
Co-authored-by: Fred Tombs <[email protected]>
Co-authored-by: soheilshahrouz <[email protected]>
Co-authored-by: AlexandreSinger <[email protected]>
Co-authored-by: Soheil Shahrouz <[email protected]>
Co-authored-by: Amir Poolad <[email protected]>
Co-authored-by: Amir Poolad <[email protected]>
Co-authored-by: vaughnbetz <[email protected]>
Co-authored-by: Fred Tombs <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: James Yen <[email protected]>
Co-authored-by: Rongbo Zhang <[email protected]>
Co-authored-by: Rongbo Zhang <[email protected]>
Co-authored-by: Mohamed Elgammal <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants