Routability-based placement constraints #1606

Closed
mkurc-ant opened this issue Dec 10, 2020 · 3 comments
@mkurc-ant
Collaborator

The VPR placer is not aware of whether a block's placement is legal in terms of its routability to other connected blocks (as defined by the actual rr graph). This poses a problem in architectures where the FPGA grid is divided into clock regions (e.g. Xilinx 7-series or QuickLogic EOS-S3), because the placement of flip-flops driven by a regional clock buffer should be constrained by the placement of that buffer. All these architectures use custom rr graphs.

Proposed Behaviour

For an architecture consisting of CLB blocks (with flip-flops) and regional clock buffers, the placer should know that a CLB can only be placed in the same region as the regional clock buffer (GMUX) driving it.

An example layout with 4 clock regions, each consisting of a regional clock buffer (GMUX) and 8 complex logic blocks (CLBs), is shown below. Placement of a GCLK block should constrain the placement region of all CLB blocks driven by it:
[Figure: VPR placement constraints]
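
The single-level rule above can be sketched as a legality check. This is only an illustration, not VPR code: the `Region` class, the 8x8 grid split into four 4x4 regions, and all block names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical axis-aligned clock region with inclusive grid bounds."""
    x_low: int
    y_low: int
    x_high: int
    y_high: int

    def contains(self, x, y):
        return self.x_low <= x <= self.x_high and self.y_low <= y <= self.y_high


def region_of(regions, loc):
    """Return the index of the region containing grid location loc, or None."""
    for i, r in enumerate(regions):
        if r.contains(*loc):
            return i
    return None


def illegal_clbs(regions, placement, clock_driver):
    """List CLBs placed outside the region of their driving GMUX.

    placement: block name -> (x, y); clock_driver: CLB name -> GMUX name.
    """
    bad = []
    for clb, gmux in clock_driver.items():
        if region_of(regions, placement[clb]) != region_of(regions, placement[gmux]):
            bad.append(clb)
    return sorted(bad)


# Assumed layout: an 8x8 grid split into four 4x4 clock regions.
regions = [Region(0, 0, 3, 3), Region(4, 0, 7, 3), Region(0, 4, 3, 7), Region(4, 4, 7, 7)]
placement = {"gmux0": (0, 0), "clb_a": (2, 3), "clb_b": (5, 1)}
clock_driver = {"clb_a": "gmux0", "clb_b": "gmux0"}
print(illegal_clbs(regions, placement, clock_driver))  # ['clb_b']
```

Here `clb_b` sits in a different region than `gmux0`, so a placer applying this rule would have to reject or move it.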

One could imagine hierarchical clock regions: for example, a device grid divided into quadrants with driving buffers, where each quadrant is further divided into sub-quadrants with their own driving buffers, and so on. In that case the placement constraint relation becomes hierarchical:
[Figure: VPR placement constraints2]
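
One way to picture the hierarchical relation: the legal region for a block is the intersection of the regions served by every buffer on its clock-driver chain. The sketch below assumes rectangular regions given as `(x_low, y_low, x_high, y_high)` tuples; `driver_of`, `served_region` and the block names are hypothetical.

```python
def intersect(a, b):
    """Intersect two inclusive rectangles; return None if they do not overlap."""
    x_low, y_low = max(a[0], b[0]), max(a[1], b[1])
    x_high, y_high = min(a[2], b[2]), min(a[3], b[3])
    if x_low > x_high or y_low > y_high:
        return None
    return (x_low, y_low, x_high, y_high)


def allowed_region(block, driver_of, served_region, device):
    """Walk up the clock-driver chain and intersect every served region."""
    region = device
    current = driver_of.get(block)
    while current is not None and region is not None:
        region = intersect(region, served_region[current])
        current = driver_of.get(current)
    return region


device = (0, 0, 7, 7)  # assumed 8x8 grid
driver_of = {"clb_a": "sub_buf", "sub_buf": "quad_buf"}
served_region = {"quad_buf": (0, 0, 3, 3), "sub_buf": (0, 0, 1, 1)}
print(allowed_region("clb_a", driver_of, served_region, device))    # (0, 0, 1, 1)
print(allowed_region("sub_buf", driver_of, served_region, device))  # (0, 0, 3, 3)
```

The sub-quadrant buffer is confined to its quadrant, and the CLB is confined to the sub-quadrant, which is the nesting the figure describes.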

Current Behaviour

Currently the placer is not aware of clock regions and can place a GMUX clock buffer in a different region than a CLB driven by it. This causes the router to fail, as no actual route exists between blocks placed this way.

Possible Solution

The placer could work in a multi-stage, hierarchical way. The first stage would place the global clock buffers, the next stage the regional buffers, and so on. The last stage would place regular logic.
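
A rough sketch of that staging, using a greedy first-fit in place of a real placement algorithm. Everything here is an assumption for illustration: the site table, the type names, and the rule that a block must share the region of its driver unless the driver spans the whole chip.

```python
def staged_place(stages, driver_of, sites):
    """Greedy stand-in for a staged placer (not how VPR places blocks).

    stages: lists of (block, block_type) pairs, clock buffers first, logic last.
    driver_of: block -> name of its clock-driving block (if any).
    sites: site name -> (site_type, region); region None means "spans the chip".
    """
    placement = {}  # block -> site name
    used = set()
    for stage in stages:
        for block, block_type in stage:
            driver = driver_of.get(block)
            # Region imposed by an already-placed driver, if it has one.
            required = sites[placement[driver]][1] if driver in placement else None
            for site, (site_type, region) in sites.items():
                if site in used or site_type != block_type:
                    continue
                if required is not None and region != required:
                    continue
                placement[block] = site
                used.add(site)
                break
            else:
                raise RuntimeError(f"no legal site for {block}")
    return placement


sites = {
    "GBUF0": ("global_buf", None),
    "GMUX_R0": ("gmux", 0), "GMUX_R1": ("gmux", 1),
    "CLB_R0_0": ("clb", 0), "CLB_R1_0": ("clb", 1), "CLB_R1_1": ("clb", 1),
}
stages = [
    [("gbuf", "global_buf")],
    [("rbuf_a", "gmux"), ("rbuf_b", "gmux")],
    [("ff0", "clb"), ("ff1", "clb"), ("ff2", "clb")],
]
driver_of = {"rbuf_a": "gbuf", "rbuf_b": "gbuf",
             "ff0": "rbuf_b", "ff1": "rbuf_a", "ff2": "rbuf_b"}
result = staged_place(stages, driver_of, sites)
print(result["ff0"], result["ff1"], result["ff2"])  # CLB_R1_0 CLB_R0_0 CLB_R1_1
```

Note that this sketch places buffers without looking at how much logic each one drives, so an early stage can make a later stage infeasible; a real implementation would need to account for downstream demand.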

The information about hierarchy of clock resources and routability could possibly be derived from the externally supplied rr graph (slow). Or it could be built using a description read from a file (T.B.D.).
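
For the rr-graph option, the reachable region of a buffer could in principle be found with a graph search from its output node. The sketch below runs a breadth-first search over a toy directed graph; the node names and the `tile_of_sink` map are made up and do not reflect VPR's rr graph data structures.

```python
from collections import deque


def reachable_tiles(edges, tile_of_sink, source):
    """Breadth-first search from a buffer's output node.

    edges: node -> iterable of successor nodes (directed, as in an rr graph).
    tile_of_sink: clock-input sink node -> (x, y) tile that owns it.
    Returns the set of tiles whose clock sinks are reachable from source.
    """
    seen = {source}
    queue = deque([source])
    tiles = set()
    while queue:
        node = queue.popleft()
        if node in tile_of_sink:
            tiles.add(tile_of_sink[node])
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return tiles


# Toy graph: two regional buffers, each feeding its own clock spine.
edges = {
    "gmux0.out": ["spine0"],
    "spine0": ["clb_0_0.clk", "clb_0_1.clk"],
    "gmux1.out": ["spine1"],
    "spine1": ["clb_4_0.clk"],
}
tile_of_sink = {"clb_0_0.clk": (0, 0), "clb_0_1.clk": (0, 1), "clb_4_0.clk": (4, 0)}
print(sorted(reachable_tiles(edges, tile_of_sink, "gmux0.out")))  # [(0, 0), (0, 1)]
```

Running one such search per buffer site over a full-size rr graph is what makes this approach slow, which is the trade-off against reading the regions from a file.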

Context

In SymbiFlow's support for Xilinx 7-series devices, as well as for the QuickLogic EOS-S3 device, regional clock buffers are modeled as edges of the routing graph. This works around the routability problem, but it prevents modeling those buffers as placeable blocks.

For Xilinx 7-series there is a Python script that analyzes clock resources required by the design and constrains them so that they are routable.

Future QuickLogic FPGA architectures will have hierarchical clock regions.

This issue is similar to #932, but here the constraint region is not fixed; it depends on the placement of other cells.

@vaughnbetz
Contributor

I think this should be tackled in 3 parts:

  1. Choose where to put the global clock buffers. This tends to be quite constrained and device-specific, and I know SymbiFlow has an external solution for this for Artix7. Possibly a general engine based on the rr-graph could be developed, but that's a research task and we shouldn't wait for it. In the interim, we can write external tools or (if necessary) put device-specific code in a subdirectory in vpr called arch_specific or some such.

  2. The device-specific code of 1 should create clustering & placement region constraints as envisioned in #932 (Add Placement Constraints to VPR) so we constrain placement to legal regions, if some clocks can't reach the whole chip. This could be done through files (i.e. using the constraint.xml file) or through direct calls to the region constraint creation APIs.

  3. To create a legal placement when there is muxing in the clock tree (e.g. Xilinx leaf clocks, Intel row_clocks), we will need to have a cardinality constraint engine that counts how many of some resource (e.g. leaf clock wires) are used in a region (e.g. the fragment of a pair of columns spanned by an Ultrascale+ leaf clock). The idea is that this engine would be given or compute a set of attributes on each block in the netlist (e.g. clock net ids) and then count how many distinct values were used in some maps over the chip (e.g. in each region corresponding to a leaf clock or other clock region). A placement that exceeded the cardinality constraint (e.g. number of leaf clock wires) would not be legal, and would either be forbidden or have a high cost (I'd use forbidden as it is easier to understand and tune, but once the engine is in place both options are possible).
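
The cardinality check in part 3 might look roughly like the sketch below. It is an assumption-laden illustration, not an existing VPR API: `leaf_region` (a pair of columns by four rows), the limit of 2 clock nets per region, and the block and net names are all invented for the example.

```python
from collections import defaultdict


def cardinality_violations(placement, clock_nets, region_of, limit):
    """Return {region: distinct clock nets} for every region exceeding limit.

    placement: block -> (x, y); clock_nets: block -> set of clock net ids;
    region_of: function mapping a location to a hashable region key.
    """
    used = defaultdict(set)
    for block, loc in placement.items():
        used[region_of(loc)].update(clock_nets.get(block, ()))
    return {region: nets for region, nets in used.items() if len(nets) > limit}


def leaf_region(loc):
    """Hypothetical leaf-clock region: a pair of columns, four rows tall."""
    x, y = loc
    return (x // 2, y // 4)


placement = {"ff0": (0, 0), "ff1": (1, 2), "ff2": (1, 3), "ff3": (2, 0)}
clock_nets = {"ff0": {"clk_a"}, "ff1": {"clk_b"}, "ff2": {"clk_c"}, "ff3": {"clk_a"}}
violations = cardinality_violations(placement, clock_nets, leaf_region, limit=2)
print(violations)
```

With these inputs, region `(0, 0)` uses three distinct clock nets against a limit of two, so it is reported. A placer could treat a non-empty result as a forbidden move or convert the excess into a cost term.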

github-actions bot commented May 9, 2025

This issue has been inactive for a year and has been marked as stale. It will be closed in 15 days if it continues to be stale. If you believe this is still an issue, please add a comment.

@github-actions github-actions bot added the Stale label May 9, 2025
This issue has been marked stale for 15 days and has been automatically closed.

@github-actions github-actions bot closed this as not planned May 25, 2025