-
Notifications
You must be signed in to change notification settings - Fork 415
Unexpected timing constraint behavior on global clock buffers #1013
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@vaughnbetz / @kmurray I'm guessing this behavior is caused by how VPR constraints clocks. If no SDC is supplied, does VPR generate constraints? And then if an SDC is supplied, VPR generates no constraints? Should VPR always generate the "default" constraints for unconstrained clocks, even in the presence of an SDC file? Should VPR generate better diagnostics if generating the "default" clock constraint is undesirable? |
I strongly believe generating clock constraints by default when there is an SDC is highly undesirable. We constrain clocks to go as fast as possible when there is no sdc as a convenience feature. Once there is an sdc, we obey it. [FYI: this approximately matches what we did in Quartus, and from what I can observe approximately matches what Xilinx does, although both those tools do a "light" optimization to save cpu time with no sdc]. Mixing in default (go as fast as possible on other, unconstrained clocks) when you have an sdc constraining some is very counter-intuitive. Cross-clock domain paths would be another random default (once you have an sdc, whether or not those paths are optimized and how much is defined by exactly how you define and relate your clocks.). So the immediate workaround for this is to define your sdc to make a single base clock, and a derived clock that is 1x related to it, and apply the derived clock to the global buffer in your sdc. Or if the original input clock doesn't do anything interesting, you could just constrain the output of the global buffer. To make this more convenient and to capture the delay of the global clock buffer in the clock delay (useful if you are also setting timing constraints between I/Os and this clock), it would be best for the timing graph builder to understand that when clocks go through buffer blocks they remain part of the clock tree and the clock hasn't hit leafs yet. Kevin, how hard would that be to do in a data-driven way? It's possible the code already does this if the clock buffer is defined as a combinational input->output block (not as a clock port input that would indicate there is a register in it). |
I do not disagree with anything you have written, however an error in sdc constraints resulting in:
With the only indication of failure is the CPD = NaN and the following message:
There ought to have a been a message akin to:
or something. The only reason I figured out how to get past this issue is the fact that no SDC => everything was fine, SDC only on input clock + global clock buffer => borked, SDC on input clock w/o global clock buffer => everything was fine. |
I think there are a couple of things here;
|
Agreed on 1 and 4. Both will take some development, which this issue can track. 3 is done: see https://docs.verilogtorouting.org/en/latest/vpr/timing_constraints/#default-timing-constraints (explains the defaults and the equivalent SDC constraints) 2: I disagree that this should be an error. Not specifying a constraint on a clock, or on paths between clocks, or on some I/O paths means you do not care about their speed. This happens on synchronizer-based cross-clock domain paths for example, and is perfectly reasonable for them (there is no timing constraint -- they will work at any speed). It is the common CAD tool behaviour to allow this. I agree that we should have a missing constraint report / unconstrained path report though (#4 above), as that helps flag if you unintentionally left some paths out. |
1 is the most important, and shouldn't be that hard as Kevin's code already traces through .names buffers (he thinks). Keith, can you attach a small test case (arch file, small example design, and point us at the exact global clock buffer syntax / definition in both the architecture and .blif files). |
4 is a good ramp up or summer student task. Will look for an owner. Asking @kmurray (nicely!) if he can squeeze in the solution to 1 above. |
@vaughnbetz - I agree that with #2 most CAD tools let you forget a constraint on a clock it'll just continue happily. I have lost many hours to this behaviour. This also frequently trips up beginners. It would be perfectly fine to have a |
I've implemented (1). It was somewhat more involved as it required some changes to how the timing graph is constructed and how Tatum handles propagating clocks. The new behaviour is as follows. For instance consider a black-box clock buffer with an enable:
Which would have a corresponding model: <model name="clkbufce">
<input_ports>
<port name="in" combinational_sink_ports="out" is_clock="1"/>
<port name="enable" combinational_sink_ports="out"/>
</input_ports>
<output_ports>
<port name="out"/>
</output_ports>
</model> and pb_type: <pb_type name="clkbufce" blif_model="clkbufce" num_pb="1">
<clock name="in" num_pins="1"/>
<input name="enable" num_pins="1"/>
<output name="out" num_pins="1"/>
<delay_constant max="10e-12" in_port="clkbufce.in" out_port="clkbufce.out"/>
<delay_constant max="5e-12" in_port="clkbufce.enable" out_port="clkbufce.out"/>
</pb_type> where the When tracing back the clock net 'clk3_buf', VPR will now look at the Notably, since only the The timing graph builder, delay calculators and tatum needed some small tweaks to propagate clocks during analysis between CPIN and OPIN nodes in the timing graph. It all appears to be working, and you can see more details of how it is used in the associated regression test. |
Thanks Kevin! This looks great. |
Thanks Kevin! This will be very useful. There is one more point of subtlety that is worth discussing. The 7-series global clock buffers can act as clock muxes, but they are rarely use that way. If there are two clock inputs and one output, what does your new logic do? And if only one of the two clock inputs are provided, what do you do? |
For something like:
It would detect clka and clkb as separate clocks. Any downstream paths using clk_downstream would be analyzed for both clka and clkb. For automatically generated constraints this means clocks would be created for both clka and clkb, and the downstream logic would be analyzed for both clocks. The automatically generated constraints do not analyze cross-domain paths so you wouldn't see any paths clocked by clk_downstream crossing between clka/clkb (or vice versa). If you manually specified an SDC file with
The tracing is driven by the netlist connectivity, so if only one of the clock inputs is left disconnected:
VPR would see that clk_downstream and clk_b are logically equivalent, so clkb would propagate to the downstream logic. Since the clk1 port is disconnected no additional clock would be inferred. |
Agreed, I'll work on getting that in the docs. |
That clock muxing behaviour all sounds good. Thanks Kevin. |
I've updated the timing modelling tutorial to include these examples. Closing as I believe the primary issue is now addressed. |
Looks great; thanks Kevin! |
Expected Behaviour
Constraining the input clock pin should propagate through global clock buffers
Current Behaviour
Constraining the input clock pin results in all other clock buffers being ignored.
Possible Solution
Steps to Reproduce
Context
We are introducing global clock buffers into the 7-series arch, and VPR sees this as two clocks, the clock from the input clock pin, and the clock from the output of the global buffer. Initial testing with global buffers went well, however no
sdc
file was supplied. When ansdc
was supplied that only created a clock on the input pin, but not the global clock buffer net, it caused all timing analysis on BELs that were connected to the global buffer to report no launch clock. This resulted in both placement and routing having a NaN critical path delay.Your Environment
The text was updated successfully, but these errors were encountered: