Placer reporting cost errors during consistency checks #1402


Closed
litghost opened this issue Jul 6, 2020 · 45 comments

@litghost
Collaborator

litghost commented Jul 6, 2020

Expected Behaviour

Placer places circuit and writes placement file.

Current Behaviour

Placer aborts with the following error:

BB estimate of min-dist (placement) wire length: 310
Error 1: timing_cost_check: 1.75857e-08 and timing_cost: 1.17164e-09 differ in check_place.
Incr Slack updates 1 in 7.5417e-05 sec
Full Max Req/Worst Slack updates 1 in 1.8772e-05 sec
Incr Max Req/Worst Slack updates 0 in 0 sec
Incr Criticality updates 0 in 0 sec
Full Criticality updates 1 in 1.5237e-05 sec
# Placement took 0.23 seconds (max_rss 673.7 MiB, delta_rss +0.0 MiB)
Error 2: 
Type: Placement
File: vtr-verilog-to-routing/vpr/src/place/place.cpp
Line: 2689
Message: 
Completed placement consistency check, 1 errors found.
Aborting program.

Possible Solution

Working on a reduced test case. The circuit in this case was very simple: multiple inpad -> outpad connections.

Steps to Reproduce

1. Get input test data:

  • wget https://storage.googleapis.com/debug_data/vpr_failing_test.tar.bz2
  • tar -xf vpr_failing_test.tar.bz2

2. Build Symbiflow VPR at c62dcaf:

  • git clone https://github.com/litghost/vtr-verilog-to-routing.git
  • cd vtr-verilog-to-routing
  • git checkout -b test
  • git reset --hard 87c4d426a5096dc6b8ab7e9adb0c6a39b2154bd2
  • mkdir build
  • cd build
  • cmake -DVTR_ASSERT_LEVEL=3 ..
  • make -j

3. Run failing test case:

  • cd ../../vpr_failing_test
  • cp ../vtr-verilog-to-routing/vpr/vpr .
  • ./run.sh

Context

Your Environment

  • VTR revision used:
  • Operating System and version:
  • Compiler version:
@litghost
Collaborator Author

litghost commented Jul 6, 2020

I'm working on identifying when this abort started happening. @acomodi believes it was around when the incremental timing analysis was added.

@litghost
Collaborator Author

litghost commented Jul 6, 2020

I tested a circuit that was a little more complicated (a counter), rather than simple straight inport -> outport, same issue. I'll see how much I can reduce the circuit and still produce the error.

@HackerFoo
Contributor

I also ran into this issue when working on #1205. Disabling incremental timing avoided the error.

@litghost
Collaborator Author

litghost commented Jul 7, 2020

I also ran into this issue when working on #1205. Disabling incremental timing avoided the error.

Which graph were you running on? So far I've noticed the placer error showing up on 7-series ROI based graphs, but I would've assumed you would've been testing on either full 7-series graphs or a titan arch.

@litghost
Collaborator Author

litghost commented Jul 7, 2020

@acomodi mentioned this, but if the ASSERT level is increased from 2 to 3 the error changes to:

vtr-verilog-to-routing/vpr/src/place/place.cpp:1963 comp_td_connection_cost: Assertion 'ipin > 0' failed (Shouldn't be calculating connection timing cost for driver pins).

@acomodi
Collaborator

acomodi commented Jul 7, 2020

An additional note: the function that provides the driver pin causing this assertion error is the following:

static void update_td_costs(const PlaceDelayModel* delay_model, const PlacerCriticalities& place_crit, double* timing_cost) {
    /* NB: We must be careful calculating the total timing cost incrementally,
     * due to limited floating point precision, so that we get a
     * bit-identical result matching that calculated by comp_td_costs().
     *
     * In particular, we can not simply calculate the incremental
     * deltas caused by changed connection timing costs and adjust
     * the timing cost. Due to limited precision, the results of
     * floating point math operations are order dependent and we
     * would get a different result.
     *
     * To get around this, we calculate the timing costs hierarchically
     * to ensure we calculate the sum with the same order of operations
     * as comp_td_costs().
     *
     * See the PlacerTimingCosts object used to represent connection_timing_cost
     * for details.
     */
    vtr::Timer t;
    auto& cluster_ctx = g_vpr_ctx.clustering();
    auto& clb_nlist = cluster_ctx.clb_nlist;

    //Update the modified pin timing costs
    {
        vtr::Timer timer;
        auto clb_pins_modified = place_crit.pins_with_modified_criticality();
        for (ClusterPinId clb_pin : clb_pins_modified) {
            if (clb_nlist.pin_type(clb_pin) == PinType::DRIVER) continue;

            ClusterNetId clb_net = clb_nlist.pin_net(clb_pin);
            VTR_ASSERT_SAFE(clb_net);

            if (cluster_ctx.clb_nlist.net_is_ignored(clb_net)) continue;

            int ipin = clb_nlist.pin_net_index(clb_pin);
            VTR_ASSERT_SAFE(ipin >= 0 && ipin < int(clb_nlist.net_pins(clb_net).size()));

            double new_timing_cost = comp_td_connection_cost(delay_model, place_crit, clb_net, ipin);

            //Record new value
            connection_timing_cost[clb_net][ipin] = new_timing_cost;
        }
        f_update_td_costs_connections_elapsed_sec += timer.elapsed_sec();
    }

    //Re-total timing costs of all nets
    {
        vtr::Timer timer;
        *timing_cost = connection_timing_cost.total_cost();
        f_update_td_costs_sum_nets_elapsed_sec += timer.elapsed_sec();
    }

#ifdef VTR_ASSERT_DEBUG_ENABLED
    double check_timing_cost = 0.;
    comp_td_costs(delay_model, place_crit, &check_timing_cost);
    VTR_ASSERT_DEBUG_MSG(check_timing_cost == *timing_cost,
                         "Total timing cost calculated incrementally in update_td_costs() is "
                         "not consistent with value calculated from scratch in comp_td_costs()");
#endif
    f_update_td_costs_total_elapsed_sec += t.elapsed_sec();
}

@HackerFoo
Contributor

@litghost I was using the full 50t graph. I assumed that the problem was triggered by my new placement code.

@litghost
Collaborator Author

litghost commented Jul 7, 2020

@litghost I was using the full 50t graph. I assumed that the problem was triggered by my new placement code.

No, I don't believe your new placement code alone triggers this issue. However, in my limited testing the full 50T graph did not produce this error, so it's possible that ROI, or full 50T + new placement, triggers the issue. There is some kind of latent issue being uncovered here.

@litghost
Collaborator Author

litghost commented Jul 8, 2020

I did a quick test with HackerFoo@7e35d15 and the counter test on the xc7 50T ROI arch was able to complete if assert level was 2. On assert level 3, I got:

vtr-verilog-to-routing/vpr/src/place/place.cpp:1966 comp_td_connection_cost: Assertion 'connection_delay[net][ipin] == comp_td_connection_delay(delay_model, net, ipin)' 
failed (Connection delays should already be updated).  

Will continue investigating this week.

@litghost
Collaborator Author

litghost commented Jul 9, 2020

@kmurray Why is the assertion at

VTR_ASSERT_SAFE_MSG(ipin > 0, "Shouldn't be calculating connection timing cost for driver pins");
valid? In the parent context DRIVER pin types are ignored, see here: https://github.com/verilog-to-routing/vtr-verilog-to-routing/blame/master/vpr/src/place/place.cpp#L1889

I'm not clear on why ipin == 0 is an assertion violation.

@litghost
Collaborator Author

litghost commented Jul 9, 2020

I did a quick test with HackerFoo@7e35d15 and the counter test on the xc7 50T ROI arch was able to complete if assert level was 2. On assert level 3, I got:

vtr-verilog-to-routing/vpr/src/place/place.cpp:1966 comp_td_connection_cost: Assertion 'connection_delay[net][ipin] == comp_td_connection_delay(delay_model, net, ipin)' 
failed (Connection delays should already be updated).  

Will continue investigating this week.

The assertion from here should probably be behind the incremental #ifdef.

@vaughnbetz
Contributor

The assumption that pin #0 on a net is the driver is deeply embedded in the netlist data structures, so the assert looks valid.
I'd run the sanitizers to check for memory faults, since this is a strange failure. I agree that this assertion doesn't look like it should fire based on the code, so something strange is going on (memory corruption? bad data in the architecture file?).

@vaughnbetz
Contributor

@Bill-hbrhbr this may be of interest; hard to debug without being able to run Symbiflow though.

@litghost
Collaborator Author

litghost commented Jul 9, 2020

@vaughnbetz / @Bill-hbrhbr I've added instructions for replicating our failure. I can provide more blif's if that is useful, however the current one is just a single input / output.

@vaughnbetz
Contributor

@Bill-hbrhbr : Keith has provided instructions on how to reproduce at the top of this issue -- can you take a look?

@Bill-hbrhbr
Contributor

The inherent rule is that criticalities, delays, and timing costs are only calculated for sink pins, not driver pins. All the data structures (or at least Kevin's incremental STA code) are based on this fact, including the PlacerTimingCosts heap in timing_place.h.

An additional note: the function that provides the driver pin causing this assertion error is the following:

static void update_td_costs(const PlaceDelayModel* delay_model, const PlacerCriticalities& place_crit, double* timing_cost) {
    /* NB: We must be careful calculating the total timing cost incrementally,
     * due to limited floating point precision, so that we get a
     * bit-identical result matching that calculated by comp_td_costs().
     *
     * In particular, we can not simply calculate the incremental
     * deltas caused by changed connection timing costs and adjust
     * the timing cost. Due to limited precision, the results of
     * floating point math operations are order dependent and we
     * would get a different result.
     *
     * To get around this, we calculate the timing costs hierarchically
     * to ensure we calculate the sum with the same order of operations
     * as comp_td_costs().
     *
     * See the PlacerTimingCosts object used to represent connection_timing_cost
     * for details.
     */
    vtr::Timer t;
    auto& cluster_ctx = g_vpr_ctx.clustering();
    auto& clb_nlist = cluster_ctx.clb_nlist;

    //Update the modified pin timing costs
    {
        vtr::Timer timer;
        auto clb_pins_modified = place_crit.pins_with_modified_criticality();
        for (ClusterPinId clb_pin : clb_pins_modified) {
            if (clb_nlist.pin_type(clb_pin) == PinType::DRIVER) continue;

            ClusterNetId clb_net = clb_nlist.pin_net(clb_pin);
            VTR_ASSERT_SAFE(clb_net);

            if (cluster_ctx.clb_nlist.net_is_ignored(clb_net)) continue;

            int ipin = clb_nlist.pin_net_index(clb_pin);
            VTR_ASSERT_SAFE(ipin >= 0 && ipin < int(clb_nlist.net_pins(clb_net).size()));

            double new_timing_cost = comp_td_connection_cost(delay_model, place_crit, clb_net, ipin);

            //Record new value
            connection_timing_cost[clb_net][ipin] = new_timing_cost;
        }
        f_update_td_costs_connections_elapsed_sec += timer.elapsed_sec();
    }

    //Re-total timing costs of all nets
    {
        vtr::Timer timer;
        *timing_cost = connection_timing_cost.total_cost();
        f_update_td_costs_sum_nets_elapsed_sec += timer.elapsed_sec();
    }

#ifdef VTR_ASSERT_DEBUG_ENABLED
    double check_timing_cost = 0.;
    comp_td_costs(delay_model, place_crit, &check_timing_cost);
    VTR_ASSERT_DEBUG_MSG(check_timing_cost == *timing_cost,
                         "Total timing cost calculated incrementally in update_td_costs() is "
                         "not consistent with value calculated from scratch in comp_td_costs()");
#endif
    f_update_td_costs_total_elapsed_sec += t.elapsed_sec();
}

Indeed, the bug appeared because place_crit.pins_with_modified_criticality() provided a driver pin in the failing test case and had its timing cost computed, which corrupted the heap data structure used for calculating the total timing cost (the sum of all sink pins' costs).

This effect can be triggered by many different test cases, and it also depends on how the make_range function orders the elements in the vtr_id_set structure (which in turn affects how the heap is updated).

This should not happen. I suspect that the cause is within the following function:

void PlacerCriticalities::update_criticalities(const SetupTimingInfo* timing_info, float crit_exponent) {
    /* Performs a 1-to-1 mapping from criticality to timing_place_crit_.
     * For every pin on every net (or, equivalently, for every tedge ending
     * in that pin), timing_place_crit_ = criticality^(criticality exponent) */

    //Determine what pins need updating
    if (INCR_UPDATE_CRITICALITIES) {
        cluster_pins_with_modified_criticality_.clear();
        if (crit_exponent != last_crit_exponent_) {
            //Criticality exponent changed, must re-calculate *all* criticalities
            auto pins = clb_nlist_.pins();
            cluster_pins_with_modified_criticality_.insert(pins.begin(), pins.end());

            //Record new criticality exponent
            last_crit_exponent_ = crit_exponent;
        } else {
            //Criticality exponent unchanged
            //
            //Collect the cluster pins which need to be updated based on the latest timing
            //analysis
            //
            //Note we use the set of pins reported by the *timing_info* as having modified
            //criticality, rather than those marked as modified by the timing analyzer.
            //Since timing_info uses shifted/relaxed criticality (which depends on max
            //required time and worst case slacks), additional nodes may be modified
            //when updating the atom pin criticalities.
            for (AtomPinId atom_pin : timing_info->pins_with_modified_setup_criticality()) {
                ClusterPinId clb_pin = pin_lookup_.connected_clb_pin(atom_pin);

                //Some atom pins correspond to connections which are completely
                //contained within a cluster, and hence have no corresponding
                //clustered pin.
                if (!clb_pin) continue;

                cluster_pins_with_modified_criticality_.insert(clb_pin);
            }
        }
    } else {
        //Non-incremental: all pins and nets need updating
        auto pins = clb_nlist_.pins();
        cluster_pins_with_modified_criticality_.insert(pins.begin(), pins.end());
    }

    //Update the affected pins
    for (ClusterPinId clb_pin : cluster_pins_with_modified_criticality_) {
        ClusterNetId clb_net = clb_nlist_.pin_net(clb_pin);
        int pin_index_in_net = clb_nlist_.pin_net_index(clb_pin);

        float clb_pin_crit = calculate_clb_net_pin_criticality(*timing_info, pin_lookup_, clb_pin);

        /* The placer likes a great deal of contrast between criticalities.
         * Since path criticality varies much more than timing, we "sharpen" timing
         * criticality by taking it to some power, crit_exponent (between 1 and 8 by default). */
        timing_place_crit_[clb_net][pin_index_in_net] = pow(clb_pin_crit, crit_exponent);
    }
}

Specifically, this line

cluster_pins_with_modified_criticality_.insert(pins.begin(), pins.end()); 

created the problem. Only sink pins should be added, according to the logic of the timing cost computation. I suppose the logic is the same for non-incremental timing cost computation, so both line #42 and line #72 need to be modified. Regarding how disabling incremental timing bypasses this bug, I'm not sure about the cause, so correct my reasoning above if it is wrong.

I implemented a fix in my local repo that only adds sink pins for criticality computation. It made the bug go away and still passes the vtr_reg_strong tests. I'll open a PR upon the request of anyone in this thread.

@vaughnbetz
Contributor

Nice work finding this so quickly Bill!
I agree that code looks wrong. Perhaps this is also the underlying reason Kevin had to implement a more complex summation (tree summation) when he added incremental timing analysis, in order to get the timing costs close enough to pass the incremental vs. brute-force timing cost consistency check (i.e. perhaps the new summation order masked this bug for the test cases run).
Should definitely be fixed. Please do a QoR run with your fixed code and add the results (spreadsheets of QoR vs. master for vtr and titan designs). Assuming they look good, you should open a PR and fix this.

@litghost
Collaborator Author

I'll open a PR upon the request of anyone in this thread.

If you have a fix, please create a PR as soon as possible. This bug is currently blocking downstream VTR + symbiflow integration, and it would be good to confirm if your fix works or not.

@vaughnbetz
Contributor

Whoops, fat finger. Bill has a fix.

@Bill-hbrhbr
Contributor

Yep, working on it right now.

@litghost
Collaborator Author

litghost commented Jul 10, 2020

@Bill-hbrhbr : We are still seeing:

/tmpfs/src/github/vtr-verilog-to-routing/vpr/src/place/place.cpp:1966 comp_td_connection_cost: Assertion 'connection_delay[net][ipin] == comp_td_connection_delay(delay_model, net, ipin)' failed (Connection delays should already be updated).

which I believe you will replicate if you increase the ASSERT level from 2 to 3.

@litghost
Collaborator Author

Maybe the assert needs an additional check, something like "crit_exponent == last_crit_exponent_ && ..."?

@Bill-hbrhbr
Contributor

Looking into it right now.

@litghost
Collaborator Author

litghost commented Jul 10, 2020

The way I see it, the logic you modified does a brute-force retiming if crit_exponent changes. However, if the underlying result is not a function of crit_exponent (e.g. 0 ^ crit_exponent => 0), then the assertion might fail?

@Bill-hbrhbr
Contributor

@litghost I haven't been able to replicate this bug. Is this produced on a different set of benchmarks?

@litghost
Collaborator Author

litghost commented Jul 16, 2020

@Bill-hbrhbr I've been able to replicate the failing assertion issue, here are the latest replication instructions:

1. Get input test data:

  • wget https://storage.googleapis.com/debug_data/replicate2.tar.bz2
  • mkdir vpr_failing_test
  • pushd vpr_failing_test
  • tar -xf replicate2.tar.bz2
  • popd

2. Build Symbiflow VPR at 2d2c0b6:

  • git clone https://github.com/litghost/vtr-verilog-to-routing.git
  • cd vtr-verilog-to-routing
  • git checkout -b test
  • git reset --hard 2d2c0b6490c7914383bf326bee973feac455269b
  • mkdir build
  • cd build
  • cmake -DVTR_ASSERT_LEVEL=3 ..
  • make -j

3. Run failing test case:

  • cd ../../vpr_failing_test
  • cp ../vtr-verilog-to-routing/vpr/vpr .
  • ./run.sh

2d2c0b6 has your earlier fix, so if you change cmake -DVTR_ASSERT_LEVEL=3 .. to cmake -DVTR_ASSERT_LEVEL=2 .. it doesn't assert, and completes without error.

@litghost
Collaborator Author

@vaughnbetz / @Bill-hbrhbr I believe this assertion is not in error, but I'm not totally sure. I'm debugging on my side, let me know if you find anything.

@vaughnbetz
Contributor

Responding to a prior comment: I don't think this assert has anything to do with crit_exponent. The assert checks that the delay of a connection stored in a certain matrix entry matches what we get if we ask for the delay of that connection right now. We're failing that test. It looks like the routine is called in two places: once by the regular (incremental) timing cost update routine, and once by the brute-force routine that walks through all the connections. Bill, if you set a breakpoint on the assert and trace back to which caller it was, which connection it was (inet, ipin) and how different the stored delay is versus the recomputed one that may shed some light on this. Is there anything strange about this connection?

@litghost
Collaborator Author

Is there anything strange about this connection?

Nope! In fact this design is super simple: one inpad and one outpad, connected. I'm working on adding diagnostic prints to see if I can find the issue first.

@vaughnbetz
Contributor

Does the assert only occur on that design, or also on larger ones? Are both I/Os locked in place (nothing to move)? Maybe there's a corner case if we don't move anything.

Or, if it happens on larger designs too, it's not that ...

@litghost
Collaborator Author

Does the assert only occur on that design, or also on larger ones? Are both I/Os locked in place (nothing to move)? Maybe there's a corner case if we don't move anything.

They are not locked. I've already added a print checking a move took place.

@litghost
Collaborator Author

Looks like the bug is in driven_by_moved_block. I'll provide a print trace in a minute that points in that direction.

@litghost
Collaborator Author

litghost commented Jul 17, 2020

This design has 1 net with two pins: 1 (driver) and 0 (sink). You can see in the trace that the placer only moves one block; the incremental logic sees that the sink has been affected, but still skips the update because it believes the driver has also moved. Not sure why just yet:

Moves per temperature: 2
blocks_affected.num_moved_blocks 1
Examining net 0
Recomputing ClusterNetId 0 ClusterPinId 0
Recomputing 0[1]
Commiting td cost
blocks_affected.num_moved_blocks 1
Examining net 0
Recomputing ClusterNetId 0 ClusterPinId 0
Recomputing 0[1]
Commiting td cost

---- ------ ------- ------- ---------- ---------- ------- ---------- -------- ------- ------- ------ -------- --------- ------
Tnum   Time       T Av Cost Av BB Cost Av TD Cost     CPD       sTNS     sWNS Ac Rate Std Dev  R lim Crit Exp Tot Moves  Alpha
      (sec)                                          (ns)       (ns)     (ns)                                                 
---- ------ ------- ------- ---------- ---------- ------- ---------- -------- ------- ------- ------ -------- --------- ------
blocks_affected.num_moved_blocks 1
Examining net 0
Recomputing ClusterNetId 0 ClusterPinId 0
Recomputing 0[1]
Commiting td cost
blocks_affected.num_moved_blocks 1
Examining net 0
Recomputing ClusterNetId 0 ClusterPinId 1
Driver also moved for 0?
Commiting td cost
   1    0.0 8.8e+00   1.370       0.11 1.2145e-09   1.180      -1.18   -1.180   1.000  0.1962  161.0     1.00         2  0.500
blocks_affected.num_moved_blocks 1
Examining net 0
Recomputing ClusterNetId 0 ClusterPinId 1
Driver also moved for 0?
Commiting td cost
blocks_affected.num_moved_blocks 1
Examining net 0
Recomputing ClusterNetId 0 ClusterPinId 1
Driver also moved for 0?
Commiting td cost
   2    0.0 4.4e+00   0.647       0.04 1.2145e-09   1.235      -1.23   -1.235   1.000  0.1128  161.0     1.00         4  0.500
blocks_affected.num_moved_blocks 1
Examining net 0
Recomputing ClusterNetId 0 ClusterPinId 1
Driver also moved for 0?
blocks_affected.num_moved_blocks 1
Examining net 0
Recomputing ClusterNetId 0 ClusterPinId 1
Driver also moved for 0?
   3    0.0 2.2e+00   1.000       0.02 1.2145e-09   1.235      -1.23   -1.235   0.000  0.0000  161.0     1.00         6  0.950
/usr/local/google/home/keithrothman/cat_x/vtr-verilog-to-routing/vpr/src/place/place.cpp:1987 comp_td_connection_cost: Assertion 'connection_delay[net][ipin] == delay' failed (Connection delays should already be updated, 0[1] 1.21452e-09 != 8.1553e-10).

@litghost
Collaborator Author

It looks like on iteration 1 both the driver and the sink move, which makes sense. However, in further iterations only the sink moves, yet driven_by_moved_block still returns true, causing the timing update to be skipped.

@litghost
Collaborator Author

The bug is in the "PinType" annotation in the clustered netlist. It looks like the driver pin is being marked as a sink.

@litghost
Collaborator Author

I've added an assertion to confirm what I believe is the PinType bug:

      VTR_ASSERT_SAFE_MSG((pin_type == PinType::DRIVER && cluster_ctx.clb_nlist.net_driver(net) == pin) || (pin_type == PinType::SINK && cluster_ctx.clb_nlist.net_driver(net) != pin),
              "Net driver pin should be a DRIVER pin type, or not the driver!"); 

which does throw an assertion violation:

Moves per temperature: 2
blocks_affected.num_moved_blocks 1
Examining net 0
Recomputing ClusterNetId 0 ClusterBlockId out:do[0] ClusterPinId 0 PinType 1
Driver block 1
Block 0 moved
2. Recomputing net 0[1] block 0[0]
Computing delay 1[1] -> 0[0]
Commiting td cost
Driver block 1
Block 0 moved
blocks_affected.num_moved_blocks 1
Examining net 0
Recomputing ClusterNetId 0 ClusterBlockId out:do[0] ClusterPinId 0 PinType 1
Driver block 1
Block 0 moved
2. Recomputing net 0[1] block 0[0]
Computing delay 1[1] -> 0[0]
Commiting td cost
Driver block 1
Block 0 moved

---- ------ ------- ------- ---------- ---------- ------- ---------- -------- ------- ------- ------ -------- --------- ------
Tnum   Time       T Av Cost Av BB Cost Av TD Cost     CPD       sTNS     sWNS Ac Rate Std Dev  R lim Crit Exp Tot Moves  Alpha
      (sec)                                          (ns)       (ns)     (ns)                                                 
---- ------ ------- ------- ---------- ---------- ------- ---------- -------- ------- ------- ------ -------- --------- ------
blocks_affected.num_moved_blocks 1
Examining net 0
Recomputing ClusterNetId 0 ClusterBlockId out:do[0] ClusterPinId 0 PinType 1
Driver block 1
Block 0 moved
2. Recomputing net 0[1] block 0[0]
Computing delay 1[1] -> 0[0]
Commiting td cost
Driver block 1
Block 0 moved
blocks_affected.num_moved_blocks 1
Examining net 0
Recomputing ClusterNetId 0 ClusterBlockId di[1] ClusterPinId 1 PinType 1
vtr-verilog-to-routing/vpr/src/place/place.cpp:1625 update_td_delta_costs: Assertion '(pin_type == PinType::DRIVER && cluster_ctx.clb_nlist.net_driver(net) == pin) || (pin_type == PinType::SINK && cluster_ctx.clb_nlist.net_driver(net) != pin)' failed (Net driver pin should be a DRIVER pin type, or not the driver!).

@litghost
Collaborator Author

I added the following sanity check to check_netlist, and it is failing:

diff --git a/vpr/src/base/check_netlist.cpp b/vpr/src/base/check_netlist.cpp
index e0344415e..f60af17a2 100644
--- a/vpr/src/base/check_netlist.cpp
+++ b/vpr/src/base/check_netlist.cpp
@@ -73,6 +73,45 @@ void check_netlist(int verbosity) {
         }
     }
 
+    // Verify clustered netlist relationships and assumptions.
+    for (ClusterNetId net_id : cluster_ctx.clb_nlist.nets()) {
+        ClusterPinId driver_pin = cluster_ctx.clb_nlist.net_driver(net_id);
+        for (ClusterPinId pin : cluster_ctx.clb_nlist.net_pins(net_id)) {
+            if (cluster_ctx.clb_nlist.pin_net(pin) != net_id) {
+                VPR_FATAL_ERROR(VPR_ERROR_OTHER,
+                        "Net %zu pin %zu are inconsistently mapped\n", size_t(net_id), size_t(pin));
+            }
+
+            PinType pin_type = cluster_ctx.clb_nlist.pin_type(pin);
+            int pin_net_index = cluster_ctx.clb_nlist.pin_net_index(pin);
+            if(pin == driver_pin) {
+                if(pin_type != PinType::DRIVER) {
+                    VPR_FATAL_ERROR(VPR_ERROR_OTHER,
+                            "Pin %zu for net %zu is the net driver, but pin_type(%d) != PinType::DRIVER",
+                            size_t(pin), size_t(net_id), pin_type);
+                }
+
+                if(pin_net_index != 0) {
+                    VPR_FATAL_ERROR(VPR_ERROR_OTHER,
+                            "Pin %zu for net %zu is the net driver, but pin_net_index(%d) != 0",
+                            size_t(pin), size_t(net_id), pin_net_index);
+                }
+            } else {
+                if(pin_type != PinType::SINK) {
+                    VPR_FATAL_ERROR(VPR_ERROR_OTHER,
+                            "Pin %zu for net %zu is a net sink, but pin_type(%d) != PinType::SINK",
+                            size_t(pin), size_t(net_id), pin_type);
+                }
+
+                if(pin_net_index == 0) {
+                    VPR_FATAL_ERROR(VPR_ERROR_OTHER,
+                            "Pin %zu for net %zu is the net sink, but pin_net_index(%d) == 0",
+                            size_t(pin), size_t(net_id), pin_net_index);
+                }
+            }
+        }
+    }
+
     error += check_for_duplicated_names();
 
     if (error != 0) {

If that check looks good, we should probably merge it, and I'll create a PR.

@litghost
Collaborator Author

litghost commented Jul 17, 2020

I added a call to Netlist::verify and got a similar assertion violation:

diff --git a/vpr/src/base/check_netlist.cpp b/vpr/src/base/check_netlist.cpp
index e0344415e..ac2b7a963 100644
--- a/vpr/src/base/check_netlist.cpp
+++ b/vpr/src/base/check_netlist.cpp
@@ -39,6 +39,7 @@ void check_netlist(int verbosity) {
 
     /* This routine checks that the netlist makes sense. */
     auto& cluster_ctx = g_vpr_ctx.mutable_clustering();
+    cluster_ctx.clb_nlist.verify();
Error 1: 
Type: Unrecognized Error
File: vpr/src/base/netlist.tpp
Line: 1699
Message: Driver pin not found at expected index in net
The entire flow of VPR took 0.41 seconds (max_rss 67.2 MiB)

I think this confirms my earlier investigation. Now to determine where the problem is coming from.

@litghost
Collaborator Author

Something weird is going on:

Detected 0 constant generators (to see names run with higher pack verbosity)
Adding pin 0 to net 0 that is pin_type 0
Adding pin 1 to net 0 that is pin_type 1
Finished loading packed FPGA netlist file (took 0.017435 seconds).
# Load Packing took 0.02 seconds (max_rss 67.6 MiB, delta_rss +36.3 MiB)
Warning 14: Netlist contains 0 global net to non-global architecture pin connections
Warning 15: Logic block #1 (di) has only 1 input pin 'di.inpad[0]' -- the whole block is hanging logic that should be swept.
Error 1: 
Type: Other
File: /usr/local/google/home/keithrothman/cat_x/vtr-verilog-to-routing/vpr/src/base/check_netlist.cpp
Line: 92
Message: Pin 0 for net 0 is the net driver, but pin_type(1) != PinType::DRIVER
The entire flow of VPR took 0.23 seconds (max_rss 67.8 MiB)

So pin 0 is added to net 0, and is pin type = 0 (DRIVER). Then a little while later, pin 0 is pin type 1? Still investigating.

@litghost
Collaborator Author

I've located the bug, making a fix now.

@litghost
Collaborator Author

PR opened for fix. I still need to make a replicating regression test, which I'll start on now.

@vaughnbetz
Contributor

Nice bug hunting on this Keith! You probably already thought of this, but just in case: you can turn on sanitizers in the makefile if you suspect memory corruption.


github-actions bot commented May 9, 2025

This issue has been inactive for a year and has been marked as stale. It will be closed in 15 days if it continues to be stale. If you believe this is still an issue, please add a comment.

@github-actions github-actions bot added the Stale label May 9, 2025

This issue has been marked stale for 15 days and has been automatically closed.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale May 25, 2025
@vaughnbetz
Contributor

Was fixed.
