Add raw setup slack analysis to placement quench #1501

Bill-hbrhbr · 2020-08-24T22:26:39Z

Related Issue

simplified version of #1450 for easier code review and merging.

Copy of the note in the original PR

Hi @HackerFoo. Please checkout this draft PR to see if it's easier for you to review.
New features:

try_swap() routine along with analyze_setup_slack_cost() routine.
The new PlacerSetupSlacks class in timing_place.h/cpp
Three new timing update related routines: update_timing_classes(), update_timing_cost(), perform_full_timing_update() in place.cpp.
VPR option to turn on setup slack analysis during placement quench: --place_quench_metric {auto, timing_cost, setup_slack}

My steps for this PR:

PlacerSetupSlacks is an imitation of the class PlacerCriticalities, and is utilized to get setup slacks from Kevin's Tatum timing analyzer.
I changed the original recompute_criticalities() to update_setup_slacks_and_criticalities() so that both categories of values can be updated.
One problem now occurs. Originally, each time the timing graph is updated by calling timing_info->update(), the criticalities are updated as well. These updates use incremental techniques so that they only checkout modified atom pins returned by the timing analyzer. However, now the timing graph might be updated only for the setup slacks (without having criticalities updated). On the next iteration of criticalities update, it cannot use the incremental technique anymore, since more pins will have been modified than the ones the timing analyzer currently returns. In this case, the criticalities have to be recomputed from scratch (going through each sink pin).
To resolve this issue, I created the t_timing_update_mode class to keep track of the slack/criticality update status. No documentation on this structure at this commit.
Now with the timing stuff down, I need to implement the setup slack analysis during the placement quench stage. So I made a vpr option --place_quench_metric to ask the user whether or not to turn on the new analysis technique in the quench.
I modified the try_swap() routine so that, if using the slack analysis, it will do a timing_update to get slacks, analyze, and then decide whether or not to accept the move based on the slack improvement. If accepted, the timing update will be kept; if rejected, I revert the timing update so that the slacks in the timing graph returns to their original values.
Note that these timing updates will not involve changing criticalities (hence I had the consideration for point ODIN II: dual_port_ram memory depth not bounded #3 and Area calculation error in routing_stats() #4), and I've made a lot of testings to make sure that the timing update reversion works.

…ored PlacerCriticalities, and created PlacerSetupSlacks, so that they can choose between doing incremental V.S. from scratch updates.

…pdate. Added checks to see if the updates need to be done from scratch or can be done incrementally

…cks. The matrix update is incremental according to the pins with modified setup slacks returned from PlacerSetupSlacks. Outer loop routine now updates both setup slacks and criticalities, while the inner loop routine passes in variables that determine the strategies/cost functions used to evaluate the effectiveness of try_swap moves.

…a series of successful moves done by try_swap. Right now the data structures representing the state variables are directly being copied, however the process can possibly be optimized with incremental techniques. The snapshot routines are called in the placement's inner loop, and should be used together with VPR options quench_recompute_divider and less optimally inner_recompute_divider. The latter would be too time consuming in practice.

script)

…ng placement snapshots). Currently experiencing consistency failures. Also updated slack analysis cost function: comparing the worse slack change across all modified clb pins

…ing the placement quench stage. Made commit_td_cost method incremental by only going through sink pins affected by the moved blocks.

…o a new local structure called t_placer_timing_update_mode to tidy up the code.

…lysis during placement quench. Possible options are: auto, timing_cost, setup_slack.

vpr/src/timing/timing_util.cpp

vpr/src/place/place.cpp

HackerFoo · 2020-08-27T20:56:56Z

vpr/src/place/place.cpp

+    std::sort(proposed_setup_slacks.begin(), proposed_setup_slacks.end());
+
+    //Check the first pair of slack values that are different
+    //If found, return their difference


What is the justification for this? Why not the difference between the sum of the original and proposed slacks? This needs explanation.

Added function level description.

We only check if the worst value among all the values modified has gotten better or worth, since we only care about the critical path.

Of course, this is a very simple cost formulation. If you have a better idea, I'll be happy to implement it.

vpr/src/place/place.cpp

…mmand-line options.

… PlacerCriticalities and cleaned up related code in placer routines. Enchanced documentation requested by PR comments.

Bill-hbrhbr

Hi @HackerFoo and @vaughnbetz, first of all, thank you for your detailed and valuable review! I've implemented what you've suggested and highlighted crucial routines in this self-review. Please go over these functions, especially their documentation.
After I gather your approvals as well as satisfactory benchmark results, I will merge this PR into the master and start adding more documentation and refactoring place.cpp.

Edit: will also add a reg test.

Bill-hbrhbr · 2020-08-28T08:44:35Z

vpr/src/place/place.cpp

+/**
+ * @brief Update timing information based on the current block positions.
+ *
+ * Run STA to update the timing info class.
+ *
+ * Update the values stored in PlacerCriticalities and PlacerSetupSlacks
+ * if they are enabled to update. To enable updating, call their respective
+ * enable_update() method. See their documentation for more detailed info.
+ *
+ * If criticalities are updated, the timing driven costs should be updated
+ * as well by calling update_timing_cost(). Calling this routine to update
+ * timing_cost will produce round-off error in the long run due to its
+ * incremental nature, so the timing cost value will be recomputed once in
+ * a while, via other timing driven routines.
+ *
+ * If setup slacks are updated, then normally they should be committed to
+ * `connection_setup_slack` via commit_setup_slacks() routine. However,
+ * sometimes new setup slack values are not committed immediately if we
+ * expect to revert the current timing update in the near future, or if
+ * we wish to compare the new slack values to the original ones.
+ *
+ * All the pins with changed connection delays have already been added into
+ * the ClusteredPinTimingInvalidator to allow incremental STA update. These
+ * changed connection delays are a direct result of moved blocks in try_swap().
+ */
+static void update_timing_classes(float crit_exponent,
+                                  SetupTimingInfo* timing_info,
+                                  PlacerCriticalities* criticalities,
+                                  PlacerSetupSlacks* setup_slacks,
+                                  ClusteredPinTimingInvalidator* pin_timing_invalidator) {
+    /* Run STA to update slacks and adjusted/relaxed criticalities. */
    timing_info->update();

-    //Update placer'criticalities (e.g. sharpen with crit_exponent)
+    /* Update the placer's criticalities (e.g. sharpen with crit_exponent). */
    criticalities->update_criticalities(timing_info, crit_exponent);

-    //Update connection, net and total timing costs based on new criticalities
+    /* Update the placer's raw setup slacks. */
+    setup_slacks->update_setup_slacks(timing_info);
+
+    /* Clear invalidation state. */
+    pin_timing_invalidator->reset();
+}