Add raw setup slack analysis to placement quench #1501
Conversation
…ored PlacerCriticalities, and created PlacerSetupSlacks, so that they can choose between doing incremental vs. from-scratch updates.
…pdate. Added checks to see if the updates need to be done from scratch or can be done incrementally
…cks. The matrix update is incremental according to the pins with modified setup slacks returned from PlacerSetupSlacks. Outer loop routine now updates both setup slacks and criticalities, while the inner loop routine passes in variables that determine the strategies/cost functions used to evaluate the effectiveness of try_swap moves.
…a series of successful moves done by try_swap. Right now the data structures representing the state variables are directly being copied, however the process can possibly be optimized with incremental techniques. The snapshot routines are called in the placement's inner loop, and should be used together with VPR options quench_recompute_divider and less optimally inner_recompute_divider. The latter would be too time consuming in practice.
…ng placement snapshots). Currently experiencing consistency failures. Also updated the slack analysis cost function: comparing the worst slack change across all modified clb pins.
…ing the placement quench stage. Made commit_td_cost method incremental by only going through sink pins affected by the moved blocks.
…o a new local structure called t_placer_timing_update_mode to tidy up the code.
…lysis during placement quench. Possible options are: auto, timing_cost, setup_slack.
std::sort(proposed_setup_slacks.begin(), proposed_setup_slacks.end());

//Check the first pair of slack values that are different
//If found, return their difference
What is the justification for this? Why not the difference between the sum of the original and proposed slacks? This needs explanation.
Added function level description.
We only check whether the worst value among all the modified values has gotten better or worse, since we only care about the critical path.
Of course, this is a very simple cost formulation. If you have a better idea, I'll be happy to implement it.
…mmand-line options.
… PlacerCriticalities and cleaned up related code in placer routines. Enhanced documentation requested by PR comments.
Hi @HackerFoo and @vaughnbetz, first of all, thank you for your detailed and valuable review! I've implemented what you've suggested and highlighted crucial routines in this self-review. Please go over these functions, especially their documentation.
After I gather your approvals as well as satisfactory benchmark results, I will merge this PR into the master and start adding more documentation and refactoring place.cpp.
Edit: will also add a reg test.
/**
 * @brief Update timing information based on the current block positions.
 *
 * Run STA to update the timing info class.
 *
 * Update the values stored in PlacerCriticalities and PlacerSetupSlacks
 * if they are enabled to update. To enable updating, call their respective
 * enable_update() method. See their documentation for more detailed info.
 *
 * If criticalities are updated, the timing driven costs should be updated
 * as well by calling update_timing_cost(). Calling this routine to update
 * timing_cost will produce round-off error in the long run due to its
 * incremental nature, so the timing cost value is recomputed once in
 * a while via other timing driven routines.
 *
 * If setup slacks are updated, then normally they should be committed to
 * `connection_setup_slack` via the commit_setup_slacks() routine. However,
 * sometimes new setup slack values are not committed immediately if we
 * expect to revert the current timing update in the near future, or if
 * we wish to compare the new slack values to the original ones.
 *
 * All the pins with changed connection delays have already been added into
 * the ClusteredPinTimingInvalidator to allow incremental STA update. These
 * changed connection delays are a direct result of moved blocks in try_swap().
 */
static void update_timing_classes(float crit_exponent,
                                  SetupTimingInfo* timing_info,
                                  PlacerCriticalities* criticalities,
                                  PlacerSetupSlacks* setup_slacks,
                                  ClusteredPinTimingInvalidator* pin_timing_invalidator) {
    /* Run STA to update slacks and adjusted/relaxed criticalities. */
    timing_info->update();

    /* Update the placer's criticalities (e.g. sharpen with crit_exponent). */
    criticalities->update_criticalities(timing_info, crit_exponent);

    /* Update the placer's raw setup slacks. */
    setup_slacks->update_setup_slacks(timing_info);

    /* Clear invalidation state. */
    pin_timing_invalidator->reset();
}
Important documentation for review
/**
 * @brief Update every timing-related class, variable, and structure.
 *
 * This routine exists to reduce code duplication, as the placer routines
 * often need to update all timing-related state at once.
 *
 * Updates: SetupTimingInfo, PlacerCriticalities, PlacerSetupSlacks,
 * timing_cost, connection_setup_slack.
 */
static void perform_full_timing_update(float crit_exponent,
                                       const PlaceDelayModel* delay_model,
                                       PlacerCriticalities* criticalities,
                                       PlacerSetupSlacks* setup_slacks,
                                       ClusteredPinTimingInvalidator* pin_timing_invalidator,
                                       SetupTimingInfo* timing_info,
                                       t_placer_costs* costs) {
    /* Update all timing related classes. */
    criticalities->enable_update();
    setup_slacks->enable_update();
    update_timing_classes(crit_exponent,
                          timing_info,
                          criticalities,
                          setup_slacks,
                          pin_timing_invalidator);

    /* Update the timing cost with new connection criticalities. */
    update_timing_cost(delay_model,
                       criticalities,
                       &costs->timing_cost);

    /* Commit the setup slacks since they are updated. */
    commit_setup_slacks(setup_slacks);
}
Important documentation for review
vpr/src/place/place.cpp
Outdated
/**
 * @brief Check if the setup slack has gotten better or worse due to a block swap.
 *
 * Get all the modified slack values via the PlacerSetupSlacks class, and compare
 * them with the original values at these connections. Sort them and compare them
 * one by one, and return the difference of the first pair that differs.
 *
 * If the new slack value is larger (better), then return a negative value so that
 * the move will be accepted. If the new slack value is smaller (worse), return a
 * positive value so that the move will be rejected.
 *
 * If no slack values have changed, then return an arbitrary positive number. A
 * move resulting in no change in the slack values is probably unnecessary.
 *
 * The sorting is needed to handle the unlikely circumstance where a bad slack
 * value suddenly becomes very good due to the block move, while a good slack
 * value becomes very bad, perhaps even worse than the original worst slack value.
 */
static float analyze_setup_slack_cost(const PlacerSetupSlacks* setup_slacks) {
    const auto& cluster_ctx = g_vpr_ctx.clustering();
    const auto& clb_nlist = cluster_ctx.clb_nlist;

    //Find the original/proposed setup slacks of pins with modified values
    std::vector<float> original_setup_slacks, proposed_setup_slacks;

    auto clb_pins_modified = setup_slacks->pins_with_modified_setup_slack();
    for (ClusterPinId clb_pin : clb_pins_modified) {
        ClusterNetId net_id = clb_nlist.pin_net(clb_pin);
        size_t ipin = clb_nlist.pin_net_index(clb_pin);

        original_setup_slacks.push_back(connection_setup_slack[net_id][ipin]);
        proposed_setup_slacks.push_back(setup_slacks->setup_slack(net_id, ipin));
    }

    //Sort in ascending order, from the worst slack value to the best
    std::sort(original_setup_slacks.begin(), original_setup_slacks.end());
    std::sort(proposed_setup_slacks.begin(), proposed_setup_slacks.end());

    //Check the first pair of slack values that are different
    //If found, return their difference
    for (size_t idiff = 0; idiff < original_setup_slacks.size(); ++idiff) {
        float slack_diff = original_setup_slacks[idiff] - proposed_setup_slacks[idiff];

        if (slack_diff != 0) {
            return slack_diff;
        }
    }

    //If all slack values are identical (or no modified slack values),
    //reject this move by returning an arbitrary positive number as cost.
    return 1;
}
Important documentation for review
vpr/src/place/place.cpp
Outdated
/**
 * @brief Commit all the setup slack values from the PlacerSetupSlacks
 *        class to `connection_setup_slack`.
 *
 * This routine is incremental since it relies on pins_with_modified_setup_slack()
 * to detect which pins need to be updated and which pins do not.
 *
 * Therefore, it is assumed that this routine is always called immediately after
 * each time update_timing_classes() is called with setup slack update enabled.
 * Otherwise, pins_with_modified_setup_slack() cannot accurately account for all
 * the pins that have their setup slacks changed, making this routine incorrect.
 *
 * Currently, the only exception to the rule above is when setup slack analysis is used
 * during the placement quench. The new setup slacks might be either accepted or
 * rejected, so for efficiency reasons, this routine is not called if the slacks are
 * rejected in the end. For more detailed info, see the try_swap() routine.
 */
static void commit_setup_slacks(const PlacerSetupSlacks* setup_slacks) {
    const auto& clb_nlist = g_vpr_ctx.clustering().clb_nlist;

    //Incremental: only go through sink pins with modified setup slack
    auto clb_pins_modified = setup_slacks->pins_with_modified_setup_slack();
    for (ClusterPinId pin_id : clb_pins_modified) {
        ClusterNetId net_id = clb_nlist.pin_net(pin_id);
        size_t pin_index_in_net = clb_nlist.pin_net_index(pin_id);

        connection_setup_slack[net_id][pin_index_in_net] = setup_slacks->setup_slack(net_id, pin_index_in_net);
    }
}
Important documentation for review
/**
 * @brief Verify that the values in `connection_setup_slack` match PlacerSetupSlacks.
 *
 * Return true if all connection values are identical. Otherwise, return false.
 *
 * Currently, this routine is called to check if the timing update has been successfully
 * reverted after a proposed move is rejected when applying setup slack analysis during
 * the placement quench. If successful, the setup slacks in PlacerSetupSlacks should be
 * the same as the values in `connection_setup_slack` without running commit_setup_slacks().
 * For more detailed info, see the try_swap() routine.
 */
static bool verify_connection_setup_slacks(const PlacerSetupSlacks* setup_slacks) {
    const auto& clb_nlist = g_vpr_ctx.clustering().clb_nlist;

    //Go through every single sink pin to check that the slack values are the same
    for (ClusterNetId net_id : clb_nlist.nets()) {
        for (size_t ipin = 1; ipin < clb_nlist.net_pins(net_id).size(); ++ipin) {
            if (connection_setup_slack[net_id][ipin] != setup_slacks->setup_slack(net_id, ipin)) {
                return false;
            }
        }
    }
    return true;
}
Important documentation for review
vpr/src/place/timing_place.h
Outdated
/**
 * @brief PlacerSetupSlacks returns the RAW setup slacks of clustered netlist connections.
 *
 * Usage
 * =====
 * PlacerSetupSlacks mimics a 2D array of connection setup slacks running from:
 * [0..cluster_ctx.clb_nlist.nets().size()-1][1..num_pins-1]
 * This class mirrors PlacerCriticalities in both its methods and its members. The only
 * difference is that this class deals with RAW setup slacks returned by SetupTimingInfo
 * rather than criticalities. See the documentation on PlacerCriticalities for more.
 *
 * RAW setup slacks are unlike criticalities. Their values are not confined between
 * 0 and 1; they can be either positive or negative.
 *
 * This class also supports iterating over the clustered netlist connections/pins that
 * have had their setup slacks modified by the last call to update_setup_slacks(). This
 * utility is mainly used for incrementally committing the setup slack values into the
 * structure `connection_setup_slack` used by many placer routines.
 */
class PlacerSetupSlacks {
Important documentation for review
vpr/src/place/timing_place.h
Outdated
/**
 * @brief PlacerCriticalities returns the clustered netlist connection criticalities
 *        used by the placer ('sharpened' by a criticality exponent).
 *
 * Usage
 * =====
 * This class also serves to map atom netlist level criticalities (i.e. on AtomPinIds)
 * to the clustered netlist (i.e. ClusterPinIds) used during placement.
 *
 * Criticalities are updated by update_criticalities(), given that `update_enabled` is
 * set to true. It will update criticalities based on the atom netlist connection
 * criticalities provided by the passed in SetupTimingInfo.
 *
 * This process can be done incrementally, based on the modified connections/AtomPinIds
 * returned by SetupTimingInfo. However, the set returned only reflects the connections
 * changed by the last call to the timing info update.
 *
 * Therefore, if SetupTimingInfo is updated twice in succession without criticalities
 * getting updated (update_enabled = false), the returned set cannot account for all
 * the connections that have been modified, in which case a recomputation is required.
 * Hence, each time update_setup_slacks_and_criticalities() is called, we assign
 * `recompute_required` the opposite value of `update_enabled`.
 *
 * This class also maps/transforms the modified atom connections/pins returned by the
 * timing info into modified clustered netlist connections/pins after calling
 * update_criticalities(). The interface then enables users to iterate over this range
 * via pins_with_modified_criticalities(). This is useful for incrementally re-calculating
 * the timing costs.
 *
 * The criticalities of individual connections can then be queried by calling the
 * criticality() member function.
 *
 * Implementation
 * ==============
 * To support incremental re-calculation, the class saves the last criticality exponent
 * passed to PlacerCriticalities::update_criticalities(). If the next update uses the same
 * exponent, criticalities can be incrementally updated. Otherwise, they must be re-calculated
 * from scratch, since a change in exponent changes *all* criticalities.
 */
class PlacerCriticalities {
Important documentation for review
/**
 * @brief Returns the raw setup slack of a net's pin in the CLB netlist.
 *        Assumes that the timing graph is correct and up to date.
 */
float calculate_clb_net_pin_setup_slack(const SetupTimingInfo& timing_info, const ClusteredPinAtomPinsLookup& pin_lookup, ClusterPinId clb_pin) {
    //There may be multiple atom netlist pins connected to this CLB pin
    float clb_pin_setup_slack = std::numeric_limits<float>::infinity();
    for (const auto atom_pin : pin_lookup.connected_atom_pins(clb_pin)) {
        //Take the worst/minimum of the atom pin slack as the CLB pin slack
        clb_pin_setup_slack = std::min(clb_pin_setup_slack, timing_info.setup_pin_slack(atom_pin));
    }

    return clb_pin_setup_slack;
}
Important documentation for review
/**
 * @brief Update the timing driven (td) costs.
 *
 * This routine either uses incremental update_td_costs(), or updates
 * from scratch using comp_td_costs(). By default, it is incremental
 * by iterating over the set of clustered netlist connections/pins
 * returned by PlacerCriticalities::pins_with_modified_criticality().
 *
 * Hence, this routine should always be called when PlacerCriticalities
 * is enabled to be updated in update_timing_classes(). Otherwise, the
 * incremental method will no longer be correct.
 */
static void update_timing_cost(const PlaceDelayModel* delay_model,
                               const PlacerCriticalities* criticalities,
                               double* timing_cost) {
#ifdef INCR_COMP_TD_COSTS
    update_td_costs(delay_model, *criticalities, timing_cost);
#else
    comp_td_costs(delay_model, *criticalities, timing_cost);
#endif
}
Important doc for review
…lacement algorithm to be used during quench.
…thm class to efficiently include SLACK_TIMING_PLACE in all branchings related to CRITICALITY_TIMING_PLACE.
…LACK_TIMING_PLACE during placement quench would succeed.
 *
 * Also supports assignments and comparisons between t_place_algorithm
 * and e_place_algorithm so as not to break previous code.
 */
What is the justification for this over the following?
bool is_timing_driven(e_place_algorithm algo) {
return algo == CRITICALITY_TIMING_PLACE || algo == SLACK_TIMING_PLACE;
}
What methods do you propose should be added to this class? Do they improve the current code? If so, please add them.
- The timing driven check is performed in several places other than place.cpp, so I incorporated this check into placer_opts.
- There might be future categorizations of placement algorithms/strategies. Currently, there are only two categories (timing and no timing). Having this class will make it easier to describe and implement new categorizations in the future. If you don't think this is necessary, I'll revert the change.
MoveGenerator& move_generator,
t_pl_blocks_to_be_moved& blocks_affected,
SetupTimingInfo* timing_info,
const t_place_algorithm& place_algorithm) {
This overrides placer_opts? It's not clear how this works.
placement_inner_loop() is called for both normal annealing and the quench. The normal annealing now uses placer_opts.place_algorithm and the quench now uses placer_opts.place_quench_algorithm. Instead of creating a new separate function, I have a new parameter place_algorithm passed in to override placer_opts.place_algorithm.
PlacerCriticalities* criticalities,
PlacerSetupSlacks* setup_slacks,
criticalities and setup_slacks should be bundled together because they are used together.
I want to do this in another PR, where I will refactor place.cpp. But if you want it done here, I will implement it. On a side note, how would you prefer naming this new structure?
…ocumentation for options --place_algorithm and --place_quench_algorithm (same as the documentation on the developer's page).
vpr/src/place/timing_place.h
Outdated
void update_criticalities(const SetupTimingInfo* timing_info, float criticality_exponent);

///@brief Override the criticality of a particular connection.
void set_criticality(ClusterNetId net, int ipin, float val);
I'd change val to crit_val, and ipin to inet_pin_index (or add a comment on what ipin is -- I think it is an index on that net from 0 to fanout).
Added assertion level 3 check with error message for ipin in timing_place.cpp.
Thanks Bill. The new code looks very clean and the commenting enhancements throughout are very clear. I have two more files to read (timing_place.cpp and place.cpp, so big ones). Will submit comments on those as soon as I get through them ...
Hi @vaughnbetz, thank you for taking the time to look at the new changes. I will send you an e-mail soon regarding any leftover concerns for this PR.
vpr/src/place/timing_place.cpp
Outdated
///@brief Use an incremental approach to updating criticalities and setup slacks?
constexpr bool INCR_UPDATE_CRITICALITIES = true;
Explain what this means. Does it control incremental update_timing calls, or recomputing only the criticalities and setup slacks for connections whose slack changed in the last timing analysis, or both?
I think it is safe to remove these flags. We can use incremental updates whenever it's possible now.
void PlacerCriticalities::incr_update_criticalities(const SetupTimingInfo* timing_info) {
    cluster_pins_with_modified_criticality_.clear();

    for (AtomPinId atom_pin : timing_info->pins_with_modified_setup_criticality()) {
This code and the code to find the pins_with_modified_setup_slack is almost the same, except one uses the timing_info->pins_with_modified_setup_criticality() and the other uses timing_info->pins_with_modified_setup_slack().
- Do the timing_info->pins_with_modified_setup_criticality() and pins_with_modified_setup_slack() differ? Why? I expect they are close to the same; could we just use whichever one is the bigger set (maybe criticality, due to slack shifting) to track both slack and criticality pins that need updates? Would simplify the code and likely make it faster if we are using both criticalities and slacks, if indeed the pin update lists are usually the same (less work and less memory footprint).
If we can just use the bigger of pins_with_modified_criticality_ or pins_with_modified_setup_slack, some data members and some duplicated code can be removed.
I don't recommend doing this since they are pre-computed (in slack_evaluation.cpp) and are not necessarily the same. The effort that goes into merging these two sets is not worth it in my opinion, and it might make the code less readable. I agree with your feeling that a lot of code is duplicated, but I don't have a good way of resolving the issue without sacrificing code readability.
float rlim_escape_fraction,
const t_place_algorithm& place_algorithm,
It's possible that adding arguments to try_swap and the indirection (const ref) to place_algorithm has slowed it down some. I wouldn't expect much impact, but try_swap is called hundreds of millions of times for big circuits. So that's one commit to explore as you binary search for the slowdown.
The slowdown already occurred in the commit before I implemented this algorithm wrapper, so this is likely not the cause of the slowdown.
void recompute_setup_slacks();

///@brief Flag that turns on/off the update_setup_slacks() routine.
bool update_enabled = true;
Does this get turned off by an explicit call in place.cpp if the place algorithm doesn't need setup slacks? If not, could be the cause of the CPU increase.
I turned it off during my runtime experiments and it has shown that this isn't the cause of the runtime increase.
float rlim_escape_fraction,
const t_place_algorithm& place_algorithm,
float timing_tradeoff) {
    /* Picks some block and moves it to another spot. If this spot is *
This comment should go in front of the function header in the refactored code (you probably did that already).
I will do this in the subsequent PR
//Go through all the sink pins affected
for (ClusterPinId pin_id : blocks_affected.affected_pins) {
    ClusterNetId net_id = clb_nlist.pin_net(pin_id);
    int ipin = clb_nlist.pin_net_index(pin_id);
This is one of the few code differences in the base (CRITICALITY_PLACE) algorithm in the new code vs. the old code. Code looks fine (cleaner), and affected_pins is pretty much the same pin list the other code would create dynamically. One difference: the old code did not update the timing cost if net_is_ignored(net_id) -- you could try putting that back in. Doesn't seem like it would make much difference but may be worth a try.
There's no difference due to the way affected_pins are gathered. This structure is populated in try_swap() by calling find_affected_nets_and_update_costs() -> update_td_delta_costs(). The ignored nets are skipped in find_affected_nets_and_update_costs().
The code looks good Bill. I have no major concerns; some detailed comments (mostly about small bits of additional commenting). I only identified one small difference between the original CRITICALITY_PLACE algorithm and the new code, in commit_td_costs (ignored nets aren't ignored). You could try inserting the appropriate if to see if it speeds things up; seems unlikely, but easy to try.
… description for three routines in place.cpp: find_affected_sink_pins(), commit_td_cost(), invalidate_affected_connection_delays().
Hi @vaughnbetz, I fixed all the issues raised by your comments in the most recent commit. Some comments are unresolved because they need more explanation on my side. I also added function-level headers. Note: the comment I deleted was not essential.
Hi @vaughnbetz, I finally found the culprit: the runtime increase is due to splitting out the routine.
Excellent, thanks Bill. The changes look good. There is one build failure in Travis CI: there are a few warnings in timing_place.cpp. Can you get rid of those?
Hi @vaughnbetz, I can resolve the slowdown in this PR. I think the runtime increase is due to the way the vector is allocated. I'm currently trying out reserving/pre-allocating enough size for the vector to see if that alone makes the runtime increase go away. I don't really want to remove find_affected_sink_pins() yet, since it is useful for reverting timing analysis if we are using slack analysis in the placement inner loop.
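The pre-allocation idea mentioned above can be sketched as follows. This is a generic, self-contained illustration of why reserve() can remove repeated-reallocation overhead when appending affected pins; the sizes and names are illustrative, not taken from VPR.

```cpp
// Sketch: reserving capacity up front avoids repeated reallocations while
// a vector of affected pins grows via push_back.
#include <vector>

// Returns the number of buffer reallocations observed while pushing n
// elements, optionally reserving capacity first.
int count_reallocations(int n, bool preallocate) {
    std::vector<int> affected_pins;
    if (preallocate)
        affected_pins.reserve(n);  // single allocation up front

    int reallocations = 0;
    const int* old_data = affected_pins.data();
    for (int i = 0; i < n; ++i) {
        affected_pins.push_back(i);
        if (affected_pins.data() != old_data) {  // buffer moved
            ++reallocations;
            old_data = affected_pins.data();
        }
    }
    return reallocations;
}
```

With a sufficiently large reserve(), no reallocation happens during the pushes; without it, a typical implementation reallocates on each geometric capacity growth.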
Great, thanks Bill!
…rease. blocks_affected.affected_pins now only stores moved pins that have changed connection delays.
Thanks Bill. Can you launch a QoR run to make sure all is well, and the placer sped back up?
Just checked the update_bb routines and they don't use .affected_pins so that is fine.
Titan benchmark #run001 @commit 06c9cf6: https://drive.google.com/file/d/1kduvV2um0rjZRi7JVNZGHwCBIMwlATe5/view?usp=sharing

There is some fluctuation in place time so I'm running a second round of Titan to confirm. But it's pretty obvious that the source of the slowdown is gone (and it's completely gone in my multiple test runs on experimental code changes).
Titan benchmark #run002 @commit 06c9cf6: https://drive.google.com/file/d/1nTu_tpMZdKigYwwJxhS5Go0eUXeudbnx/view?usp=sharing

There is still a tiny degradation. So I decided to look into the VPR output files for the bitcoin_miner benchmark, specifically the inner placement loop log. Master: master.txt

I see that there are multiple random spikes in placement iteration runtime stats in both files. Also, I (greedily) ran with the option. To be more clear, before removing. And even if the code is indeed a bit slower, here are my 4 arguments:
Thanks for tracking this down Bill. I agree this is close enough (1.5% and 2% placer runtime impact in the two runs, which is getting pretty close to noise). The code is more capable and cleaner, so good to have.
(FYI, the one regtest failure is a known QoR compare update issue; Sarah has updated it on the master already.)
Related Issue
A simplified version of #1450 for easier code review and merging.
Copy of the note in the original PR
Hi @HackerFoo. Please check out this draft PR to see if it's easier for you to review.
New features:

- Setup slack analysis in the try_swap() routine, along with the analyze_setup_slack_cost() routine.
- PlacerSetupSlacks class in timing_place.h/cpp.
- update_timing_classes(), update_timing_cost(), perform_full_timing_update() in place.cpp.
- New VPR option --place_quench_metric {auto, timing_cost, setup_slack}.
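As a rough illustration of how a three-valued quench-metric option like the one above could be parsed and dispatched, here is a minimal, hypothetical sketch. The enum and function names are illustrative and are not VPR's actual option-handling code.

```cpp
// Hypothetical sketch: parsing a --place_quench_metric style option and
// deciding whether the setup slack analysis should be enabled.
#include <stdexcept>
#include <string>

enum class QuenchMetric { AUTO, TIMING_COST, SETUP_SLACK };

QuenchMetric parse_quench_metric(const std::string& s) {
    if (s == "auto") return QuenchMetric::AUTO;
    if (s == "timing_cost") return QuenchMetric::TIMING_COST;
    if (s == "setup_slack") return QuenchMetric::SETUP_SLACK;
    throw std::invalid_argument("unknown place_quench_metric value: " + s);
}

// In this sketch, "auto" falls back to the conventional timing-cost metric;
// only an explicit "setup_slack" turns the new analysis on.
bool use_setup_slack_analysis(QuenchMetric m) {
    return m == QuenchMetric::SETUP_SLACK;
}
```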
My steps for this PR:

1. PlacerSetupSlacks is an imitation of the class PlacerCriticalities, and is utilized to get setup slacks from Kevin's Tatum timing analyzer.
2. I changed the original recompute_criticalities() to update_setup_slacks_and_criticalities() so that both categories of values can be updated.
3. One problem now occurs. Originally, each time the timing graph was updated by calling timing_info->update(), the criticalities were updated as well. These updates use incremental techniques so that they only check the modified atom pins returned by the timing analyzer. However, now the timing graph might be updated only for the setup slacks (without having the criticalities updated). On the next criticality update, the incremental technique can no longer be used, since more pins will have been modified than the ones the timing analyzer currently returns. In this case, the criticalities have to be recomputed from scratch (going through every sink pin). To resolve this issue, I created the t_timing_update_mode class to keep track of the slack/criticality update status. There is no documentation on this structure at this commit.
4. With the timing side done, I needed to implement the setup slack analysis during the placement quench stage, so I made a VPR option --place_quench_metric to ask the user whether or not to turn on the new analysis technique in the quench.
5. I modified the try_swap() routine so that, if using the slack analysis, it will do a timing update to get slacks, analyze them, and then decide whether or not to accept the move based on the slack improvement. If the move is accepted, the timing update is kept; if rejected, I revert the timing update so that the slacks in the timing graph return to their original values. Note that these timing updates will not involve changing criticalities (hence the considerations in points #3 and #4), and I've done a lot of testing to make sure that the timing update reversion works.
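The slack-based accept/reject decision in step 5 can be sketched as follows: after a trial swap, compare the worst setup slack over the modified pins before and after the move, and accept only if the worst slack did not degrade. This is a simplified, hypothetical model, not VPR's actual analyze_setup_slack_cost() implementation, and it omits the timing-update reversion described above.

```cpp
// Sketch: evaluate a trial move by comparing the worst (minimum) setup
// slack across the modified pins before vs. after the move. Names and
// acceptance rule are illustrative.
#include <algorithm>
#include <vector>

// Returns true if the move should be accepted: the worst slack among the
// modified pins after the move is no worse than it was before.
bool accept_move_on_slack(const std::vector<double>& slacks_before,
                          const std::vector<double>& slacks_after) {
    double worst_before = *std::min_element(slacks_before.begin(), slacks_before.end());
    double worst_after  = *std::min_element(slacks_after.begin(), slacks_after.end());
    return worst_after >= worst_before;  // reject if the worst slack degraded
}
```

In the real flow, a rejected move would additionally trigger the timing-update reversion so the slacks stored in the timing graph return to their pre-move values.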