Collected parse_results*.txt as artifacts from CI run #1867

ganeshgore · 2021-10-07T20:30:29Z

Description

This update collects the parse_results*.txt files from the CI regression runs; they can be analyzed locally or used to replace the existing golden results.

Related Issue

Addressing issue #1866

Motivation and Context

"It's common to have to update QoR files; we should streamline the process. This would also avoid issues where the CI machines are slower or faster or have libraries that take more memory etc. than someone's server, by generating the new results in CI itself." - from issue #1866

To update golden results, you might be able to do the following:

1. Download the *_golden_results artifacts and uncompress them.
2. Run rsync --recursive downloaded_artifact/ local_task_directory/ [*]
3. Run parse_vtr_task.py --create_golden [*]
4. Create a new commit with the updated golden results files.

[*] Remember to clean the local run directory first: parse_vtr_task collects results from the most recent run, and step 2 will update run001 by default.
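
As a rough illustration of the rsync step above, here is a minimal Python sketch that copies only the parse_results*.txt files from an unpacked artifact into a local task directory while preserving relative paths. The function name merge_artifact and the directory names are hypothetical and not part of VTR. Unlike rsync --recursive it ignores every other file, and it does not clean old run directories or run parse_vtr_task.py --create_golden; those steps are still left to the user.

```python
import shutil
from pathlib import Path


def merge_artifact(artifact_dir, task_dir, pattern="parse_results*.txt"):
    """Copy files matching `pattern` from artifact_dir into task_dir.

    Relative paths are preserved, so a file found at
    artifact_dir/task_a/run001/parse_results.txt ends up at
    task_dir/task_a/run001/parse_results.txt.
    Returns the relative paths (POSIX style) of the copied files.
    """
    artifact_dir = Path(artifact_dir)
    task_dir = Path(task_dir)
    copied = []
    for src in sorted(artifact_dir.rglob(pattern)):
        if not src.is_file():
            continue  # skip directories that happen to match the pattern
        rel = src.relative_to(artifact_dir)
        dst = task_dir / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # overwrites an existing file at dst
        copied.append(rel.as_posix())
    return copied
```

A call such as merge_artifact("downloaded_artifact", "local_task_directory") would then stand in for step 2, under the assumptions stated above.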

How Has This Been Tested?

You can check the test CI run here
https://github.com/ganeshgore/vtr-verilog-to-routing/actions/runs/1317429846#artifacts

Types of changes

Bug fix (change which fixes an issue)
New feature (change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

My change requires a change to the documentation
I have updated the documentation accordingly
I have added tests to cover my changes
All new and existing tests passed

ganeshgore · 2021-10-13T17:52:46Z

This PR is also ready; let me know if anything else is expected.

tangxifan · 2021-10-18T17:37:18Z

@ganeshgore Compared to the previous PR, what are the new packages here?

ganeshgore · 2021-10-18T17:39:14Z

Everything named *_golden_results.

tangxifan · 2021-10-18T17:40:46Z

@vaughnbetz I looked into the changes and also downloaded the artifacts. They do include all the parse_results.txt files. I attached the screenshot so that everyone can easily find the right place to download them.

I am not an expert on the CI system. If you or @mithro know anyone who is willing to review the code here, it would help a lot.

mithro · 2021-10-18T17:43:12Z

@umarcor & @acomodi - PTAL?

umarcor · 2021-10-20T00:46:45Z

.github/workflows/test.yml

@@ -149,6 +149,13 @@ jobs:
          vtr_flow/**/*.net
          vtr_flow/**/*.r

+    - name: Upload golden results


I don't think it's worth creating an additional step. I would add the file to the list in the existing upload-artifact step.
When these artifacts are used for some specific purpose, then it'd be pertinent to evaluate whether to split it.

The *_regression_log artifact was getting bulky, and GitHub artifact downloads have low bandwidth.
That is why I created a separate artifact for the golden results. Maybe we can also move more frequently needed files (like vtr_flow/**/*.log) to this artifact, just for quicker download and debugging.
But if you think it is best to combine everything in regression_log, I will merge those two runs.

The point is that artifacts are cumbersome to deal with. The later the worse. Hence, if whatever needs to be done with this txt can be sorted out without uploading it as an artifact, that's desirable. In case uploading as an artifact is required, it'd be desirable to use it in a different step of the same workflow. Having it be an isolated artifact to be manually downloaded should be the worst case approach.

umarcor · 2021-10-20T00:50:13Z

To update golden results, you might be able to do the following:

1. Download the *_golden_results artifacts and uncompress them.
2. Run rsync --recursive downloaded_artifact/ local_task_directory/ [*]
3. Run parse_vtr_task.py --create_golden [*]
4. Create a new commit with the updated golden results files.

[*] Remember to clean the local run directory first: parse_vtr_task collects results from the most recent run, and step 2 will update run001 by default.

I think that this procedure should be part of this PR. So, instead of uploading the .txt as an artifact, do create a branch/PR when the golden results need to be updated. That is doable in CI with the credentials copied from the checkout action. See, for instance: https://github.com/hdl/containers/blob/main/.github/workflows/doc.yml#L61.

ganeshgore · 2021-10-20T01:06:54Z

I am not sure how to know when

... the golden results need to be updated ...

My understanding was it depends on the user to update or not.

umarcor · 2021-10-20T01:09:14Z

I am not sure how to know when

... the golden results need to be updated ...

My understanding was it depends on the user to update or not.

@ganeshgore, see #1866:

Right now when we fail a QoR check in CI due to an expected QoR change, the developer needs to rerun the test on his/her machine and run parse_vtr_task.py --create_golden to make a new expected QoR file (golden_results.txt).

We can simplify this by bringing back the parse_results*.txt files (currently parse_results.txt and parse_results2.txt is also used in some tests), and then the developer could just run parse_vtr_task.py --create_golden using those files.

vaughnbetz · 2021-10-20T13:50:24Z

If the CI artifacts are slow when they get big, then we can just upload the short output files (don't do .blif, .net, .p or .r files as they are the big ones). Even better if you can split them into a separate artifact.
I'd like updating golden results to be fast after a failure, and still allow a person to look at the golden results he/she is about to put back and edit them if necessary. Downloading the golden_results won't do that -- those are the old results. We need to download the new parsed results and use them to create a new golden result (or have a script or command that creates the new golden results automatically from the CI artifacts).

umarcor · 2021-10-20T14:00:53Z

I'd like updating golden results to be fast after a failure,

@vaughnbetz can you please be more explicit about this? Assume that the reader is not aware of the internals of VTR. What does "after a failure" mean? Which tests/jobs do we need to monitor in order to decide that "this is a failure that needs to trigger a golden results update" vs "this is a legit failure of the job/workflow which does not require further action in regard to golden results"?

vaughnbetz · 2021-10-20T14:09:18Z

I don't think we can make this decision automatically. A QoR update happens when:

Someone looks at specific failures and concludes the QoR bounds were too tight or lucky (mostly on small circuits) and hence updating them is OK.
Someone has changed an algorithm, and expects better QoR or a new trade-off (maybe a new feature for moderately higher CPU, or maybe a CPU reduction that takes us outside the current QoR bounds, ...). After collecting results, discussing with others, etc. the trade-off is ruled a good one and the QoR data is updated as part of the PR that puts the new code in. Easiest to update this by simply making a PR, seeing what the QoR failures are (if any) and getting the CI artifact data to make the new golden results.

umarcor · 2021-10-20T14:19:53Z

@vaughnbetz, thanks!

@ganeshgore, given these latest explanations, I take back my previous suggestion. It is OK to have this file as a separate artifact.

ganeshgore · 2021-10-21T15:23:12Z

Thank you @vaughnbetz and @umarcor for your comments.
I was planning to make the following two artifacts for each regression test: *_results will be fairly small for quick evaluation (faster download), and users can download *_run_files for more elaborate debugging. Does that sound good to you all?
The above procedure for updating golden results after downloading the artifacts (*_results) will still work fine.

*_run_files artifacts

vtr_flow/**/*.out
vtr_flow/**/*.blif
vtr_flow/**/*.p
vtr_flow/**/*.net
vtr_flow/**/*.r

*_results artifacts

vtr_flow/**/*.log
vtr_flow/**/parse_results*.txt
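
To make the proposed split concrete, here is a small Python sketch that decides which of the two artifacts a given file path would fall into. It is only an illustration of the pattern lists above: the function artifact_for and the short labels "results" and "run_files" are hypothetical, and the sketch assumes that vtr_flow/**/ means "anywhere under vtr_flow". It is not how the workflow itself selects files.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# File-name patterns taken from the two lists above.
RUN_FILES = ("*.out", "*.blif", "*.p", "*.net", "*.r")
RESULTS = ("*.log", "parse_results*.txt")


def artifact_for(path):
    """Return "results", "run_files", or None for a repository-relative path."""
    p = PurePosixPath(path)
    # Both lists only cover files below the vtr_flow directory.
    if not p.parts or p.parts[0] != "vtr_flow":
        return None
    name = p.name
    if any(fnmatchcase(name, pat) for pat in RESULTS):
        return "results"
    if any(fnmatchcase(name, pat) for pat in RUN_FILES):
        return "run_files"
    return None
```

For example, a parse_results2.txt file under a task's run directory would be classified as "results", while a .blif file in the same directory would be classified as "run_files".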

vaughnbetz · 2021-10-21T15:48:06Z

Makes sense to me; thanks.

ganeshgore added 2 commits October 7, 2021 12:20

[Cleanup] Removed failed keyword from artifact lable

6d235b5

Added golden results as CI artifacts

616aee3

github-actions bot added the infra Project Infrastructure label Oct 7, 2021

tangxifan added the kokoro:force-run label Oct 18, 2021

symbiflow-robot removed the kokoro:force-run label Oct 18, 2021

mithro requested a review from acomodi October 18, 2021 17:43

umarcor reviewed Oct 20, 2021

umarcor approved these changes Oct 20, 2021

Updated CI setup

576cef1

ganeshgore merged commit a6e200e into verilog-to-routing:master Oct 28, 2021
