
Very long runtime in parse_vtr_flow #1647


Closed
litghost opened this issue Jan 28, 2021 · 19 comments · Fixed by #1658

@litghost
Collaborator

litghost commented Jan 28, 2021

  %Own   %Total  OwnTime  TotalTime  Function (filename:line)                                                                                                                                                                                                                             
 51.00%  58.00%    6.26s     7.17s   parse_vtr_flow (vtr/parse_vtr_flow.py:75)
 26.00%  26.00%    2.59s     2.59s   parse_vtr_flow (vtr/parse_vtr_flow.py:72)
 14.00%  14.00%    1.04s     1.04s   parse_vtr_flow (vtr/parse_vtr_flow.py:73)
  7.00%   7.00%   0.810s    0.810s   regex (vtr/inspect.py:39)
  2.00%   2.00%   0.150s    0.150s   parse_vtr_flow (vtr/parse_vtr_flow.py:76)
  0.00%   0.00%   0.100s    0.100s   regex (vtr/inspect.py:37)
  0.00%   0.00%   0.050s    0.050s   parse_vtr_flow (vtr/parse_vtr_flow.py:74)
  0.00% 100.00%   0.000s    11.00s   parse_task (vtr/parse_vtr_task.py:207)
  0.00% 100.00%   0.000s    11.00s   run_tasks (run_reg_test.py:315)
  0.00% 100.00%   0.000s    11.00s   <module> (run_reg_test.py:342)
  0.00% 100.00%   0.000s    11.00s   vtr_command_main (run_vtr_task.py:246)
  0.00% 100.00%   0.000s    11.00s   run_tasks (run_vtr_task.py:297)
  0.00% 100.00%   0.000s    11.00s   parse_tasks (vtr/parse_vtr_task.py:186)
  0.00% 100.00%   0.000s    11.00s   vtr_command_main (run_reg_test.py:183)

@vaughnbetz
Contributor

May be due to the huge number of deprecation warnings from rr-graph, based on what Xifan has said.

@litghost
Collaborator Author

Long stack trace:

  %Own   %Total  OwnTime  TotalTime  Function (filename:line)                                                                                                                                                                                                                             
 54.00%  62.00%   31.22s    35.09s   parse_vtr_flow (vtr/parse_vtr_flow.py:75)
 25.00%  25.00%   13.63s    13.69s   parse_vtr_flow (vtr/parse_vtr_flow.py:72)
 10.00%  10.00%    4.89s     4.89s   parse_vtr_flow (vtr/parse_vtr_flow.py:73)
  8.00%   8.00%    3.68s     3.68s   regex (vtr/inspect.py:39)
  3.00%   3.00%    1.02s     1.02s   parse_vtr_flow (vtr/parse_vtr_flow.py:76)
  0.00%   0.00%   0.310s    0.310s   parse_vtr_flow (vtr/parse_vtr_flow.py:74)
  0.00%   0.00%   0.190s    0.190s   regex (vtr/inspect.py:37)
  0.00%   0.00%   0.040s    0.040s   decode (codecs.py:321)
  0.00%   0.00%   0.020s    0.020s   decode (codecs.py:323)
  0.00% 100.00%   0.000s    55.00s   <module> (run_reg_test.py:342)
  0.00% 100.00%   0.000s    55.00s   vtr_command_main (run_reg_test.py:183)
  0.00% 100.00%   0.000s    55.00s   vtr_command_main (run_vtr_task.py:246)
  0.00% 100.00%   0.000s    55.00s   run_tasks (run_reg_test.py:315)
  0.00% 100.00%   0.000s    55.00s   parse_task (vtr/parse_vtr_task.py:207)
  0.00% 100.00%   0.000s    55.00s   run_tasks (run_vtr_task.py:297)
  0.00% 100.00%   0.000s    55.00s   parse_tasks (vtr/parse_vtr_task.py:186)

@tangxifan
Contributor

I have run the vtr_strong regression test on my local machine (64-core 2.6 GHz CPU), pushing the test run onto 32 cores.
The runtime results I get are:

  • vpr runs finished in ~120s
  • result parser takes ~60s

I am trying to get a detailed profiling report like the stack trace but have no clue how to generate one. If you can share the command to run, I will try.

@litghost
Collaborator Author

> I am trying to get a detailed profiling report like the stack trace but have no clue how to generate one. If you can share the command to run, I will try.

I used https://github.com/benfred/py-spy
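
The exact command used for the profiles above isn't stated in the thread; a typical py-spy invocation looks like the following (the <PID> placeholder is an example, not taken from this issue):

pip install py-spy

# Attach to the running Python parse process and sample it live
# (the %Own/%Total tables above are the kind of output py-spy top prints):
py-spy top --pid <PID>

# Or grab a one-shot stack trace of where the process is currently stuck:
py-spy dump --pid <PID>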

@litghost changed the title from "Very long runtime in kokoro_log_reader" to "Very long runtime in parse_vtr_flow" on Feb 4, 2021
@vaughnbetz
Contributor

@jgoeders can you or someone on your team take a look? This is a long time for parsing, given log files that aren't that big. We had an issue with lots of warnings that was exacerbating this, but even after fixing that we have slow parse times (parsing takes 50% of the time of a total vpr run).
@litghost speculates that this may be a regular expression with a back-reference (which can get slow). The stack trace indicates the regex is the slow part; that shouldn't be fundamental, but can easily happen with some forms of regex coding.
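
As a minimal illustration of why a regex can dominate parse time (this is not the actual pattern from the VTR parser; back-references and nested quantifiers are both common culprits), a pattern like the one below makes Python's backtracking engine do exponential work on a line that almost matches:

import re
import time

# Illustration only -- NOT a pattern from parse_vtr_flow. Nested quantifiers
# (and, similarly, back-references) can trigger catastrophic backtracking
# when a line nearly matches but ultimately fails.
pathological = re.compile(r"(a+)+$")

line = "a" * 28 + "b"  # can never match, but every split of the a's gets tried
start = time.time()
pathological.search(line)
print("search on a %d-character line took %.1f s" % (len(line), time.time() - start))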

@tangxifan
Contributor

Procedures to replicate my tests:

  • Check out the VTR GitHub repo: git clone https://github.com/verilog-to-routing/vtr-verilog-to-routing.git
  • Compile the VTR project: make -j
  • Run the strong regression tests: time python3 run_reg_test.py vtr_reg_strong -j 32 (you may use another number if your local machine does not have 32 cores available).

@jgoeders
Contributor

jgoeders commented Feb 8, 2021

I can't reproduce. I've added a PR (#1658) that prints the parse time; the CI run shows 618s of vpr and 14s of parsing. https://github.com/verilog-to-routing/vtr-verilog-to-routing/pull/1658/checks?check_run_id=1852010637

I see similar results on two different local machines, testing with -j8 and -j20.

Can anyone else reproduce this locally?

@tangxifan
Contributor

@jgoeders I managed to reproduce your results on my local machine. I may have done something wrong last time. Let me investigate and get back to you later.

@tangxifan
Contributor

I just did some tests to find out where the runtime difference is coming from. Here is what I have found:

  • I have managed to reproduce both my previous results and Jeff's results.
  • It seems that the hard disk has a large impact on the parsing flow.

On my local machine, there are two types of hard disks:

  • An NFS-mounted hard disk, where data are transferred over the network. This is where I normally work. The runtime results of running the VTR strong regression tests (with -j 32) are as follows.
<VTR runs>
custom_grid:                           multiple_io_types.xml/raygentop.v/common                                                                                             		OK (took 119.45 seconds)
**Elapsed time: 128.03 seconds**

Parsing test results...
scripts/parse_vtr_task.py -l /research/ece/lnis/USERS/tang/tmp/vtr-verilog-to-routing/vtr_flow/tasks/regression_tests/vtr_reg_strong/task_list.txt
**Elapsed time: 36.08 seconds**
  • A local SSD disk, used as scratch space. The runtime results of running the VTR strong regression tests (with -j 32) are as follows.
<VTR runs>
custom_grid:                           multiple_io_types.xml/raygentop.v/common                                                                                             		OK (took 116.00 seconds)
**Elapsed time: 123.05 seconds**

Parsing test results...
scripts/parse_vtr_task.py -l /var/tmp/xtang/vtr-verilog-to-routing/vtr_flow/tasks/regression_tests/vtr_reg_strong/task_list.txt
**Elapsed time: 11.28 seconds**

This is what I have found so far. I am not sure how the hard disk is organized on kokoro. VTR nightly tests have a long runtime. I can run the nightly test on my local machine (using the SSD disk) and see if the result parsing scales.

@jgoeders
Contributor

jgoeders commented Feb 8, 2021

Thanks for the additional info. Both of the local machines I tried have SSDs, so it sounds like this is bound by disk I/O. Not sure if there's much to be done to speed it up, but I'll take a look.

@vaughnbetz
Contributor

We might be able to put some files we parse on a diet. Sarah is simplifying and summarizing the clustering log file output. Are we parsing any big files other than the log file? Is some of the log file too big?

@litghost
Collaborator Author

litghost commented Feb 9, 2021

> I am not sure how the hard disk is organized on kokoro

Kokoro disks should be pretty fast, but if we are disk limited and it is taking between 10 and 60 seconds on local machines, we probably need to put the logs on a diet (as Vaughn suggested).

@litghost
Collaborator Author

litghost commented Feb 9, 2021

For additional context relevant to this issue, kokoro was spending ~1.5 hours or more parsing nightly logs in some instances.

@jgoeders
Contributor

jgoeders commented Feb 9, 2021

My plan is to take a look at the code and make sure we aren't reading the files more than once, and that there aren't any unnecessarily complicated regexes. Aside from that, I'm not sure there's much to do besides slimming the logs.

@jgoeders
Contributor

jgoeders commented Mar 6, 2021

Looks like we were reading each log file multiple times: once for each thing we were looking for. I'm fixing this in #1658 so that it groups parse items by filename and reads each file only once.
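
A rough sketch of that approach (the function name and the (metric, filename, pattern) shape are made up for illustration, not the actual parse_vtr_flow code): group the requested patterns by the log file they target, then make a single pass over each file, testing every pattern on every line.

import re
from collections import defaultdict

def parse_logs_once(parse_items):
    # parse_items: iterable of (metric_name, log_filename, regex_string) tuples.
    by_file = defaultdict(list)
    for metric, filename, pattern in parse_items:
        by_file[filename].append((metric, re.compile(pattern)))

    results = {}
    for filename, compiled in by_file.items():
        with open(filename) as log:
            for line in log:                    # each log file is read exactly once
                for metric, regex in compiled:  # every pattern is checked per line
                    match = regex.search(line)
                    if match:
                        results[metric] = match.group(1)
    return results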

@tangxifan Can you see if this fix is faster for you?

@tangxifan
Contributor

Hi @jgoeders,
I just ran the vtr_reg_strong tests on both a RAID disk and an SSD disk.
The runtime for parsing results has been reduced by about 50%.

Results on a RAID disk (was 36.08 seconds)

<VTR runs>
**Elapsed time: 139.21 seconds**

Parsing test results...
scripts/parse_vtr_task.py -l /research/ece/lnis/USERS/tang/tmp/vtr-verilog-to-routing/vtr_flow/tasks/regression_tests/vtr_reg_strong/task_list.txt
**Elapsed time: 13.06 seconds**

Results on a SSD (was 11.28 seconds)

<VTR runs>
**Elapsed time: 128.81 seconds**

Parsing test results...
scripts/parse_vtr_task.py -l /var/tmp/xtang/vtr-verilog-to-routing/vtr_flow/tasks/regression_tests/vtr_reg_strong/task_list.txt
**Elapsed time: 6.33 seconds**

@jgoeders
Contributor

jgoeders commented Mar 8, 2021

Great, thanks for running that! Looks like this is a solid improvement.

It is now reading the files one line at a time and matching each line against all of the patterns we are looking for. I tried reading the entire file at once and doing a single regex match on the entire file for each pattern, but that didn't seem to improve performance at all, so I'll leave it at this.
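
For comparison, the whole-file variant described above would look roughly like the sketch below (again with made-up names, not the actual code): read the complete log into memory and run each pattern once over the full text; line-anchored patterns then need re.MULTILINE.

import re

def parse_log_whole_file(filename, patterns):
    # patterns: {metric_name: regex_string}; shape is hypothetical.
    with open(filename) as log:
        text = log.read()                  # one read, whole file held in memory
    results = {}
    for metric, pattern in patterns.items():
        match = re.search(pattern, text, re.MULTILINE)  # one scan per pattern
        if match:
            results[metric] = match.group(1)
    return results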

I'm hesitant to merge this until we fix the nightly so it passes (#1666). I don't want to push changes to the parser while we are still debugging QoR issues that use the parser.

@vaughnbetz
Contributor

Thanks @jgoeders. Nightly should pass now; Aman had some stale QoR data in a new regtest he added, and he has disabled that test until he has time to get the QoR data updated. So if you do a PR, the nightly should pass and we can merge this.

@jgoeders
Contributor

@vaughnbetz Hmm, my PR #1658 is still failing the nightly. Not sure if this is from my changes or not, but it doesn't look like master is passing the nightly either.
