ODIN crashes with stack overflow during statistics calculation #2098
Comments
@aman26kbm, is this issue similar to the one you recently created (#2105) about tau_32x32?
Oh sorry. I didn't realize I had already created an issue for this. Lemme close that one as a duplicate.
sdamghan added a commit to sdamghan/vtr-verilog-to-routing that referenced this issue on Jul 26, 2022:
…cursive routine calls for big benchmarks Related-issue: verilog-to-routing#2098 Signed-off-by: Seyed Alireza Damghani <[email protected]>
sdamghan added a commit to sdamghan/vtr-verilog-to-routing that referenced this issue on Jul 26, 2022:
…cursive routine calls for big benchmarks Related-issue: verilog-to-routing#2098 Signed-off-by: Seyed Alireza Damghani <[email protected]>
zhaisitong added a commit that referenced this issue on Sep 12, 2022
Expected Behaviour
There shouldn't be a crash.
Current Behaviour
With the new TPU 32x32 design for the Koios 2.0 benchmarks, ODIN crashes with a segmentation fault.
I created a debug build of VTR and ran the design again to see what was going on. I saw that there was a stack overflow when ODIN was generating statistics of the circuit. From the stack trace, execution appeared to bounce between lines 256 and 234 in netlist_statistic.cpp many times.
114 #87 0x5602b6bd1490 in get_upward_stat /export/aman/vtr_aman/vtr-verilog-to-routing/ODIN_II/SRC/netlist_statistic.cpp:256
115 #88 0x5602b6bd0ea5 in get_upward_stat /export/aman/vtr_aman/vtr-verilog-to-routing/ODIN_II/SRC/netlist_statistic.cpp:234
These two lines are from these functions:
static metric_t* get_upward_stat(nnet_t* net, netlist_t* netlist, uintptr_t traverse_mark_number)
static metric_t* get_upward_stat(nnode_t* node, netlist_t* netlist, uintptr_t traverse_mark_number)
So, I thought maybe something was causing an infinite loop in which these two functions keep calling each other endlessly.
I added some print statements to see if I could get an idea of the node/net in the netlist that is causing this. From the log that was generated after that, I don't see that behavior/pattern. That is, I don't see an infinite loop kinda thing. The tool seems to be progressing normally.
Then I thought maybe the machine just had low memory and so I ran this on a larger machine (125GB RAM). But I see the same behavior there as well.
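One note on why extra RAM may not change the outcome: on Linux, the main thread's stack is capped by the `RLIMIT_STACK` resource limit (what `ulimit -s` reports, commonly 8 MiB by default), and that cap is independent of how much memory the machine has. Below is a minimal sketch for checking the limit from C++, assuming a POSIX system. The helper name `query_stack_limit` is hypothetical and is not part of ODIN.

```cpp
#include <sys/resource.h>  // getrlimit, RLIMIT_STACK, RLIM_INFINITY
#include <cassert>
#include <cstdio>

// Hypothetical helper: reads the soft and hard stack limits of this process.
// Returns false if the query fails.
bool query_stack_limit(rlim_t& soft, rlim_t& hard) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        return false;
    }
    soft = rl.rlim_cur;
    hard = rl.rlim_max;
    return true;
}

// Prints the soft limit; RLIM_INFINITY means no limit is configured.
void print_stack_limit() {
    rlim_t soft = 0, hard = 0;
    if (!query_stack_limit(soft, hard)) {
        std::perror("getrlimit");
        return;
    }
    if (soft == RLIM_INFINITY) {
        std::printf("soft stack limit: unlimited\n");
    } else {
        std::printf("soft stack limit: %llu bytes\n",
                    static_cast<unsigned long long>(soft));
    }
}
```

Calling `print_stack_limit()` on both machines would show whether they share the same stack cap despite the difference in RAM.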
A few other things to mention:
I ran each individual module in the design and they ran without any error. So, something is wrong with the top-level module in the design (where all the submodules are stitched together).
A smaller version of this TPU design (TPU 16x16) passes through the whole flow without any error.
This design is big, but not that big. There are much larger designs that go through the flow successfully.
Possible Solution
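The referenced commit message mentions recursive routine calls for big benchmarks, so one plausible reading (not confirmed in this report) is that the recursion is simply very deep rather than endless. A common remedy in that situation is to replace the recursion with a loop over an explicit, heap-allocated stack. The following is a minimal sketch of that technique on a toy graph, assuming the graph is acyclic. `ToyNode`, `make_chain`, and `upward_depth` are hypothetical names for illustration; this is not ODIN's `get_upward_stat` and not the actual fix.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a netlist node (not ODIN's nnode_t).
struct ToyNode {
    std::vector<std::size_t> fanin;  // indices of the nodes driving this one
};

// Builds a chain 0 <- 1 <- ... <- n-1, the worst case for recursion depth.
std::vector<ToyNode> make_chain(std::size_t n) {
    std::vector<ToyNode> nodes(n);
    for (std::size_t i = 1; i < n; ++i) {
        nodes[i].fanin.push_back(i - 1);
    }
    return nodes;
}

// Length (in nodes) of the longest fan-in path ending at `root`.
// Uses an explicit stack on the heap instead of recursive calls,
// so depth is bounded by available memory, not by the thread stack.
std::size_t upward_depth(const std::vector<ToyNode>& nodes, std::size_t root) {
    std::vector<std::size_t> depth(nodes.size(), 0);  // 0 means "not computed"
    // Each frame holds (node index, index of the next fan-in to visit).
    std::vector<std::pair<std::size_t, std::size_t> > stack;
    stack.push_back(std::make_pair(root, std::size_t(0)));

    while (!stack.empty()) {
        const std::size_t id = stack.back().first;
        const std::size_t next = stack.back().second;
        if (next < nodes[id].fanin.size()) {
            stack.back().second = next + 1;  // advance before pushing
            const std::size_t child = nodes[id].fanin[next];
            if (depth[child] == 0) {
                stack.push_back(std::make_pair(child, std::size_t(0)));
            }
        } else {
            // All fan-ins are done: combine their results (post-order step).
            std::size_t best = 0;
            for (std::size_t c : nodes[id].fanin) {
                best = std::max(best, depth[c]);
            }
            depth[id] = best + 1;
            stack.pop_back();
        }
    }
    return depth[root];
}
```

For example, `upward_depth(make_chain(1000000), 999999)` walks a million-node chain using only heap storage for the traversal state. The `depth` array doubles as a visited marker, playing a role loosely similar to a traversal mark, so shared fan-in nodes are evaluated once.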
Steps to Reproduce
I've attached a tarball that contains the design, the arch file, the config file, the ODIN file I updated (netlist_statistic.cpp), and the ODIN log generated with the debug build and the print statements.
Context
Your Environment
for_seyed_tpu_32x32.tar.gz
Tagging @VedantMohanty