ODIN crashes with stack overflow during statistics calculation #2098

Closed
aman26kbm opened this issue Jul 18, 2022 · 2 comments
aman26kbm commented Jul 18, 2022

Expected Behaviour

There shouldn't be a crash.

Current Behaviour

With the new TPU 32x32 design for the Koios 2.0 benchmarks, ODIN crashes with a segmentation fault.

I created a debug build of VTR and ran the design again to see what was going on. It showed a stack overflow while ODIN was generating statistics for the circuit. From the stack trace, execution appeared to bounce between lines 256 and 234 of netlist_statistic.cpp many times.

#87 0x5602b6bd1490 in get_upward_stat /export/aman/vtr_aman/vtr-verilog-to-routing/ODIN_II/SRC/netlist_statistic.cpp:256
#88 0x5602b6bd0ea5 in get_upward_stat /export/aman/vtr_aman/vtr-verilog-to-routing/ODIN_II/SRC/netlist_statistic.cpp:234

These two lines are from these functions:
static metric_t* get_upward_stat(nnet_t* net, netlist_t* netlist, uintptr_t traverse_mark_number)
static metric_t* get_upward_stat(nnode_t* node, netlist_t* netlist, uintptr_t traverse_mark_number)

So I suspected an infinite loop in which these two functions keep calling each other endlessly.

I added some print statements to try to identify the node/net in the netlist that triggers this. The resulting log doesn't show that pattern: there is no sign of an infinite loop, and the tool seems to be progressing normally.
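One way to tell an infinite loop apart from recursion that is finite but very deep is to record the peak recursion depth instead of printing every visited node. The following is a minimal sketch only: `DepthTracker`, `DepthGuard`, and `count_down` are hypothetical names invented for illustration and are not part of ODIN; `count_down` merely stands in for a recursive traversal such as `get_upward_stat`.

```cpp
#include <cstddef>

// Hypothetical helper: remembers the current and the deepest recursion level.
struct DepthTracker {
    std::size_t current = 0;
    std::size_t peak = 0;
};

// RAII guard: increments the depth on entry to a function and
// decrements it again when the function returns.
class DepthGuard {
public:
    explicit DepthGuard(DepthTracker& tracker) : tracker_(tracker) {
        ++tracker_.current;
        if (tracker_.current > tracker_.peak) tracker_.peak = tracker_.current;
    }
    ~DepthGuard() { --tracker_.current; }
    DepthGuard(const DepthGuard&) = delete;
    DepthGuard& operator=(const DepthGuard&) = delete;

private:
    DepthTracker& tracker_;
};

// Toy recursive function standing in for a netlist traversal.
std::size_t count_down(std::size_t n, DepthTracker& tracker) {
    DepthGuard guard(tracker);
    if (n == 0) return 0;
    return 1 + count_down(n - 1, tracker);
}
```

If the peak depth grows with the size of the design and the traversal still visits each node once, the recursion is deep rather than endless.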

Then I thought maybe the machine was just low on memory, so I ran this on a larger machine (125 GB RAM), but I see the same behavior there as well.
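The same crash on a machine with more RAM fits a stack overflow, because the call stack is bounded by a per-process limit rather than by total memory. As a hedged sketch (not taken from ODIN), the soft limit can be read on a POSIX system with `getrlimit`; the helper names `soft_stack_limit` and `print_stack_limit` are made up for this example.

```cpp
#include <sys/resource.h>
#include <cstdio>

// Reads the soft stack-size limit of the current process via POSIX getrlimit.
// Returns false if the call fails; on success stores the limit in bytes.
bool soft_stack_limit(rlim_t& bytes) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0) return false;
    bytes = rl.rlim_cur;  // may equal RLIM_INFINITY when the stack is unlimited
    return true;
}

// Prints the limit in KiB, or "unlimited".
void print_stack_limit() {
    rlim_t bytes = 0;
    if (!soft_stack_limit(bytes)) {
        std::perror("getrlimit");
        return;
    }
    if (bytes == RLIM_INFINITY) {
        std::printf("stack limit: unlimited\n");
    } else {
        std::printf("stack limit: %llu KiB\n",
                    static_cast<unsigned long long>(bytes / 1024));
    }
}
```

In a shell, `ulimit -s` reports the same soft limit. Raising it before launching the tool is a quick way to test the hypothesis, although it only postpones the overflow for larger designs.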

A few other things to mention:

  1. I ran each individual module in the design and they all ran without any error. So the problem is somewhere in the top-level module of the design (where all the submodules are stitched together).

  2. A smaller version of this TPU design (TPU 16x16) passes through the whole flow without any error.

  3. This design is big, but not that big. There are much larger designs that go through the flow successfully.
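These observations (submodules pass, the 16x16 variant passes, the 32x32 top level crashes) would fit recursion whose depth grows with the length of the longest path through the netlist rather than with the total design size. A common remedy for that class of problem is to replace the recursion with an explicit worklist on the heap. The sketch below is an assumption-laden illustration, not ODIN's code: `Node` and `Net` are simplified, hypothetical stand-ins for `nnode_t` and `nnet_t`, the graph is assumed to be acyclic, and the computed metric is just the longest upward path.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for ODIN's nnode_t / nnet_t.
struct Net;
struct Node {
    std::vector<Net*> inputs;  // nets feeding this node
};
struct Net {
    Node* driver = nullptr;    // node driving this net; nullptr for a primary input
};

// Recursive form: node -> net -> node -> ..., two stack frames per logic level.
std::size_t depth_recursive(const Net* net);
std::size_t depth_recursive(const Node* node) {
    std::size_t best = 0;
    for (const Net* in : node->inputs) best = std::max(best, depth_recursive(in));
    return best + 1;
}
std::size_t depth_recursive(const Net* net) {
    return net->driver ? depth_recursive(net->driver) : 0;
}

// Iterative form: same result, but pending work lives in a heap-allocated vector.
// Assumes the graph reachable from root has no cycles.
std::size_t depth_iterative(const Node* root) {
    std::unordered_map<const Node*, std::size_t> done;  // finished node -> depth
    std::vector<std::pair<const Node*, bool> > work;    // (node, drivers already pushed?)
    work.push_back(std::make_pair(root, false));
    while (!work.empty()) {
        const Node* node = work.back().first;
        const bool expanded = work.back().second;
        work.pop_back();
        if (done.count(node)) continue;  // already computed via another path
        if (!expanded) {
            // Revisit this node after all of its drivers have been computed.
            work.push_back(std::make_pair(node, true));
            for (const Net* in : node->inputs) {
                if (in->driver && !done.count(in->driver)) {
                    work.push_back(std::make_pair(
                        static_cast<const Node*>(in->driver), false));
                }
            }
        } else {
            std::size_t best = 0;
            for (const Net* in : node->inputs) {
                if (in->driver) best = std::max(best, done.at(in->driver));
            }
            done[node] = best + 1;
        }
    }
    return done.at(root);
}
```

Both functions return the same value on small inputs; only the iterative one keeps working when the chain of nodes becomes very long, because its memory use is limited by the heap instead of the call stack.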

Possible Solution

Steps to Reproduce

I've attached a tarball that contains the design, the arch file, the config file, the ODIN file I updated (netlist_statistic.cpp), and the ODIN log generated with the debug build and the print statements.

Context

Your Environment

  • VTR revision used: Master
  • Operating System and version: Ubuntu 20.04
  • Compiler version:

for_seyed_tpu_32x32.tar.gz

Tagging @VedantMohanty

sdamghan commented Jul 24, 2022

@aman26kbm - is this issue similar to the recent one (#2105) you created about tau_32x32?

@aman26kbm

Oh sorry. I didn't realize I had already created an issue for this. Lemme close that one as duplicate.

sdamghan added a commit to sdamghan/vtr-verilog-to-routing that referenced this issue Jul 26, 2022
…cursive routine calls for big benchmarks

Related-issue: verilog-to-routing#2098

Signed-off-by: Seyed Alireza Damghani <[email protected]>