VPR memory corruption while writing packing results (.net) file for large benchmark #71

Closed
kmurray opened this issue Jun 26, 2015 · 12 comments
@kmurray
Contributor

kmurray commented Jun 26, 2015

Originally reported on Google Code with ID 78

What steps will reproduce the problem?
1. Run VPR packing on the linked blif file and architecture


The directrf benchmark seems to pack successfully, but encounters memory corruption
while writing out the packing results (.net) file.  

Looking at the .net file in the linked archive, you can see that it was only partially
written, while the log file shows glibc detecting memory corruption.
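A quick way to confirm that a .net file was cut off mid-write is to check whether it still parses as well-formed XML. The sketch below assumes the .net file is XML (VPR's packed netlist format); the function name and the file path in the usage note are hypothetical, and it only tests well-formedness, not whether the netlist contents are correct.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(path):
    """Return True if the file at `path` parses as well-formed XML.

    The file is streamed with iterparse and each element is cleared once
    it has been closed, so a very large netlist is not held fully in memory.
    """
    try:
        for _event, elem in ET.iterparse(path):
            elem.clear()  # drop parsed content we do not need
        return True
    except ET.ParseError:
        # A truncated file ends with unclosed elements (or no element at all).
        return False
```

For example, `is_well_formed_xml("directrf.net")` (hypothetical path) would be expected to return `False` for a partially written file, since its outermost element is never closed.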

This may be difficult to debug, since the benchmark requires >64GB of RAM.

File archive is available from:
http://www.eecg.utoronto.ca/~kmurray/titan/directrf_vpr_mem_corruption.tar.gz




Reported by kevinemurray on 2014-03-19 15:57:37

@kmurray
Contributor Author

kmurray commented Jun 26, 2015

Yes, this test case is too big for the machines that I have access to.  However, the
most fragile part of the packer code has been (finally) replaced, today, by more robust
code that is both faster and uses less memory.  Kevin, can you try this out and see
if the new code in the trunk takes care of your problem?

Reported by JasonKaiLuu on 2014-03-28 15:59:37

  • Status changed: Started

@kmurray kmurray self-assigned this Jun 26, 2015
@EliasVansteenkiste

It seems that the problems are not resolved. VPR gives a bunch of errors.
In the first error message, VPR complains that a string is too long for the print_string method.
There are also 15 similar errors complaining about a signal that enters both clock ports and normal input ports.

I attached the list of errors.

errors_directrf.txt

@kmurray
Contributor Author

kmurray commented Jan 14, 2016

Thanks for the update, Elias.

I can't give a timeline for looking into this more deeply, but it is something we will hopefully address over the summer.

@EliasVansteenkiste

LU_network_error.txt

OK, no problem.
VPR packs almost all of the Titan23 benchmark designs, except for directrf and LU_Network. According to the TRETS paper, VPR should be able to pack LU_Network.
I attached the error log for LU_Network. There are 75 fatal errors in the input netlist, all with the same error message: "You have a signal that enters both clock ports and normal input ports." I did not open a separate issue, because this error message also occurs for directrf.
Should I open a new issue for this benchmark?

@kmurray
Contributor Author

kmurray commented Jan 18, 2016

I think the clock-related issue is a separate one from the memory corruption problem. So opening a new issue is probably a good idea.

@EliasVansteenkiste

Done.

@kmurray
Contributor Author

kmurray commented Jan 21, 2016

I think I've managed to fix the problems related to this benchmark (c2a723e, e689204, and #113).

The benchmark should now go through packing and placement without crashing or erroring out (although I haven't found a routable channel width yet).

@kmurray kmurray closed this as completed Jan 21, 2016
@EliasVansteenkiste

I checked out the latest version of VTR (Revision: d71ee16-dirty).
I am now able to pack directrf, but for LU_Network the error messages persist.

I attached the log file for LU_Network.
vpr_stdout_LU_Network.txt

@kmurray
Contributor Author

kmurray commented Jan 22, 2016

The LU_Network problems appear to be the clock-related issues in #111.

@qaarah

qaarah commented Jul 31, 2018

Hi,
While working with an example, reading a blif file generated by yosys into VPR, I get the memory corruption error "Net #1 (zro_zro) has no driver and will cause memory corruption." How can I solve this problem?
Thanks
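A "has no driver" message suggests a net that is read somewhere in the netlist but never produced by a primary input, a `.names` output, or a `.latch` output. The following is a minimal sketch of how such nets could be listed from a flat BLIF file before it is handed to VPR. It is not VPR's own check: the helper names are hypothetical, it assumes a single `.model`, and it ignores `.subckt` lines because port directions cannot be determined without the sub-model definitions.

```python
def logical_lines(text):
    """Yield BLIF lines with '#' comments stripped and '\\' continuations joined."""
    pending = ""
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].rstrip()
        if line.endswith("\\"):
            pending += line[:-1] + " "
            continue
        line = (pending + line).strip()
        pending = ""
        if line:
            yield line


def undriven_nets(text):
    """Return the sorted names of nets that are used but never driven."""
    driven, used = set(), set()
    for line in logical_lines(text):
        tok = line.split()
        if tok[0] == ".inputs":
            driven.update(tok[1:])       # primary inputs drive their nets
        elif tok[0] == ".outputs":
            used.update(tok[1:])         # primary outputs must be driven
        elif tok[0] == ".names" and len(tok) > 1:
            used.update(tok[1:-1])       # all but the last name are inputs
            driven.add(tok[-1])          # the last name is the output
        elif tok[0] == ".latch" and len(tok) >= 3:
            used.add(tok[1])             # latch data input
            driven.add(tok[2])           # latch output
            if len(tok) >= 5 and tok[4] != "NIL":
                used.add(tok[4])         # optional clock/control net
    return sorted(used - driven)
```

Running `undriven_nets(open("design.blif").read())` (hypothetical path) on a netlist where `zro_zro` appears only as a `.names` input would return `["zro_zro"]`. Whether the right fix is to drive that net with a constant or to change the synthesis flow depends on the design.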

@kmurray
Contributor Author

kmurray commented Jul 31, 2018

@qaarah Thanks for the report. This issue is already closed, so instead of commenting here, please open a new issue for the problem you have encountered.

@KKtiandao

@qaarah I encountered the same problem. How did you resolve it?


4 participants