BUG: read_csv with bad file coreing #5156

Closed
jreback opened this issue Oct 8, 2013 · 6 comments · Fixed by #5268
Labels
Bug IO CSV read_csv, to_csv
Milestone

Comments

@jreback
Contributor

jreback commented Oct 8, 2013

from ML:
https://groups.google.com/forum/#!topic/pydata/KO-PmQdBUZI

In [1]: data = """c1\ntext11,text12\ntext21,text22"""

In [2]: read_csv(StringIO(data))
Out[2]: 
            c1
text11  text12
text21  text22

In [3]: read_csv(StringIO(data),header=0,names=list('abc'))
Bus error (core dumped)
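
Since the call above takes down the whole interpreter, it can help to run the reproduction in a child process so the crash shows up as a return code instead of killing the session. A minimal sketch, assuming pandas is installed in the child interpreter; `run_isolated` and `repro` are hypothetical names for illustration, not part of pandas:

```python
import subprocess
import sys


def run_isolated(snippet):
    """Run a Python snippet in a child interpreter and return the CompletedProcess."""
    return subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,
    )


# The reproduction from this report, written as source for the child process.
# Assumption: pandas is importable there; otherwise the child exits with an
# ImportError traceback rather than a crash.
repro = (
    "from io import StringIO\n"
    "import pandas as pd\n"
    "data = 'c1\\ntext11,text12\\ntext21,text22'\n"
    "pd.read_csv(StringIO(data), header=0, names=list('abc'))\n"
)
```

Calling `run_isolated(repro).returncode` then reports how the child ended. On POSIX systems a negative value means the child was killed by a signal (for example a bus error or segfault), while a positive value usually means an ordinary Python exception.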
@jreback
Contributor Author

jreback commented Oct 8, 2013

cc @guyrt

@guyrt
Contributor

guyrt commented Oct 16, 2013

Confirmed that including too many names can cause a segfault:

from io import StringIO
import pandas as pd

data = """\
1,2,3
4,5,6"""

df = pd.read_csv(StringIO(data), header=0, names=['a', 'b', 'c', 'd'])
Segmentation fault

Looks like a problem with header handling. The fix for #4335 already covers this for the Python parser:

df = pd.read_csv(StringIO(data), header=0, names=['a', 'b', 'c', 'd'], engine='python')
ValueError: Number of passed names did not match number of header fields in the file

Looks like we need a similar fix for the C engine.
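
Until the C engine has that check, a caller could guard against the mismatch before calling `read_csv`. The sketch below is a hypothetical helper, not pandas code: it reads the first line with the standard `csv` module and raises the same kind of `ValueError` the Python engine reports when the number of names differs from the number of header fields.

```python
import csv
from io import StringIO


def check_names_match_header(text, names, delimiter=","):
    """Hypothetical guard: raise if len(names) differs from the header width."""
    reader = csv.reader(StringIO(text), delimiter=delimiter)
    header = next(reader, [])  # first row, or [] for empty input
    if len(names) != len(header):
        raise ValueError(
            "Number of passed names (%d) did not match number of "
            "header fields in the file (%d)" % (len(names), len(header))
        )
    return header


data = "1,2,3\n4,5,6"
try:
    check_names_match_header(data, ["a", "b", "c", "d"])
except ValueError as exc:
    message = str(exc)  # four names against a three-field header
```

With three names (`list("abc")`) the helper returns the header row unchanged, so it could sit directly in front of the `read_csv` call.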

@jreback
Contributor Author

jreback commented Oct 16, 2013

gr8! i'll move this back to 0.13 then

@jreback
Contributor Author

jreback commented Oct 17, 2013

@guyrt I think this would be gr8 to include in 0.13; going to cut the release candidate early next week.

@guyrt
Contributor

guyrt commented Oct 17, 2013

I'll see what I can do. I've identified the problem: we overrun the parser buffer. Just working on a fix.

@jreback
Contributor Author

jreback commented Oct 17, 2013

gr8 thanks!

2 participants