BUG: Fix buffer overflows in tokenizer.c that caused python to segfault with certain #9360

Closed
wants to merge 1 commit into from

selasley commented

closes #9205

jreback added the Bug and IO CSV (read_csv, to_csv) labels on Jan 27, 2015
jreback added this to the 0.16.0 milestone on Jan 27, 2015

jreback commented Jan 27, 2015

whoosh, quite a number of changes for a 'simple' bug.....

selasley (author) commented

While investigating the problem I pruned some routines that no longer seemed to be used and changed the way pointers are reallocated and freed. The heart of the fix is checking lengths against the buffer capacities as items are added to the buffers. I thought the extra checking might slow things down slightly, but vbench reports that the tokenizer branch is as fast as upstream/master. I'll take a look at the other CSV issues you pointed me to earlier.
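For readers unfamiliar with the pattern, here is a minimal sketch of checking a length against a buffer's capacity before every append. It is not the code from tokenizer.c: the `byte_buf` type and the `buf_reserve`, `buf_push`, and `buf_free` functions are hypothetical names chosen for illustration, and the growth policy (doubling from 16 bytes) is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable byte buffer; NOT the struct used in tokenizer.c. */
typedef struct {
    char  *data;
    size_t len;  /* bytes currently stored */
    size_t cap;  /* bytes currently allocated */
} byte_buf;

/* Make room for `extra` more bytes.
   Returns 0 on success, -1 on failure; on failure the buffer is unchanged. */
static int buf_reserve(byte_buf *b, size_t extra) {
    if (extra > SIZE_MAX - b->len) return -1;  /* len + extra would wrap */
    size_t need = b->len + extra;
    if (need <= b->cap) return 0;              /* already fits */

    size_t new_cap = b->cap ? b->cap : 16;     /* assumed starting size */
    while (new_cap < need) {
        if (new_cap > SIZE_MAX / 2) { new_cap = need; break; }
        new_cap *= 2;
    }
    /* Keep the old pointer until realloc succeeds so it is never leaked. */
    char *p = realloc(b->data, new_cap);
    if (p == NULL) return -1;
    b->data = p;
    b->cap = new_cap;
    return 0;
}

/* Append one byte, growing first so the write never lands past `cap`. */
static int buf_push(byte_buf *b, char c) {
    if (buf_reserve(b, 1) != 0) return -1;
    b->data[b->len++] = c;
    return 0;
}

/* Release the storage and reset the bookkeeping fields. */
static void buf_free(byte_buf *b) {
    free(b->data);
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
}
```

The check costs one comparison per append in the common case where the byte already fits, which would be consistent with the benchmark showing no measurable slowdown.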

jreback commented Jan 28, 2015

@selasley Hah, I wasn't being critical. I am very glad you refactored. Sometimes it just needs to be done! Thanks.

@@ -210,3 +208,6 @@ Bug Fixes
- Fixes issue with ``index_col=False`` when ``usecols`` is also specified in ``read_csv``. (:issue:`9082`)
- Bug where ``wide_to_long`` would modify the input stubnames list (:issue:`9204`)
- Bug in to_sql not storing float64 values using double precision. (:issue:`9009`)


- Bug in csv. Buffer overflows with certain malformed input files (:issue:`9205`)

Bug in read_csv.

jreback commented Jan 28, 2015

will have to test this on windows.

selasley (author) commented

:) I was a bit nervous about changing so many lines of other people's code. Thanks for checking on Windows. I don't have access to a Windows machine at home or work to do any testing.
I fixed the typo in the whatsnew document and pushed the changes.

jreback commented Jan 30, 2015

ok, this looks good on windows.

can you do a vbench run on this? just to confirm nothing has changed?

ping when done and we'll get this in.

jreback commented Feb 5, 2015

merged via c6c9c0b

jreback closed this on Feb 5, 2015

jreback commented Feb 5, 2015

thanks!

Development

Successfully merging this pull request may close these issues.

pandas error kills IPython kernel