FIX: 'parser_trim_buffers' properly initializes word pointers #13788
Conversation
Can you add a test that reproduces the segfault (and that this PR fixes)?
This is going to need a run of the asv suite for the csv benchmarks to see if anything changed; see here.
Current coverage is 85.23% (diff: 100%)

@@           master   #13788   diff @@
==========================================
  Files         140      140
  Lines       50415    50415
  Methods         0        0
  Messages        0        0
  Branches        0        0
==========================================
  Hits        42971    42971
  Misses       7444     7444
  Partials        0        0
I ran the ASV benchmarks; the full results are in the attached archive.
I added a test that checks for either a segfault or memory corruption during parsing.
@ivannz You pulled in some changes from other PRs that have been merged recently. Can you rebase to fix this? Normally a rebase onto the current upstream/master should do the trick.
For the asv benchmarks, you will need to compare against current master to see if anything changed. In fact, we just merged a pull request clarifying how to do this (#13794).
Tests are just like any other parser test.
I had to increase the stressfulness of the test, because sometimes safe_realloc just expands the parser->stream buffer in place, which does not invalidate the pointers in parser->words.
import pandas as pd
from pandas.compat import StringIO
record_ = "9999-9,99:99,,,,ZZ,ZZ,,,ZZZ-ZZZZ,.Z-ZZZZ,-9.99,,,9.99,ZZZZZ,,-99,9,ZZZ-ZZZZ,ZZ-ZZZZ,,9.99,ZZZ-ZZZZZ,ZZZ-ZZZZZ,ZZZ-ZZZZ,ZZZ-ZZZZ,ZZZ-ZZZZ,ZZZ-ZZZZ,ZZZ-ZZZZ,ZZZ-ZZZZ,999,ZZZ-ZZZZ,,ZZ-ZZZZ,,,,,ZZZZ,ZZZ-ZZZZZ,ZZZ-ZZZZ,,,9,9,9,9,99,99,999,999,ZZZZZ,ZZZ-ZZZZZ,ZZZ-ZZZZ,9,ZZ-ZZZZ,9.99,ZZ-ZZZZ,ZZ-ZZZZ,,,,ZZZZ,,,ZZ,ZZ,,,,,,,,,,,,,9,,,999.99,999.99,,,ZZZZZ,,,Z9,,,,,,,ZZZ,ZZZ,,,,,,,,,,,ZZZZZ,ZZZZZ,ZZZ-ZZZZZZ,ZZZ-ZZZZZZ,ZZ-ZZZZ,ZZ-ZZZZ,ZZ-ZZZZ,ZZ-ZZZZ,,,999999,999999,ZZZ,ZZZ,,,ZZZ,ZZZ,999.99,999.99,,,,ZZZ-ZZZ,ZZZ-ZZZ,-9.99,-9.99,9,9,,99,,9.99,9.99,9,9,9.99,9.99,,,,9.99,9.99,,99,,99,9.99,9.99,,,ZZZ,ZZZ,,999.99,,999.99,ZZZ,ZZZ-ZZZZ,ZZZ-ZZZZ,,,ZZZZZ,ZZZZZ,ZZZ,ZZZ,9,9,,,,,,ZZZ-ZZZZ,ZZZ999Z,,,999.99,,999.99,ZZZ-ZZZZ,,,9.999,9.999,9.999,9.999,-9.999,-9.999,-9.999,-9.999,9.999,9.999,9.999,9.999,9.999,9.999,9.999,9.999,99999,ZZZ-ZZZZ,,9.99,ZZZ,,,,,,,,ZZZ,,,,,9,,,,9,,,,,,,,,,ZZZ-ZZZZ,ZZZ-ZZZZ,,ZZZZZ,ZZZZZ,ZZZZZ,ZZZZZ,,,9.99,,ZZ-ZZZZ,ZZ-ZZZZ,ZZ,999,,,,ZZ-ZZZZ,ZZZ,ZZZ,ZZZ-ZZZZ,ZZZ-ZZZZ,,,99.99,99.99,,,9.99,9.99,9.99,9.99,ZZZ-ZZZZ,,,ZZZ-ZZZZZ,,,,,-9.99,-9.99,-9.99,-9.99,,,,,,,,,ZZZ-ZZZZ,,9,9.99,9.99,99ZZ,,-9.99,-9.99,ZZZ-ZZZZ,,,,,,,ZZZ-ZZZZ,9.99,9.99,9999,,,,,,,,,,-9.9,Z/Z-ZZZZ,999.99,9.99,,999.99,ZZ-ZZZZ,ZZ-ZZZZ,9.99,9.99,9.99,9.99,9.99,9.99,,ZZZ-ZZZZZ,ZZZ-ZZZZZ,ZZZ-ZZZZZ,ZZZ-ZZZZZ,ZZZ-ZZZZZ,ZZZ,ZZZ,ZZZ,ZZZ,9.99,,,-9.99,ZZ-ZZZZ,-999.99,,-9999,,999.99,,,,999.99,99.99,,,ZZ-ZZZZZZZZ,ZZ-ZZZZ-ZZZZZZZ,,,,ZZ-ZZ-ZZZZZZZZ,ZZZZZZZZ,ZZZ-ZZZZ,9999,999.99,ZZZ-ZZZZ,-9.99,-9.99,ZZZ-ZZZZ,99:99:99,,99,99,,9.99,,-99.99,,,,,,9.99,ZZZ-ZZZZ,-9.99,-9.99,9.99,9.99,,ZZZ,,,,,,,ZZZ,ZZZ,,,,,"
csv_data = "\n".join([record_]*173) + "\n"
Just write actual code; no need to shell out to do this.
Further, you need to compare the result with an expected frame.
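For reference, a minimal sketch of such an in-process check, reusing the record_ string from the snippet above; the chunksize range, the repeat count of 173, and the use of low_memory=True are illustrative assumptions rather than the exact test that was merged:

```python
import pandas as pd
from pandas.compat import StringIO
from pandas.util.testing import assert_frame_equal

# Repeat the record enough times that the C parser has to trim and
# reallocate its internal buffers while reading in small chunks.
csv_data = "\n".join([record_] * 173) + "\n"

# Expected frame: one parsed copy of the record, repeated 173 times.
expected = pd.concat([pd.read_csv(StringIO(record_ + "\n"), header=None)] * 173,
                     ignore_index=True)

for chunksize in range(1, 33):
    chunks = pd.read_csv(StringIO(csv_data), header=None,
                         chunksize=chunksize, low_memory=True)
    result = pd.concat(chunks, axis=0, ignore_index=True)
    # Every chunked read must reproduce the same frame, regardless of
    # how often the parser trims its buffers along the way.
    assert_frame_equal(result, expected)
```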
Here are the results of asv continuous -E virtualenv upstream/master HEAD -b csv; complete results are in the attached archive.
Please add a note in the whatsnew bug-fix section.
@jreback, I rewrote the test as you suggested. It is a problem, though, if the test itself segfaults, since the test run cannot recover from that.
@ivannz the whole point is for it NOT to recover. A segfault is as noticeable as any other error. If you have a test that segfaults, and the bug is fixed, it will pass.
I added a note in the whatsnew bug-fix section. Here are the latest results of asv continuous -E virtualenv upstream/master HEAD -b csv.
@@ -5,6 +5,8 @@
import platform
import codecs

import subprocess
remove this
@jorisvandenbossche, could you cancel Travis CI job #21105, please?
except ValueError:
    # Ignore unsupported dtype=object by engine=python;
    # in this case the output_ list is empty.
    pass
Instead, raise nose.SkipTest('....') here.
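A minimal sketch of that suggestion; do_parse() is a hypothetical stand-in for whatever parsing call the surrounding test actually makes:

```python
import nose

try:
    output_ = do_parse()  # hypothetical placeholder for the actual parsing call
except ValueError:
    # engine='python' does not support dtype=object for this input,
    # so mark the test as skipped instead of silently passing.
    raise nose.SkipTest("engine='python' does not support dtype=object")
```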
Thanks!
I tried to compile the master branch on Windows with VS2015 and it failed with compiler errors, so I guess this is a separate issue.
Can you make a new issue? And fix it if you can.
OK, I'll submit the fix later.
Summary

The pull request:
- passes test.sh, except one test (see the output);
- passes git diff upstream/master | flake8 --diff.

Details
Basically, the function parser_trim_buffers did not properly move the word pointers in parser->words and related fields when the underlying stream buffer was reallocated. This pull request fixes that.
Changes in parser_trim_buffers:
Output of test.sh:
Apart from this, there were a couple of deprecation warnings in files other than tokenizer.c.