FIX: 'parser_trim_buffers' properly initializes word pointers #13788
Conversation
Can you add a test that reproduces the segfault (and that this PR fixes)?
This is going to need a run of the asv suite for the csv benchmarks to see if anything changed; see here.
Current coverage is 85.23% (diff: 100%)

@@           master   #13788   diff @@
==========================================
  Files         140      140
  Lines       50415    50415
  Methods         0        0
  Messages        0        0
  Branches        0        0
==========================================
  Hits        42971    42971
  Misses       7444     7444
  Partials        0        0
I ran the ASV benchmarks; the full results are in the attached archive.
I added a test that checks for either a segfault or memory corruption during parsing.
@ivannz You pulled in some changes from other PRs that have been merged recently. Can you rebase to fix this? Normally a rebase onto the current upstream/master should do the trick.
For the asv benchmarks, you will need to compare against current master to see if anything changed. In fact, we just merged a pull request clarifying how to do this (#13794).
Tests are just like any other parser test.
I had to increase the stressfulness of the test, because sometimes safe_realloc just expands the parser->stream buffer in place, which does not invalidate the pointers in parser->words.
import pandas as pd
from pandas.compat import StringIO
record_ = "9999-9,99:99,,,,ZZ,ZZ,,,ZZZ-ZZZZ,.Z-ZZZZ,-9.99,,,9.99,ZZZZZ,,-99,9,ZZZ-ZZZZ,ZZ-ZZZZ,,9.99,ZZZ-ZZZZZ,ZZZ-ZZZZZ,ZZZ-ZZZZ,ZZZ-ZZZZ,ZZZ-ZZZZ,ZZZ-ZZZZ,ZZZ-ZZZZ,ZZZ-ZZZZ,999,ZZZ-ZZZZ,,ZZ-ZZZZ,,,,,ZZZZ,ZZZ-ZZZZZ,ZZZ-ZZZZ,,,9,9,9,9,99,99,999,999,ZZZZZ,ZZZ-ZZZZZ,ZZZ-ZZZZ,9,ZZ-ZZZZ,9.99,ZZ-ZZZZ,ZZ-ZZZZ,,,,ZZZZ,,,ZZ,ZZ,,,,,,,,,,,,,9,,,999.99,999.99,,,ZZZZZ,,,Z9,,,,,,,ZZZ,ZZZ,,,,,,,,,,,ZZZZZ,ZZZZZ,ZZZ-ZZZZZZ,ZZZ-ZZZZZZ,ZZ-ZZZZ,ZZ-ZZZZ,ZZ-ZZZZ,ZZ-ZZZZ,,,999999,999999,ZZZ,ZZZ,,,ZZZ,ZZZ,999.99,999.99,,,,ZZZ-ZZZ,ZZZ-ZZZ,-9.99,-9.99,9,9,,99,,9.99,9.99,9,9,9.99,9.99,,,,9.99,9.99,,99,,99,9.99,9.99,,,ZZZ,ZZZ,,999.99,,999.99,ZZZ,ZZZ-ZZZZ,ZZZ-ZZZZ,,,ZZZZZ,ZZZZZ,ZZZ,ZZZ,9,9,,,,,,ZZZ-ZZZZ,ZZZ999Z,,,999.99,,999.99,ZZZ-ZZZZ,,,9.999,9.999,9.999,9.999,-9.999,-9.999,-9.999,-9.999,9.999,9.999,9.999,9.999,9.999,9.999,9.999,9.999,99999,ZZZ-ZZZZ,,9.99,ZZZ,,,,,,,,ZZZ,,,,,9,,,,9,,,,,,,,,,ZZZ-ZZZZ,ZZZ-ZZZZ,,ZZZZZ,ZZZZZ,ZZZZZ,ZZZZZ,,,9.99,,ZZ-ZZZZ,ZZ-ZZZZ,ZZ,999,,,,ZZ-ZZZZ,ZZZ,ZZZ,ZZZ-ZZZZ,ZZZ-ZZZZ,,,99.99,99.99,,,9.99,9.99,9.99,9.99,ZZZ-ZZZZ,,,ZZZ-ZZZZZ,,,,,-9.99,-9.99,-9.99,-9.99,,,,,,,,,ZZZ-ZZZZ,,9,9.99,9.99,99ZZ,,-9.99,-9.99,ZZZ-ZZZZ,,,,,,,ZZZ-ZZZZ,9.99,9.99,9999,,,,,,,,,,-9.9,Z/Z-ZZZZ,999.99,9.99,,999.99,ZZ-ZZZZ,ZZ-ZZZZ,9.99,9.99,9.99,9.99,9.99,9.99,,ZZZ-ZZZZZ,ZZZ-ZZZZZ,ZZZ-ZZZZZ,ZZZ-ZZZZZ,ZZZ-ZZZZZ,ZZZ,ZZZ,ZZZ,ZZZ,9.99,,,-9.99,ZZ-ZZZZ,-999.99,,-9999,,999.99,,,,999.99,99.99,,,ZZ-ZZZZZZZZ,ZZ-ZZZZ-ZZZZZZZ,,,,ZZ-ZZ-ZZZZZZZZ,ZZZZZZZZ,ZZZ-ZZZZ,9999,999.99,ZZZ-ZZZZ,-9.99,-9.99,ZZZ-ZZZZ,99:99:99,,99,99,,9.99,,-99.99,,,,,,9.99,ZZZ-ZZZZ,-9.99,-9.99,9.99,9.99,,ZZZ,,,,,,,ZZZ,ZZZ,,,,,"
csv_data = "\n".join([record_]*173) + "\n"
Just write actual code; no need to shell out to do this.
Further, you need to compare the result with an expected frame.
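For reference, a minimal sketch of such an in-process check, reusing the record_ string from the snippet above; the chunksize range, the repeat count of 173, and the use of low_memory=True are illustrative assumptions rather than the exact test that was merged:

```python
import pandas as pd
from pandas.compat import StringIO
from pandas.util.testing import assert_frame_equal

# Repeat the record enough times that the C parser has to trim and
# reallocate its internal buffers while reading in small chunks.
csv_data = "\n".join([record_] * 173) + "\n"

# Expected frame: one parsed copy of the record, repeated 173 times.
expected = pd.concat([pd.read_csv(StringIO(record_ + "\n"), header=None)] * 173,
                     ignore_index=True)

for chunksize in range(1, 33):
    chunks = pd.read_csv(StringIO(csv_data), header=None,
                         chunksize=chunksize, low_memory=True)
    result = pd.concat(chunks, axis=0, ignore_index=True)
    # Every chunked read must reproduce the same frame, regardless of
    # how often the parser trims its buffers along the way.
    assert_frame_equal(result, expected)
```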
Here are the results of asv continuous -E virtualenv upstream/master HEAD -b csv; complete results are in the attached archive.
Please add a note in the whatsnew bug-fix section.
@jreback, I rewrote the test as you suggested. It is a problem, though, if the test itself segfaults, since the test run cannot recover from that.
@ivannz the whole point is for it NOT to recover. A segfault is as noticeable as any other error. If you have a test that segfaults, and the bug is fixed, it will pass.
I added a note in the whatsnew bug-fix section. Here are the latest results of asv continuous -E virtualenv upstream/master HEAD -b csv.
@@ -5,6 +5,8 @@
import platform
import codecs

import subprocess
remove this
@jorisvandenbossche, could you cancel Travis CI job #21105, please?
except ValueError:
    # Ignore unsupported dtype=object by engine=python;
    # in this case the output_ list is empty.
    pass
Instead, raise nose.SkipTest('....') here.
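A minimal sketch of that suggestion; do_parse() is a hypothetical stand-in for whatever parsing call the surrounding test actually makes:

```python
import nose

try:
    output_ = do_parse()  # hypothetical placeholder for the actual parsing call
except ValueError:
    # engine='python' does not support dtype=object for this input,
    # so mark the test as skipped instead of silently passing.
    raise nose.SkipTest("engine='python' does not support dtype=object")
```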
Thanks!
I tried to compile the master branch on Windows with VS2015 and it failed with compiler errors, so I guess this is a separate issue.
Can you make a new issue? And fix it if you can.
OK, I'll submit the fix later.
Summary

The pull request:
- passes test.sh, except one test (see the output);
- passes git diff upstream/master | flake8 --diff.

Details
Basically, the function parser_trim_buffers did not properly move the word pointers in parser->words and related fields when the underlying stream buffer was reallocated. This pull request fixes that.
Changes in parser_trim_buffers:
Output of test.sh:
Apart from this, there were a couple of deprecation warnings in files other than tokenizer.c.