BUG: read_csv() crashes with engine='c' #14125

Closed
jzwinck opened this issue Aug 31, 2016 · 2 comments
Labels
Bug IO CSV read_csv, to_csv
Milestone

Comments

@jzwinck
Contributor

jzwinck commented Aug 31, 2016

Here's the code (input data is at the end of this message):

import pandas as pd

pd.read_csv('foo.csv', header=None, usecols=[0])

It fails with:

File "pandas/parser.pyx", line 805, in pandas.parser.TextReader.read (pandas/parser.c:8748)
File "pandas/parser.pyx", line 827, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:9003)
File "pandas/parser.pyx", line 881, in pandas.parser.TextReader._read_rows (pandas/parser.c:9731)
File "pandas/parser.pyx", line 868, in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:9602)
File "pandas/parser.pyx", line 1865, in pandas.parser.raise_parser_error (pandas/parser.c:23325)
pandas.io.common.CParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.

Small perturbations of the input file (adding or removing characters) make it work, as does engine='python'. Note that while one row (or more) of the file contains "extra" columns, I have only asked pandas to read column 0, which it should be able to do, since that column has a consistent, short length.
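The claim that column 0 is short and consistent can be checked without pandas. The sketch below parses a shortened, hypothetical stand-in for the attached data with the standard-library csv module and collects the length of the first field on each row. The helper name first_column_lengths is made up for illustration and is not part of pandas.

```python
import csv
import io


def first_column_lengths(text):
    """Return the set of lengths of field 0 across all non-empty CSV rows."""
    reader = csv.reader(io.StringIO(text))
    return {len(row[0]) for row in reader if row}


# Hypothetical miniature of the attached data: two ordinary rows and
# one row carrying many "extra" fields after the first column.
sample = (
    "1111    XX XXXXXX,111111,1111,111,\n"
    "1111    XX XXXXXX,111111,1,1,111111111,XXXXX,X,XXXXX XXX,X.X.\n"
    "1111    XX XXXXXX,1111111,1111,111,\n"
)

lengths = first_column_lengths(sample)
# A single length means column 0 is the same width on every row,
# including the row with the extra fields.
print(len(lengths) == 1)
```

If the full file gives a single length as well, the irregularity is confined to the columns that usecols=[0] excludes.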

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-85-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 25.1.6
Cython: 0.24.1
numpy: 1.11.1

DATA

1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,1111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1,1,111111111,XXXXX,X,XXXXX XXX,X.X.,X.X.,X.X., , ,11111111,11111111,11111111,X.X.,XXX111XXXXX1,X.X.,X.X.,11,XX_XXXX,1,XX_XXX,1111 XX,XX_XXXX,XXXXXXX XXXX,XX_X1_XX_XXXX,111111,XX_XXXX,XXXXXXXXXXXX XXXXXX XXXXXXXX,XX_XXX_XXX,11111.111111,XX_XXXXXX_XXX,1,XX_XXXX,11111.111111,XX_XXXX_XXXX,1,XX_XX,1.111111,XX_XX_XXXX, ,XX_1XXX,1.111111,XX_1XXX_XXXX, ,XX_XXXX,1,XX_1X_XXXX,X.X.,XX_XXXX_XXXXX_XXXXXXX,X.X.,XX_XXX_XXXXXXX,X.X.,XX_XXX_XXXX1,1.111111,XX_XXX_XXXXXX,111.111,XX_XXXXXXXXX1,X.X.,XX_XXXX,1,XX_XXXXX,XXX,XX_XXXX_XXX,X.X.,XX_XXXXXXXXX_XXXX,X.X.,XX_XXX_XXX_XXX,X.X.,XX_X1XXXXXX_XXX,X.X.,XX_XX_XXXXXXXXXX,X.X.,XX_X1XXXXXX,X.X.,XXX,1111 XX,XX_XXX_X1_X_XXXXXX,XXX,XX_XXX_X1_XX_XXXXXX_XXXXXXX,XXX111XXXXX1,XX_XXX_X1_XX_XXX_XXX_1XX,1111,XX_XXX_XXX1_XXXXXX,XX,XX_XXXXXX_XXX,1111 XX,XX_XXXXXX_X1_XX_XXXXXX,1X11,XX_XXXXXX_X1_XX_XXXXXX_XXXXXXX,XXX111XXXX11,XX_XXXXXX_X1_XX_XXX_XXX_1XX,1111,XX_XXXXXX_XXX1_XXXXXX,XX,XX_XXXXX,XXXX XXX'X: XXXXXXX XX1XXXX XXXXXXX XXXX.,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,1111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
@jreback
Contributor

jreback commented Aug 31, 2016

This was likely fixed by #13788.

Can you try?

@jreback jreback added the IO CSV read_csv, to_csv label Aug 31, 2016
@jorisvandenbossche
Member

This still gives an error for me with latest master.

@jorisvandenbossche jorisvandenbossche added this to the Next Major Release milestone Nov 23, 2016
jeffcarey added a commit to jeffcarey/pandas that referenced this issue Nov 30, 2016
jeffcarey added a commit to jeffcarey/pandas that referenced this issue Jan 23, 2017
BUG: Fixed incorrect stream size check (pandas-dev#14125)

Fixed after merge

flake differences
jeffcarey added a commit to jeffcarey/pandas that referenced this issue Jan 23, 2017
BUG: Fixed incorrect stream size check (pandas-dev#14125)

Fixed after merge

flake differences

Fixed char pointer spacing
jeffcarey added a commit to jeffcarey/pandas that referenced this issue Jan 24, 2017
@jreback jreback modified the milestones: 0.20.0, Next Major Release Jan 24, 2017
AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this issue Mar 21, 2017
closes pandas-dev#14125

Previously, self->stream_cap was copied into a local variable called
maxstreamsize each time tokenize_bytes ran, and then this was checked
in the PUSH_CHAR macro. However, there is one other place in the file
where function make_stream_space() is called (in end_line()), and when
this happens self->stream_cap is increased but maxstreamsize is not
updated, making the check incorrect. In rare circumstances (see
original issue or test case) this could cause a crash. The resolution
is to just check self->stream_cap directly.

Author: Jeff Carey <[email protected]>

Closes pandas-dev#15195 from jeffcarey/fix/14125 and squashes the following commits:

d3c5f28 [Jeff Carey] BUG: Fixed incorrect stream size check (pandas-dev#14125)
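
The stale-capacity problem described in the commit message above can be illustrated with a small toy model. This is a Python sketch under stated assumptions, not the pandas C tokenizer. The names stream_cap, maxstreamsize, make_stream_space, end_line and tokenize_bytes are borrowed from the commit message, while the ToyStream class, the doubling growth policy, the two bookkeeping characters written per line, and the use_stale_cap switch are all invented for illustration.

```python
class ToyStream:
    """Toy growable character stream (assumed model, not pandas code)."""

    def __init__(self, cap):
        self.stream_cap = cap   # current capacity
        self.stream_len = 0     # characters written so far

    def make_stream_space(self, nbytes):
        # Assumed growth policy: double until nbytes more characters fit.
        while self.stream_len + nbytes > self.stream_cap:
            self.stream_cap *= 2

    def end_line(self):
        # Writes per-line bookkeeping and may grow the stream, which is
        # what leaves a previously copied capacity out of date.
        self.make_stream_space(2)
        self.stream_len += 2


def tokenize_bytes(stream, data, use_stale_cap):
    stream.make_stream_space(len(data))
    maxstreamsize = stream.stream_cap  # snapshot taken once per call
    for ch in data:
        if ch == "\n":
            stream.end_line()
            continue
        # Old behaviour: compare against the snapshot.
        # Fixed behaviour: compare against the live capacity.
        limit = maxstreamsize if use_stale_cap else stream.stream_cap
        if stream.stream_len >= limit:
            raise OverflowError("Buffer overflow caught")
        stream.stream_len += 1
    return stream.stream_len


data = "a\n" * 8

# Checking the live capacity: all eight lines are tokenized.
print(tokenize_bytes(ToyStream(16), data, use_stale_cap=False))

# Checking the stale snapshot: a spurious overflow is reported even
# though the stream had already been grown inside end_line().
try:
    tokenize_bytes(ToyStream(16), data, use_stale_cap=True)
except OverflowError as exc:
    print("stale check failed:", exc)
```

In this model the failure depends on exactly when end_line() triggers a resize relative to the next character pushed, which is consistent with the report that adding or removing a few characters from the input makes the error disappear.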