pandas.read_csv(..., skiprows=2, engine='c') with unix-style line breaks crashes python on windows #11020

Closed
e-pet opened this issue Sep 7, 2015 · 5 comments · Fixed by #30674
Labels
good first issue IO CSV read_csv, to_csv Needs Tests Unit test(s) needed to prevent regressions Segfault Non-Recoverable Error Windows Windows OS

@e-pet
e-pet commented Sep 7, 2015

The following crashes Python ("Kernel died, restarting" in IPython) on my Windows 7 machine:

import pandas
myfile = open("test.csv", "w", newline="\n")
myfile.write("blah\n\ncol_1,col_2,col_3\n\n")
myfile.close()
dat = pandas.read_csv("test.csv", skiprows=2, encoding="utf-8", engine="c")

Note the unix-style line breaks.
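
For readers unsure what `newline="\n"` changes, here is a minimal sketch using only the standard library. It writes the same text with two `newline` settings and reads the raw bytes back. The helper name `written_bytes` is hypothetical and not part of the original report.

```python
import os
import tempfile


def written_bytes(text, newline):
    """Write text in text mode with the given newline setting and
    return the raw bytes that ended up on disk."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        with open(path, "w", newline=newline) as f:
            f.write(text)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


content = "blah\n\ncol_1,col_2,col_3\n\n"

# newline="\n" disables translation, so the file contains bare LF bytes
# on every platform, including Windows.
unix_bytes = written_bytes(content, "\n")

# newline="\r\n" translates each "\n" to CRLF on write.
windows_bytes = written_bytes(content, "\r\n")

print(unix_bytes)
print(windows_bytes)
```

With the default `newline=None`, Python would translate `"\n"` to `os.linesep` on write, so the same script on Windows would produce CRLF line breaks instead.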

The test case seems to be minimal, since changing almost anything makes the code work. I tried, e.g.,

  • with Windows-style line breaks ('\r\n' instead of '\n')
  • without the two initial lines and the skiprows parameter
  • with two empty initial lines
  • with just one initial line, containing text
  • with the 'python' engine,

and everything worked.
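
Because a hard crash takes the whole interpreter (or IPython kernel) down with it, one way to compare variations like those listed above is to run each one in a child process and inspect the exit code. The following is only a sketch of that idea. The helper name `run_isolated` is hypothetical, and the two snippets passed to it are harmless placeholders rather than the actual reproduction.

```python
import subprocess
import sys


def run_isolated(code, timeout=60):
    """Run a Python snippet in a separate interpreter and return its exit code.

    A crash in the child (for example a segfault) shows up as a non-zero
    return code here instead of killing the calling session.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode


# Placeholder snippets: one exits cleanly, one exits abnormally.
ok = run_isolated("print('fine')")
bad = run_isolated("import os; os._exit(3)")

print(ok, bad)
```

To apply this to the report, the reproduction script would be passed as the `code` string once per variation (line-break style, `skiprows` value, engine), and any non-zero exit code would flag the crashing combination.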

Here is the output of pandas.show_versions():

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.3.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 61 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.15.2
nose: 1.3.4
Cython: 0.22
numpy: 1.9.2
scipy: 0.15.1
statsmodels: 0.6.1
IPython: 3.0.0
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.4
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.4.3
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: None
xlsxwriter: 0.6.7
lxml: 3.4.2
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.9
pymysql: None
psycopg2: None
@jreback
Contributor

jreback commented Sep 8, 2015

Works on master; this might have been #5664, which was fixed in 0.16.1.

In [1]: import pandas
In [2]: myfile = open("test.csv", "w", newline="\n")
In [3]: myfile.write("blah\n\ncol_1,col_2,col_3\n\n")
Out[3]: 25
In [4]: myfile.close()
In [5]: dat = pandas.read_csv("test.csv", skiprows=2, encoding="utf-8", engine="c")
In [6]: dat
Out[6]:
Empty DataFrame
Columns: [col_1, col_2, col_3]
Index: []

In [7]: pd.__version__
Out[7]: '0.16.2+601.g76a4d99'

@jreback jreback closed this as completed Sep 8, 2015
@jreback
Contributor

jreback commented Sep 8, 2015

if you'd like to add this as a test case, would take a PR in any event.

@jreback jreback reopened this Sep 8, 2015
@jreback jreback added Testing pandas testing functions or related to the test suite IO CSV read_csv, to_csv Windows Windows OS labels Sep 8, 2015
@jreback jreback added this to the Next Major Release milestone Sep 8, 2015
@e-pet
Author

e-pet commented Sep 8, 2015

Sorry, I'd like to contribute a test case (and in fact just wrote one), but I don't have the time to set up the build process right now, which seems to be necessary for running tests. So I can't check whether my test actually fails on the version I'm using.

@jreback
Contributor

jreback commented Sep 8, 2015

ok, contributing docs are here: http://pandas.pydata.org/pandas-docs/stable/contributing.html

It's actually quite easy with conda.

@sudheesh001

Hey, could I take a shot at this? Does the test for this have to go into io.parser.c_parser_only?
I can add a file with Unix line endings to the data directory and assign it to a variable in the setup_method of test_parsers, along with the other CSV files.
The contents of the file are as follows:

Test

col_1,col_2,col_3

Reading the file with skiprows=2 returns the columns col_1, col_2, col_3. Should the test assert that there are 3 columns and check their names?
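
One possible shape for such a test is sketched below, assuming the expected result is the empty frame shown in the earlier output on master (columns col_1, col_2, col_3 and no rows). The function names are hypothetical, and this is not necessarily the test that was eventually merged. It writes a temporary file instead of adding one to the data directory so that the line endings are explicit in the test itself.

```python
import os
import tempfile

import pandas as pd


def read_with_skiprows(engine):
    """Write the file from the report with Unix line breaks and parse it."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        # newline="\n" keeps bare LF line breaks even on Windows.
        with open(path, "w", newline="\n") as f:
            f.write("blah\n\ncol_1,col_2,col_3\n\n")
        return pd.read_csv(path, skiprows=2, encoding="utf-8", engine=engine)
    finally:
        os.remove(path)


def test_skiprows_unix_line_breaks():
    # Hypothetical regression test: the C engine should parse the header
    # row and return no data rows, rather than crashing.
    result = read_with_skiprows("c")
    assert list(result.columns) == ["col_1", "col_2", "col_3"]
    assert len(result) == 0


test_skiprows_unix_line_breaks()
```

Checking the column names and the row count covers both the "length being 3" and the "contents" parts of the question. Comparing against an explicitly constructed expected frame would be a stricter alternative.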

@mroeschke mroeschke added Needs Tests Unit test(s) needed to prevent regressions and removed Effort Low IO CSV read_csv, to_csv Testing pandas testing functions or related to the test suite Windows Windows OS labels Oct 7, 2019
@jbrockmendel jbrockmendel added IO CSV read_csv, to_csv Segfault Non-Recoverable Error Windows Windows OS labels Oct 16, 2019
@jreback jreback modified the milestones: Contributions Welcome, 1.0 Jan 4, 2020