first line comments on a read_csv #4623

Closed
hayd opened this issue Aug 21, 2013 · 5 comments
Labels: Bug, IO CSV (read_csv, to_csv), IO Data (IO issues that don't fit into a more specific label)
Comments

@hayd
Contributor

hayd commented Aug 21, 2013

related #4505

It seems that handling of a comment on the first line is a little buggy (or perhaps not well-defined):

In [11]: s1 = '# notes\na,b,c\n# more notes\n1,2,3'

In [12]: s2 = 'a,b,c\n# more notes\n1,2,3'

In [13]: pd.read_csv(StringIO(s1), comment='#')
Out[13]: 
        Unnamed: 0
a   b            c
NaN NaN        NaN
1   2            3

In [14]: pd.read_csv(StringIO(s2), comment='#')
Out[14]: 
    a   b   c
0 NaN NaN NaN
1   1   2   3

If you ignore the header by passing header=None:

In [15]: pd.read_csv(StringIO(s1), comment='#', header=None)
CParserError: Error tokenizing data. C error: Expected 1 fields in line 2, saw 3

Related: #3001. This came up in a Stack Overflow question.
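As a rough illustration of the behaviour one might expect here, the sketch below drops fully commented lines before any header detection takes place, so the first remaining line becomes the header. It uses only the standard library rather than pandas, the helper name `strip_comment_lines` is hypothetical, and it deliberately ignores trailing inline comments; it is not how the pandas parser is implemented.

```python
import csv
from io import StringIO


def strip_comment_lines(text, comment="#"):
    """Drop lines whose first non-blank character is the comment marker.

    Hypothetical helper for illustration only; trailing inline comments
    (e.g. "1,2,3 # note") are left untouched.
    """
    kept = [
        line
        for line in text.splitlines()
        if not line.lstrip().startswith(comment)
    ]
    return "\n".join(kept)


s1 = "# notes\na,b,c\n# more notes\n1,2,3"

# After filtering, the first remaining line is the header row.
rows = list(csv.reader(StringIO(strip_comment_lines(s1))))
print(rows)  # [['a', 'b', 'c'], ['1', '2', '3']]
```

Filtering up front like this makes the result independent of whether a header is requested, which is the property the header=None case above lacks.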

@jreback
Contributor

jreback commented Aug 21, 2013

see also #4505

@jreback
Contributor

jreback commented Aug 21, 2013

Of course, a comment in the header is a special case.

@hayd
Contributor Author

hayd commented Aug 21, 2013

Yeah, now that I think about it, the first behaviour is OK. Only the header=None case is a bug.

@gfyoung
Member

gfyoung commented Aug 2, 2016

I don't think this is an issue anymore:

>>> from pandas import read_csv
>>> from pandas.compat import StringIO
>>> data = '# notes\na,b,c\n# more notes\n1,2,3'
>>> read_csv(StringIO(data), engine='c', comment='#')
   a  b  c
0  1  2  3
>>> read_csv(StringIO(data), engine='python', comment='#')
   a  b  c
0  1  2  3
>>> read_csv(StringIO(data), engine='c', comment='#', header=None)
   0  1  2
0  a  b  c
1  1  2  3
>>> read_csv(StringIO(data), engine='python', comment='#', header=None)
   0  1  2
0  a  b  c
1  1  2  3

@jreback
Contributor

jreback commented Aug 2, 2016

OK, this looks closable. Can you add some tests?
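A minimal sketch of what such a regression test might look like, assuming a recent pandas on Python 3 (so `io.StringIO` stands in for `pandas.compat.StringIO`). The function name `check_comment_first_line` is hypothetical, and this is not the test that was actually committed for this issue; it only asserts the outputs shown in the session above.

```python
from io import StringIO

import pandas as pd

DATA = "# notes\na,b,c\n# more notes\n1,2,3"


def check_comment_first_line(engine):
    # Default header: the first non-comment line supplies the column names.
    df = pd.read_csv(StringIO(DATA), engine=engine, comment="#")
    assert list(df.columns) == ["a", "b", "c"]
    assert df.values.tolist() == [[1, 2, 3]]

    # header=None: both non-comment lines are read as data rows of strings.
    df = pd.read_csv(StringIO(DATA), engine=engine, comment="#", header=None)
    assert df.shape == (2, 3)
    assert df.values.tolist() == [["a", "b", "c"], ["1", "2", "3"]]


# Exercise both parser engines, as in the session above.
for engine in ("c", "python"):
    check_comment_first_line(engine)
```

Comparing plain lists rather than whole frames keeps the check independent of the exact column index type and string dtype that a given pandas version produces.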

jreback modified the milestones: 0.19.0, Next Major Release Aug 2, 2016
gfyoung added a commit to forking-repos/pandas that referenced this issue Aug 2, 2016