Bug: read_csv does not work with chr(254) as quotechar parameter #11592

Closed
ghost opened this issue Nov 13, 2015 · 5 comments
Labels: Bug; Error Reporting (Incorrect or improved errors from pandas); IO CSV (read_csv, to_csv)

Comments

@ghost

ghost commented Nov 13, 2015

No description provided.

@ghost ghost changed the title Bug: Read CSV does not accept chr(254) as quotechar parameter Bug: read_csv does not accept chr(254) as quotechar parameter Nov 13, 2015
@ghost ghost changed the title Bug: read_csv does not accept chr(254) as quotechar parameter Bug: read_csv does not work with chr(254) as quotechar parameter Nov 13, 2015
@jreback
Contributor

jreback commented Nov 13, 2015

pls show a copy-pastable example

@jreback jreback added the IO CSV read_csv, to_csv label Nov 18, 2015
@dsm054
Contributor

dsm054 commented Aug 18, 2016

I can reproduce:

>>> pd.read_csv(io.StringIO(chr(127)+"a, "+chr(127)+",b"), quotechar=chr(127), header=None)

     0  1
0  a,   b

>>> pd.read_csv(io.StringIO(chr(254)+"a, "+chr(254)+",b"), quotechar=chr(254), header=None)

    0   1  2
0  þa   þ  b

The problem sets in at chr(128), which happens to be the first value for which len(chr(x).encode("utf-8")) > 1, and quotechar is declared as a single char in parser.pyx.
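To make the byte-length observation concrete, here is a minimal sketch using only built-in string encoding. It does not touch pandas; it only checks where the UTF-8 encoding of a single character stops fitting in one byte.

```python
# Find the first code point whose UTF-8 encoding needs more than one byte.
first_multibyte = next(
    cp for cp in range(0x110000) if len(chr(cp).encode("utf-8")) > 1
)
print(first_multibyte)

# Compare the two quote characters from the example above, plus the boundary.
encoded_lengths = {cp: len(chr(cp).encode("utf-8")) for cp in (127, 128, 254)}
print(encoded_lengths)
```

chr(127) encodes to one byte, while chr(128) and chr(254) each encode to two, which is consistent with the boundary seen in the reproduction.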

I suspect this would be a bit of a headache to fix, and it would only really be useful for people who decided to use a small thorn as a separator, and who were also using every other viable separator character so that a pass to replace the thorns wouldn't work, and who can't use the python engine for some reason.

>>> pd.read_csv(io.StringIO(chr(254)+"a, "+chr(254)+",b"), quotechar=chr(254), header=None, engine='python')

     0  1
0  a,   b

I like the idea of being able to use anything anywhere, but in terms of developer time, maybe we should just raise NotImplementedError if you're using the C engine in these circumstances.
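For anyone hitting this in the meantime, the "pass to replace the thorns" mentioned above could look roughly like the sketch below. The helper name swap_quotechar is hypothetical, and the sketch assumes the replacement character '"' never occurs in the data. The standard-library csv module is used here in place of pandas, purely to keep the example self-contained.

```python
import csv
import io

THORN = chr(254)  # the quote character from the report


def swap_quotechar(text, old=THORN, new='"'):
    """Replace one quote character with another (hypothetical helper)."""
    # Assumption: `new` never occurs in the data; otherwise fields would be
    # split or merged incorrectly after the swap.
    if new in text:
        raise ValueError("replacement quote character already appears in the data")
    return text.replace(old, new)


raw = THORN + "a, " + THORN + ",b"
cleaned = swap_quotechar(raw)

# The stdlib csv reader stands in for the parser; its default quotechar is '"'.
rows = list(csv.reader(io.StringIO(cleaned)))
print(rows)
```

The cleaned text should then also be readable with pd.read_csv(io.StringIO(cleaned), header=None) and the default quotechar, without needing engine='python'.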

@jreback jreback added the Bug label Aug 18, 2016
@jreback jreback added this to the Next Major Release milestone Aug 18, 2016
@jreback
Contributor

jreback commented Aug 18, 2016

thanks @dsm054 for the example.

@gfyoung
Member

gfyoung commented Jan 3, 2017

@jreback: We have a similar issue with sep, in that we cannot use a multi-char sep with the C engine. I'm inclined to agree with @joelschw and raise NotImplementedError (or some kind of error) for such a quotechar for the time being. Thoughts?

@jreback
Contributor

jreback commented Jan 3, 2017

multi-char sep for sure should be NotImplementedError (when c-engine is given). so the same here is fine.

@jreback jreback added the Error Reporting Incorrect or improved errors from pandas label Jan 3, 2017
gfyoung added a commit to forking-repos/pandas that referenced this issue Jan 4, 2017
Raise ValueError or issue ParserWarning
when a multi-char quotechar is passed in,
and the C engine is used.

Closes pandas-devgh-11592.
gfyoung added a commit to forking-repos/pandas that referenced this issue Jan 5, 2017
Raise ValueError or issue ParserWarning
when a multi-char quotechar is passed in,
and the C engine is used.

Closes pandas-devgh-11592.
@jorisvandenbossche jorisvandenbossche modified the milestones: 0.20.0, Next Major Release Jan 5, 2017
jorisvandenbossche pushed a commit that referenced this issue Jan 5, 2017
Raise ValueError or issue ParserWarning
when a multi-char quotechar is passed in,
and the C engine is used.

Closes gh-11592.
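
The behaviour described in the commit message (raise ValueError or issue ParserWarning when the quotechar does not fit the C engine) could be sketched roughly as below. This is not the pandas implementation: resolve_engine and the local ParserWarning class are hypothetical stand-ins, and the rule "error when engine='c' is explicit, warn and fall back to the python engine otherwise" is an assumption modelled on the discussion above.

```python
import warnings


class ParserWarning(Warning):
    """Local stand-in for a parser warning class (not imported from pandas)."""


def resolve_engine(quotechar, engine=None):
    """Pick a parser engine for the given quotechar (hypothetical helper)."""
    # A quotechar that needs more than one UTF-8 byte cannot be held in a
    # single C char, which is the limitation identified in this issue.
    needs_python = len(quotechar.encode("utf-8")) > 1
    if not needs_python:
        return engine or "c"
    if engine == "c":
        raise ValueError(
            "the 'c' engine does not support a quotechar whose UTF-8 "
            "encoding is longer than one byte; use engine='python'"
        )
    if engine is None:
        # Assumption: with no explicit engine, warn and fall back.
        warnings.warn(
            "falling back to the 'python' engine because the quotechar "
            "is longer than one byte in UTF-8",
            ParserWarning,
            stacklevel=2,
        )
    return "python"
```

With this sketch, resolve_engine(chr(127)) returns "c", resolve_engine(chr(254), engine="c") raises ValueError, and resolve_engine(chr(254)) warns before returning "python".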