read_csv incompatible with newstr and future #14477

Closed
larssono opened this issue Oct 23, 2016 · 8 comments
Labels
IO CSV: read_csv, to_csv
Regression: Functionality that used to work in a prior pandas version
Milestone
0.19.1

@larssono

larssono commented Oct 23, 2016

After upgrading to pandas 0.19, several tests fail in a package I maintain. The package uses several imports from future to work with both py2 and py3. The problem appears when using from __future__ import unicode_literals

A small, complete example of the issue

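import pandas as pd

# Python 2, interactive session: '"' is a native (byte) str here, so this works.
pd.read_csv('simple.txt', quotechar='"')

# After this import, string literals are unicode, so the same call raises TypeError.
from __future__ import unicode_literals
pd.read_csv('simple.txt', quotechar='"')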

The first read works; the second fails with the stack trace attached below ("TypeError: "quotechar" must be string, not unicode").
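A minimal sketch of why the two calls differ, assuming nothing beyond the standard library: with unicode_literals active, a bare '"' is no longer the interpreter's native str on Python 2. The variable names here are illustrative only.

```python
from __future__ import unicode_literals

import sys

# With unicode_literals active, this literal is `unicode` on Python 2
# and plain `str` on Python 3 (where the import has no effect).
literal = '"'
literal_type = type(literal).__name__

# True on Python 3, False on Python 2: the Python 2 case is the one
# that pandas 0.19.0 rejects in this report.
is_native_str = isinstance(literal, str)

print(sys.version_info[0], literal_type, is_native_str)
```

On Python 2 the check is False, which matches the "must be string, not unicode" message in the traceback.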
The example file
simple.txt

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS
------------------

commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.19.0
nose: 1.3.7
pip: 8.1.2
setuptools: 26.0.0
Cython: None
numpy: 1.11.2
scipy: 0.16.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.3.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: None
tables: None
numexpr: 2.4.6
matplotlib: 1.5.1
openpyxl: None
xlrd: 0.9.4
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.4.1
html5lib: 0.9999999
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.42.0
pandas_datareader: None

TypeError                                 Traceback (most recent call last)
<ipython-input-2-6e275a5a7598> in <module>()
      1 from __future__ import unicode_literals
----> 2 pd.read_csv('/Users/lom/simple.csv', quotechar='"')

/usr/local/lib/python2.7/site-packages/pandas/io/parsers.pyc in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
    643                     skip_blank_lines=skip_blank_lines)
    644 
--> 645         return _read(filepath_or_buffer, kwds)
    646 
    647     parser_f.__name__ = name

/usr/local/lib/python2.7/site-packages/pandas/io/parsers.pyc in _read(filepath_or_buffer, kwds)
    386 
    387     # Create the parser.
--> 388     parser = TextFileReader(filepath_or_buffer, **kwds)
    389 
    390     if (nrows is not None) and (chunksize is not None):

/usr/local/lib/python2.7/site-packages/pandas/io/parsers.pyc in __init__(self, f, engine, **kwds)
    727             self.options['has_index_names'] = kwds['has_index_names']
    728 
--> 729         self._make_engine(self.engine)
    730 
    731     def close(self):

/usr/local/lib/python2.7/site-packages/pandas/io/parsers.pyc in _make_engine(self, engine)
    920     def _make_engine(self, engine='c'):
    921         if engine == 'c':
--> 922             self._engine = CParserWrapper(self.f, **self.options)
    923         else:
    924             if engine == 'python':

/usr/local/lib/python2.7/site-packages/pandas/io/parsers.pyc in __init__(self, src, **kwds)
   1387         kwds['allow_leading_cols'] = self.index_col is not False
   1388 
-> 1389         self._reader = _parser.TextReader(src, **kwds)
   1390 
   1391         # XXX

pandas/parser.pyx in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4411)()

pandas/parser.pyx in pandas.parser.TextReader._set_quoting (pandas/parser.c:6535)()

TypeError: "quotechar" must be string, not unicode
jorisvandenbossche added the IO CSV (read_csv, to_csv) and Regression (Functionality that used to work in a prior pandas version) labels Oct 24, 2016
jorisvandenbossche added this to the 0.19.1 milestone Oct 24, 2016
@jorisvandenbossche
Member

@larssono Thanks for the report!

cc @gfyoung

@gfyoung
Member

gfyoung commented Oct 24, 2016

@jorisvandenbossche : Might it be best to just add a unicode class to pandas.compat? I think that should patch this issue, IINM, i.e.:

try:
    unicode
except NameError:
    unicode = str

@gfyoung
Member

gfyoung commented Oct 24, 2016

FYI, for future reference, here's a slightly easier way to reproduce (Note: Python 2.x required):

>>> from pandas import read_csv
>>> from pandas.compat import StringIO, u
>>>
>>> data = 'a\n1'
>>> read_csv(StringIO(data), quotechar=u('"'))
...
TypeError: "quotechar" must be string, not unicode

@jreback
Contributor

jreback commented Oct 24, 2016

@gfyoung unicode needs to be very explicit

@gfyoung
Member

gfyoung commented Oct 24, 2016

@jreback : Right...but what do you think of the patch I proposed above, and we can then add the class to the allowed string types in parser.pyx?

@jreback
Contributor

jreback commented Oct 24, 2016

well it's not explicit
so -1

@gfyoung
Member

gfyoung commented Oct 24, 2016

In pandas.compat:

try:
    unicode
except NameError:
    unicode = str
...

In parser.pyx:

if not isinstance(quote_char, (str, bytes, compat.unicode)) and quote_char is not None:
...
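
For illustration, the two fragments above can be combined into a pure-Python sketch of the check. The function name check_quotechar is hypothetical, and this only mirrors the condition proposed in this comment; it is not necessarily what the merged fix in parser.pyx looks like.

```python
# Compat alias as proposed above: `unicode` exists on Python 2 only.
try:
    unicode
except NameError:
    unicode = str


def check_quotechar(quote_char):
    """Hypothetical stand-in for the type check discussed for parser.pyx.

    Accepts None, native str, bytes, and (on Python 2) unicode;
    anything else raises TypeError.
    """
    allowed = (str, bytes, unicode)
    if quote_char is not None and not isinstance(quote_char, allowed):
        raise TypeError('"quotechar" must be string, not {typ}'.format(
            typ=type(quote_char).__name__))
    return quote_char


check_quotechar(u'"')  # accepted on both Python 2 and Python 3
```

Because the alias falls back to str on Python 3, the same tuple works on both major versions without a version check at the call site.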

gfyoung added four commits to forking-repos/pandas that referenced this issue Oct 25, 2016
gfyoung added two commits to forking-repos/pandas that referenced this issue Oct 26, 2016
jorisvandenbossche pushed a commit to jorisvandenbossche/pandas that referenced this issue Nov 2, 2016
…d.read_csv

Title is self-explanatory.  Affects Python 2.x only.  Closes pandas-dev#14477.

Author: gfyoung

Closes pandas-dev#14492 from gfyoung/quotechar-unicode-2.x and squashes the following commits:

ec9f59a [gfyoung] BUG: Accept unicode quotechars again in pd.read_csv

(cherry picked from commit 6130e77)
@streamnsight

Having a similar problem with 'escapechar':

"escapechar" must be string, not unicode

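Until a fixed release is available, one possible caller-side workaround on Python 2 is to coerce these options to the native str type before passing them on. This is only a sketch: with_native_chars and CHAR_OPTIONS are hypothetical names, and the list covers only the two options reported in this thread.

```python
import sys

# Only the options named in this thread; whether other options are
# affected is not established here.
CHAR_OPTIONS = ('quotechar', 'escapechar')


def with_native_chars(**kwargs):
    """Return a copy of kwargs with the listed options as native str.

    On Python 2, unicode values are encoded to ASCII byte strings.
    On Python 3, the values are returned unchanged.
    """
    fixed = dict(kwargs)
    for name in CHAR_OPTIONS:
        value = fixed.get(name)
        needs_fix = (sys.version_info[0] == 2
                     and value is not None
                     and not isinstance(value, str))
        if needs_fix:
            fixed[name] = value.encode('ascii')
    return fixed


options = with_native_chars(quotechar=u'"', escapechar=u'\\', sep=u',')
```

The resulting dict could then be unpacked into the call, for example pd.read_csv('simple.txt', **options).
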