DOC: read_csv() ignores quotes when a regex is used in sep #11989

Closed
mfixman opened this issue Jan 7, 2016 · 2 comments
Labels
Docs IO CSV read_csv, to_csv

mfixman commented Jan 7, 2016

When using a regular expression in the sep argument of read_csv, the Python parser disregards quotes in the input file.

In the following example, example.csv is parsed correctly by read_csv when sep is the plain string ',', while the regex version sep = '[,]' (which should match exactly the same delimiter) fails because it splits on the commas inside the quoted field.

example2.csv, which doesn't contain quotes, is parsed correctly using the same code.

In [1]: import pandas

In [2]: !cat example.csv
a,b,c,d
q,w,e,r
a,s,d,f
"z,x,c,v",i,o,p

In [3]: pandas.read_csv('example.csv', engine = 'python', quotechar = '"', sep = ',')
Out[3]: 
         a  b  c  d
0        q  w  e  r
1        a  s  d  f
2  z,x,c,v  i  o  p

In [4]: pandas.read_csv('example.csv', engine = 'python', quotechar = '"', sep = '[,]')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-5f9ad4fdcd46> in <module>()
----> 1 pandas.read_csv('example.csv', engine = 'python', quotechar = '"', sep = '[,]')

/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.pyc in parser_f(filepath_or_buffer, sep, dialect, compression, doublequote, escapechar, quotechar, quoting, skipinitialspace, lineterminator, header, index_col, names, prefix, skiprows, skipfooter, skip_footer, na_values, true_values, false_values, delimiter, converters, dtype, usecols, engine, delim_whitespace, as_recarray, na_filter, compact_ints, use_unsigned, low_memory, buffer_lines, warn_bad_lines, error_bad_lines, keep_default_na, thousands, comment, decimal, parse_dates, keep_date_col, dayfirst, date_parser, memory_map, float_precision, nrows, iterator, chunksize, verbose, encoding, squeeze, mangle_dupe_cols, tupleize_cols, infer_datetime_format, skip_blank_lines)
    489                     skip_blank_lines=skip_blank_lines)
    490 
--> 491         return _read(filepath_or_buffer, kwds)
    492 
    493     parser_f.__name__ = name

/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.pyc in _read(filepath_or_buffer, kwds)
    276         return parser
    277 
--> 278     return parser.read()
    279 
    280 _parser_defaults = {

/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.pyc in read(self, nrows)
    738                 raise ValueError('skip_footer not supported for iteration')
    739 
--> 740         ret = self._engine.read(nrows)
    741 
    742         if self.options.get('as_recarray'):

/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.pyc in read(self, rows)
   1593             content = content[1:]
   1594 
-> 1595         alldata = self._rows_to_cols(content)
   1596         data = self._exclude_implicit_index(alldata)
   1597 

/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.pyc in _rows_to_cols(self, content)
   1968             msg = ('Expected %d fields in line %d, saw %d' %
   1969                    (col_len, row_num + 1, zip_len))
-> 1970             raise ValueError(msg)
   1971 
   1972         if self.usecols:

ValueError: Expected 4 fields in line 4, saw 7

In [5]: !cat example2.csv
a,b,c,d
q,w,e,r
a,s,d,f
z,x,c,v

In [6]: pandas.read_csv('example2.csv', engine = 'python', quotechar = '"', sep = '[,]')
Out[6]: 
   a  b  c  d
0  q  w  e  r
1  a  s  d  f
2  z  x  c  v

In [7]: pandas.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-74-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.17.0
nose: 1.3.1
pip: 1.5.4
setuptools: 1.1.4
Cython: None
numpy: 1.10.1
scipy: 0.13.3
statsmodels: 0.5.0
IPython: 3.1.0
sphinx: None
patsy: 0.2.1
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: 3.1.1
numexpr: 2.2.2
matplotlib: 1.3.1
openpyxl: 1.7.0
xlrd: 0.9.2
xlwt: 0.7.5
xlsxwriter: None
lxml: None
bs4: 4.2.1
html5lib: 0.999
httplib2: 0.8
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
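
The "saw 7" in the error above is consistent with the last line being split by a plain regex, with no quote handling at all. A minimal sketch of that difference, using only the standard library; it illustrates the likely mechanism and is an assumption, not the actual pandas parser code:

```python
import csv
import io
import re

# The last line of example.csv from the report above.
line = '"z,x,c,v",i,o,p'

# A bare regex split has no notion of quoting, so it also cuts
# at the commas inside the quoted field.
regex_fields = re.split(r'[,]', line)

# csv.reader honours quotechar, so the quoted field stays in one piece.
csv_fields = next(csv.reader(io.StringIO(line), delimiter=',', quotechar='"'))

print(len(regex_fields), regex_fields)
print(len(csv_fields), csv_fields)
```

The regex split yields seven pieces, matching the field count in the ValueError, while csv.reader yields the four expected fields.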

jreback commented Jan 7, 2016

Combining quotechar with a regex sep is not handled. I suppose you could add a note about this to the docs.
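
Until the two options work together, one possible workaround is to split the lines yourself with a regex that only matches a separator outside of quotes. This is a rough sketch under assumptions: `split_quoted` and `QUOTE_AWARE_SEP` are hypothetical names, not part of pandas, and the pattern does not handle escaped or doubled quotes, or quoted fields that span several lines.

```python
import re

# Hypothetical helper pattern, not a pandas API: match the separator only
# when it is followed by an even number of quote characters up to the end
# of the line, i.e. when the separator sits outside a quoted field.
QUOTE_AWARE_SEP = re.compile(r'[,](?=(?:[^"]*"[^"]*")*[^"]*$)')


def split_quoted(line):
    """Split one line on the separator, then strip surrounding quotes."""
    fields = QUOTE_AWARE_SEP.split(line.rstrip('\n'))
    return [
        f[1:-1] if len(f) >= 2 and f[0] == '"' and f[-1] == '"' else f
        for f in fields
    ]


# Same content as example.csv in the report above.
text = 'a,b,c,d\nq,w,e,r\na,s,d,f\n"z,x,c,v",i,o,p\n'
rows = [split_quoted(line) for line in text.splitlines()]

for row in rows:
    print(row)
```

The resulting rows can then be handed to pandas, for example with pandas.DataFrame(rows[1:], columns=rows[0]). If the separator is really a single character, passing it as a plain string (as in In [3] above) avoids the problem entirely.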

jreback added Docs IO CSV read_csv, to_csv labels Jan 7, 2016
jreback added this to the Someday milestone Jan 7, 2016
jreback changed the title from "BUG: read_csv() ignores quotes when a regex is used in sep" to "DOC: read_csv() ignores quotes when a regex is used in sep" Jan 7, 2016
jreback modified the milestones: 0.18.0, Someday Jan 16, 2016
jreback modified the milestones: Next Major Release, 0.18.0 Jan 30, 2016

jreback commented Jan 30, 2016

closed by #12059

jreback closed this as completed Jan 30, 2016