read_csv parse issues with \r line ending and quoted items #3453
Comments
Testing this with the latest from GitHub gives me the same issues:
>>> pd.__version__
'0.12.0.dev-1e2b447'
>>> import StringIO
>>> pd.read_csv(StringIO.StringIO(' a,b,c\r"a,b","e,d","f,f"'), header=None)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/john/app/venv/lib/python2.7/site-packages/pandas/io/parsers.py", line 401, in parser_f
return _read(filepath_or_buffer, kwds)
File "/home/john/app/venv/lib/python2.7/site-packages/pandas/io/parsers.py", line 216, in _read
return parser.read()
File "/home/john/app/venv/lib/python2.7/site-packages/pandas/io/parsers.py", line 634, in read
ret = self._engine.read(nrows)
File "/home/john/app/venv/lib/python2.7/site-packages/pandas/io/parsers.py", line 958, in read
data = self._reader.read(nrows)
File "parser.pyx", line 654, in pandas._parser.TextReader.read (pandas/src/parser.c:6014)
File "parser.pyx", line 676, in pandas._parser.TextReader._read_low_memory (pandas/src/parser.c:6231)
File "parser.pyx", line 729, in pandas._parser.TextReader._read_rows (pandas/src/parser.c:6833)
File "parser.pyx", line 716, in pandas._parser.TextReader._tokenize_rows (pandas/src/parser.c:6718)
File "parser.pyx", line 1582, in pandas._parser.raise_parser_error (pandas/src/parser.c:17131)
pandas._parser.CParserError: Error tokenizing data. C error: Expected 3 fields in line 2, saw 4
>>> pd.read_csv(StringIO.StringIO(' a,b,c\r"a,b","e,d","f,f"'))
a b c
"a b" e,d f,f
>>> pd.read_csv(StringIO.StringIO(' a,b,c\r"a,b","e,d","f,f"'), index_col=False)
a b c
0 "a b" e,d
I can also confirm that this error occurs in 0.11.0:
>>> pd.__version__
'0.11.0'
>>> pd.read_csv(StringIO.StringIO(' a,b,c\r"a,b","e,d","f,f"'), header=None)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/john/app/venv/lib/python2.7/site-packages/pandas/io/parsers.py", line 401, in parser_f
return _read(filepath_or_buffer, kwds)
File "/home/john/app/venv/lib/python2.7/site-packages/pandas/io/parsers.py", line 216, in _read
return parser.read()
File "/home/john/app/venv/lib/python2.7/site-packages/pandas/io/parsers.py", line 633, in read
ret = self._engine.read(nrows)
File "/home/john/app/venv/lib/python2.7/site-packages/pandas/io/parsers.py", line 957, in read
data = self._reader.read(nrows)
File "parser.pyx", line 654, in pandas._parser.TextReader.read (pandas/src/parser.c:5921)
File "parser.pyx", line 676, in pandas._parser.TextReader._read_low_memory (pandas/src/parser.c:6138)
File "parser.pyx", line 729, in pandas._parser.TextReader._read_rows (pandas/src/parser.c:6740)
File "parser.pyx", line 716, in pandas._parser.TextReader._tokenize_rows (pandas/src/parser.c:6625)
File "parser.pyx", line 1582, in pandas._parser.raise_parser_error (pandas/src/parser.c:17029)
pandas._parser.CParserError: Error tokenizing data. C error: Expected 3 fields in line 2, saw 4
This was referenced May 1, 2013
Looking
All set; had to make a bit of a mess. We will need to clean up the tokenizer loop one of these days (being mindful of performance, of course).
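For anyone wanting to check the fix locally, here is a minimal sketch of a regression check. It assumes Python 3 (so `io.StringIO` instead of the Python 2 `StringIO` module used in the transcripts above) and a pandas release that includes the fix; the variable names are only illustrative.

```python
import io

import pandas as pd
from pandas.testing import assert_frame_equal

# Data from the report: "\r" line ending, quoted fields containing commas.
data_cr = ' a,b,c\r"a,b","e,d","f,f"'
# Same data with "\n" as the line ending, used as the reference parse.
data_lf = data_cr.replace("\r", "\n")

df_cr = pd.read_csv(io.StringIO(data_cr), header=None)
df_lf = pd.read_csv(io.StringIO(data_lf), header=None)

# Raises AssertionError if the two line endings parse differently.
assert_frame_equal(df_cr, df_lf)
```

On an affected version (0.11.0 or the 0.12.0 dev build above), the first `read_csv` call would be expected to raise the tokenizing error instead of reaching the comparison.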
There seems to be an issue in read_csv with quoted items that contain the separator when the line ending is \r.
EXPECTED BEHAVIOR:
This should have the same behavior as when the line ending is \n.
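As a point of reference, the sketch below parses the same data with `\n` as the line ending. It assumes Python 3 and uses `io.StringIO`; the name `data_lf` is only illustrative.

```python
import io

import pandas as pd

# Same data as in the report, but with "\n" instead of "\r" as the line ending.
data_lf = ' a,b,c\n"a,b","e,d","f,f"'

df_lf = pd.read_csv(io.StringIO(data_lf), header=None)

# Quoted commas stay inside their fields, so both rows have three columns.
print(df_lf.shape)
print(df_lf.iloc[1].tolist())
```

The second row comes back as the three fields `a,b`, `e,d` and `f,f`, which is what the `\r` case should produce as well.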
Maybe this should be in a separate bug report, but a possibly related issue occurs when you don't pass header=None.
The above shows the first quoted-delimited item set as the index_col. The following shows what happens when we tell pandas to use index_col=False.
EXPECTED BEHAVIOR:
and with index_col=False
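A minimal sketch of the `\n` baseline for the header case, with and without `index_col=False`. It assumes Python 3 and `io.StringIO`; the variable names are illustrative and not part of the original report.

```python
import io

import pandas as pd

# "\n" version of the reported data; the first line is used as the header.
data_lf = ' a,b,c\n"a,b","e,d","f,f"'

df_default = pd.read_csv(io.StringIO(data_lf))
df_no_index = pd.read_csv(io.StringIO(data_lf), index_col=False)

# The header keeps the leading space of " a" because skipinitialspace
# defaults to False.
print(list(df_default.columns))
print(df_default.iloc[0].tolist())
print(df_no_index.iloc[0].tolist())
```

In both calls the single data row has three fields and none of them is consumed as the index, which is the behavior the `\r` input is expected to match.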
Here is my system information if that is necessary