Commit d8e427b

gfyoung authored and jorisvandenbossche committed
BUG: Improve error message for multi-char sep and quotes in Python engine (#14582)
If there is a field count mismatch, check whether a multi-char sep was used in conjunction with quotes. Currently, quotes are not respected in that setup, which can result in improper line breaks. Closes gh-13374.
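
To illustrate the mismatch described above, here is a minimal standard-library sketch. It assumes the separator is applied as a plain regular-expression split on each line, with no awareness of quotes; `split_rows` is a hypothetical helper written for this illustration and is not part of pandas.

```python
import re

# Same sample data as the test added in this commit.
data = 'a,,b\n1,,a\n2,,"2,,b"'


def split_rows(text, sep):
    """Split each line on a multi-char separator, ignoring quotes entirely.

    Hypothetical helper: it only mimics a quote-unaware regex split.
    """
    pattern = re.compile(sep)
    return [pattern.split(line) for line in text.split('\n')]


rows = split_rows(data, ',,')
field_counts = [len(row) for row in rows]

# The quoted value on the last line is split in two, so that row
# ends up with one more field than the header.
print(rows[2])
print(field_counts)
```

Under that assumption, the last row yields three fields against a two-column header, which is the kind of mismatch the new message hints at.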
1 parent b1d9599 commit d8e427b

3 files changed: +23 -0 lines changed

doc/source/whatsnew/v0.19.2.txt

+1
@@ -30,6 +30,7 @@ Bug Fixes
 - Compat with ``dateutil==2.6.0``; segfault reported in the testing suite (:issue:`14621`)
 - Allow ``nanoseconds`` in ``Timestamp.replace`` as a kwarg (:issue:`14621`)
 - Bug in ``pd.read_csv`` where reading files fails, if the number of headers is equal to the number of lines in the file (:issue:`14515`)
+- Bug in ``pd.read_csv`` for the Python engine in which an unhelpful error message was being raised when multi-char delimiters were not being respected with quotes (:issue:`14582`)
 
 
 

pandas/io/parsers.py

+5
@@ -2515,6 +2515,11 @@ def _rows_to_cols(self, content):
 
             msg = ('Expected %d fields in line %d, saw %d' %
                    (col_len, row_num + 1, zip_len))
+            if len(self.delimiter) > 1 and self.quoting != csv.QUOTE_NONE:
+                # see gh-13374
+                reason = ('Error could possibly be due to quotes being '
+                          'ignored when a multi-char delimiter is used.')
+                msg += '. ' + reason
             raise ValueError(msg)
 
         if self.usecols:
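
The message logic in this hunk can be tried in isolation. The following is a sketch only: `field_count_message` is a hypothetical standalone function that mirrors the added condition, taking the delimiter and quoting mode as arguments instead of reading them from the parser instance.

```python
import csv


def field_count_message(col_len, row_num, zip_len, delimiter, quoting):
    """Build the field-count error text (hypothetical standalone version)."""
    msg = ('Expected %d fields in line %d, saw %d' %
           (col_len, row_num + 1, zip_len))
    # Append the hint only when a multi-char delimiter is combined with
    # any quoting mode other than QUOTE_NONE, as in the hunk above.
    if len(delimiter) > 1 and quoting != csv.QUOTE_NONE:
        reason = ('Error could possibly be due to quotes being '
                  'ignored when a multi-char delimiter is used.')
        msg += '. ' + reason
    return msg


with_hint = field_count_message(2, 2, 3, ',,', csv.QUOTE_MINIMAL)
no_hint_quote_none = field_count_message(2, 2, 3, ',,', csv.QUOTE_NONE)
no_hint_single_char = field_count_message(2, 2, 3, ',', csv.QUOTE_MINIMAL)

print(with_hint)
print(no_hint_quote_none)
```

The hint is deliberately worded as a possibility ("could possibly be"), since a field-count mismatch with a multi-char delimiter may have other causes.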

pandas/io/tests/parser/python_parser_only.py

+17
@@ -7,6 +7,7 @@
 arguments when parsing.
 """
 
+import csv
 import sys
 import nose
 
@@ -204,3 +205,19 @@ def test_encoding_non_utf8_multichar_sep(self):
                                        sep=sep, names=['a', 'b'],
                                        encoding=encoding)
                 tm.assert_frame_equal(result, expected)
+
+    def test_multi_char_sep_quotes(self):
+        # see gh-13374
+
+        data = 'a,,b\n1,,a\n2,,"2,,b"'
+        msg = 'ignored when a multi-char delimiter is used'
+
+        with tm.assertRaisesRegexp(ValueError, msg):
+            self.read_csv(StringIO(data), sep=',,')
+
+        # We expect no match, so there should be an assertion
+        # error out of the inner context manager.
+        with tm.assertRaises(AssertionError):
+            with tm.assertRaisesRegexp(ValueError, msg):
+                self.read_csv(StringIO(data), sep=',,',
+                              quoting=csv.QUOTE_NONE)
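
The nested context managers in the second half of the test may be easier to follow in a standalone form. This sketch uses the standard library's `unittest` in place of the `tm` helpers, and `raise_plain_mismatch` is a hypothetical stand-in for a `read_csv` call whose error carries no hint. The inner `assertRaisesRegex` sees a `ValueError` whose text does not match, so it fails with an `AssertionError`, which the outer `assertRaises` then expects.

```python
import unittest

REASON = 'ignored when a multi-char delimiter is used'


def raise_plain_mismatch():
    # Hypothetical stand-in for the QUOTE_NONE case: a field-count
    # error without the multi-char delimiter hint.
    raise ValueError('Expected 2 fields in line 3, saw 3')


class NestedRaisesSketch(unittest.TestCase):
    def test_no_match_surfaces_as_assertion_error(self):
        # The inner check fails because the message lacks REASON;
        # that failure is an AssertionError caught by the outer check.
        with self.assertRaises(AssertionError):
            with self.assertRaisesRegex(ValueError, REASON):
                raise_plain_mismatch()


suite = unittest.defaultTestLoader.loadTestsFromTestCase(NestedRaisesSketch)
result = unittest.TestResult()
suite.run(result)
print(result.wasSuccessful())
```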
