BUG: Improve error message for skipfooter malformed rows in Python engine #14749

Merged
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.19.2.txt
@@ -32,6 +32,7 @@ Bug Fixes
 - Bug in ``pd.read_csv`` where reading files fails, if the number of headers is equal to the number of lines in the file (:issue:`14515`)
 - Bug in ``pd.read_csv`` for the Python engine in which an unhelpful error message was being raised when multi-char delimiters were not being respected with quotes (:issue:`14582`)
 - Fix bugs (:issue:`14734`, :issue:`13654`) in ``pd.read_sas`` and ``pandas.io.sas.sas7bdat.SAS7BDATReader`` that caused problems when reading a SAS file incrementally.
+- Bug in ``pd.read_csv`` for the Python engine in which an unhelpful error message was being raised when ``skipfooter`` was not being respected by Python's CSV library (:issue:`13879`)



23 changes: 16 additions & 7 deletions pandas/io/parsers.py
@@ -2411,14 +2411,23 @@ def _next_line(self):
                 try:
                     orig_line = next(self.data)
                 except csv.Error as e:
+                    msg = str(e)
+
                     if 'NULL byte' in str(e):
-                        raise csv.Error(
-                            'NULL byte detected. This byte '
-                            'cannot be processed in Python\'s '
-                            'native csv library at the moment, '
-                            'so please pass in engine=\'c\' instead.')
-                    else:
-                        raise
+                        msg = ('NULL byte detected. This byte '
+                               'cannot be processed in Python\'s '
+                               'native csv library at the moment, '
+                               'so please pass in engine=\'c\' instead')
+
+                    if self.skipfooter > 0:
+                        reason = ('Error could possibly be due to '
+                                  'parsing errors in the skipped footer rows '
+                                  '(the skipfooter keyword is only applied '
+                                  'after Python\'s csv library has parsed '
+                                  'all rows).')
+                        msg += '. ' + reason
+
+                    raise csv.Error(msg)
                 line = self._check_comments([orig_line])[0]
                 self.pos += 1
                 if (not self.skip_blank_lines and
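
The hunk above appends a hint because the footer is trimmed only after the csv library has parsed every row, so a malformed footer row still raises. A minimal standalone sketch of that ordering follows; it is not the pandas implementation. `read_rows` and `FOOTER_HINT` are hypothetical names, and passing `strict=True` to `csv.reader` is an assumption made here so that an unterminated quote is reported as an error.

```python
import csv
import io

# Hypothetical constant mirroring the wording added in the diff.
FOOTER_HINT = ('Error could possibly be due to parsing errors in the '
               'skipped footer rows (the skipfooter keyword is only '
               'applied after Python\'s csv library has parsed all rows).')


def read_rows(text, skipfooter=0):
    """Hypothetical helper: parse every row first, trim the footer last."""
    # strict=True is an assumption of this sketch, not taken from the diff.
    reader = csv.reader(io.StringIO(text, newline=''), strict=True)
    rows = []
    while True:
        try:
            row = next(reader)
        except StopIteration:
            break
        except csv.Error as e:
            msg = str(e)
            if skipfooter > 0:
                # Same idea as the patch: explain why a "skipped" row can fail.
                msg += '. ' + FOOTER_HINT
            raise csv.Error(msg)
        rows.append(row)
    # The footer rows are dropped only after all rows have been parsed.
    return rows[:max(len(rows) - skipfooter, 0)]


data = 'a,b,c\ncat,foo,bar\ndog,foo,"baz'
try:
    read_rows(data, skipfooter=1)
except csv.Error as err:
    print(err)
```

With a well-formed footer, `read_rows('a,b\n1,2\nfooter,x\n', skipfooter=1)` returns only the first two rows; the hint appears only when parsing fails and `skipfooter` is positive.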
15 changes: 15 additions & 0 deletions pandas/io/tests/parser/python_parser_only.py
@@ -221,3 +221,18 @@ def test_multi_char_sep_quotes(self):
         with tm.assertRaisesRegexp(ValueError, msg):
             self.read_csv(StringIO(data), sep=',,',
                           quoting=csv.QUOTE_NONE)
+
+    def test_skipfooter_bad_row(self):
+        # see gh-13879
+
+        data = 'a,b,c\ncat,foo,bar\ndog,foo,"baz'
+        msg = 'parsing errors in the skipped footer rows'
+
+        with tm.assertRaisesRegexp(csv.Error, msg):
+            self.read_csv(StringIO(data), skipfooter=1)
+
+        # We expect no match, so there should be an assertion
+        # error out of the inner context manager.
+        with tm.assertRaises(AssertionError):
+            with tm.assertRaisesRegexp(csv.Error, msg):
+                self.read_csv(StringIO(data))
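
The nested context managers in the second half of the test assert that the message is absent: the inner check fails with an `AssertionError` when the regex does not match, and the outer check expects that failure. A rough standard-library analogue is sketched below, assuming `unittest` in place of the `tm` helpers used by the PR. `NestedRaisesExample` and `fail` are hypothetical names, and the exception text is made up for illustration.

```python
import unittest


class NestedRaisesExample(unittest.TestCase):
    def test_message_absent(self):
        def fail():
            # Hypothetical error whose text lacks the footer hint.
            raise ValueError('some unrelated parsing error')

        # The inner check sees the right exception type but no regex match,
        # so it raises AssertionError, which the outer check expects.
        with self.assertRaises(AssertionError):
            with self.assertRaisesRegex(ValueError, 'skipped footer rows'):
                fail()


# Run the single test case programmatically and collect the outcome.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NestedRaisesExample)
result = unittest.TestResult()
suite.run(result)
print(result.wasSuccessful())
```

This pattern only works when the inner block raises the expected exception type; an exception of a different type would propagate through both context managers unchanged.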