
BUG: Fix TypeError caused by GH13374 #17465

Merged 10 commits on Sep 10, 2017
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.21.0.txt
@@ -411,6 +411,7 @@ I/O
- Bug in :func:`read_csv` when called with a single-element list ``header`` would return a ``DataFrame`` of all NaN values (:issue:`7757`)
- Bug in :func:`read_stata` where value labels could not be read when using an iterator (:issue:`16923`)
- Bug in :func:`read_html` where import check fails when run in multiple threads (:issue:`16928`)
- Bug in :func:`read_csv` where automatic delimiter detection caused a ``TypeError`` to be thrown when a bad line was encountered rather than the correct error message (:issue:`13374`)

Plotting
^^^^^^^^
3 changes: 2 additions & 1 deletion pandas/io/parsers.py
@@ -2836,7 +2836,8 @@ def _rows_to_cols(self, content):
for row_num, actual_len in bad_lines:
msg = ('Expected %d fields in line %d, saw %d' %
(col_len, row_num + 1, actual_len))
- if len(self.delimiter) > 1 and self.quoting != csv.QUOTE_NONE:
+ if self.delimiter and \
+ len(self.delimiter) > 1 and self.quoting != csv.QUOTE_NONE:
@gfyoung (Member), Sep 9, 2017:

General practice is to wrap the conditional in parentheses rather than use a backslash continuation, which is a little nicer to read, i.e.:

if (self.delimiter and
    len(self.delimiter) > 1...)
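
To make the reason for the extra `self.delimiter` check concrete, here is a minimal standalone sketch written in the parenthesized style suggested above. It assumes the delimiter is `None` when `sep=None` asks the Python parser to detect it, as the new test's name suggests. The function names `needs_quote_hint` and `unguarded` are hypothetical and are not part of pandas.

```python
import csv


def needs_quote_hint(delimiter, quoting):
    """Hypothetical stand-in for the guarded condition in the diff."""
    # Assumption: delimiter is None when the parser is asked to sniff it.
    # The truthiness check short-circuits before len() is ever called.
    return bool(
        delimiter
        and len(delimiter) > 1
        and quoting != csv.QUOTE_NONE
    )


def unguarded(delimiter, quoting):
    """The pre-fix condition, kept only to show the failure mode."""
    return len(delimiter) > 1 and quoting != csv.QUOTE_NONE


print(needs_quote_hint(None, csv.QUOTE_MINIMAL))  # False
print(needs_quote_hint(",,", csv.QUOTE_MINIMAL))  # True
print(needs_quote_hint(",,", csv.QUOTE_NONE))     # False

try:
    unguarded(None, csv.QUOTE_MINIMAL)
except TypeError as exc:
    # len(None) raises TypeError, which masked the real parse error.
    print("TypeError:", exc)
```

Wrapping the condition in parentheses lets it span several lines without a backslash, which is the style the review comment asks for.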

# see gh-13374
reason = ('Error could possibly be due to quotes being '
'ignored when a multi-char delimiter is used.')
19 changes: 19 additions & 0 deletions pandas/tests/io/parser/python_parser_only.py
@@ -218,6 +218,25 @@ def test_multi_char_sep_quotes(self):
self.read_csv(StringIO(data), sep=',,',
quoting=csv.QUOTE_NONE)

def test_none_delimiter(self):
# see gh-13374 and gh-17465
Contributor:

Can you add a one-line comment about what is happening here?

Is this only in the Python parser as well?

Contributor Author:

Only the Python parser. I actually discovered the issue because I had been using the built-in CSV sniffer with the C parser for a while, but switched to the Python engine with the pandas sniffer because it did notably better on the data files I was using.


data = "a,b,c\n0,1,2\n3,4,5,6\n7,8,9"
expected = DataFrame({'a': [0, 7],
'b': [1, 8],
'c': [2, 9]})

# We expect the third line in the data to be
# skipped because it is malformed
# but we do not expect any errors to occur
Member:

Nits: add a comma after "malformed" + add a period at end of comment.

result = self.read_csv(StringIO(data), header=0,
sep=None,
error_bad_lines=False,
warn_bad_lines=True,
engine='python',
tupleize_cols=True)
tm.assert_frame_equal(result, expected)
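
The behaviour this test expects (the over-long third line is dropped and reported, and no exception is raised) can be illustrated with a small standard-library sketch. This is not how pandas implements it: the helper `read_skipping_bad_rows` is hypothetical, it takes the delimiter as given rather than detecting it, it drops any row whose width differs from the header, and it leaves values as strings rather than converting them to integers.

```python
import csv
from io import StringIO


def read_skipping_bad_rows(text, delimiter=","):
    """Hypothetical helper: drop rows whose width differs from the header."""
    reader = csv.reader(StringIO(text), delimiter=delimiter)
    header = next(reader)
    rows, warnings = [], []
    # The header is line 1, so data rows start at line 2.
    for line_num, row in enumerate(reader, start=2):
        if len(row) != len(header):
            # Same message format as the one built in _rows_to_cols.
            warnings.append("Expected %d fields in line %d, saw %d"
                            % (len(header), line_num, len(row)))
            continue
        rows.append(row)
    return header, rows, warnings


data = "a,b,c\n0,1,2\n3,4,5,6\n7,8,9"
header, rows, warnings = read_skipping_bad_rows(data)
print(header)    # ['a', 'b', 'c']
print(rows)      # [['0', '1', '2'], ['7', '8', '9']]
print(warnings)  # ['Expected 3 fields in line 3, saw 4']
```

The surviving rows correspond to the `expected` frame in the test, with columns `a`, `b`, `c` holding `[0, 7]`, `[1, 8]`, `[2, 9]`.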

def test_skipfooter_bad_row(self):
# see gh-13879
# see gh-15910