BUG: Fix TypeError caused by GH13374 #17465

matthax · 2017-09-07T16:59:51Z

closes BUG: Python parser breaks with quotes and multi-char sep #13374
0 failed, 9873 passed, 1955 skipped, 11 xfailed, 4 warnings in 1475.44 seconds

added test_none_delimiter in pandas/tests/io/parser/python_parser_only.py
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

Bug in :func:`read_csv` where automatic delimiter detection caused a TypeError to be thrown when a bad line was encountered rather than the correct error message (:issue:`13374`)
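A minimal, stdlib-only sketch of the failure mode. It assumes, as the TypeError suggests, that the Python parser's `delimiter` attribute is still `None` when `sep=None` asks it to sniff the separator. The two function names are hypothetical; their bodies copy the pre- and post-fix conditions from the `_rows_to_cols` hunk reviewed later in this thread.

```python
import csv


def hint_check_before_fix(delimiter, quoting):
    # Hypothetical helper holding the pre-fix condition: len() is called even
    # when the delimiter was sniffed (delimiter is None), raising TypeError.
    return len(delimiter) > 1 and quoting != csv.QUOTE_NONE


def hint_check_after_fix(delimiter, quoting):
    # Hypothetical helper holding the post-fix condition: a falsy delimiter
    # short-circuits the expression before len() runs.
    return bool(delimiter and len(delimiter) > 1 and quoting != csv.QUOTE_NONE)


if __name__ == "__main__":
    try:
        hint_check_before_fix(None, csv.QUOTE_MINIMAL)
    except TypeError as exc:
        print("before fix:", exc)
    print("after fix:", hint_check_after_fix(None, csv.QUOTE_MINIMAL))
```

With the guard in place, a bad line under `sep=None` falls through to the plain "Expected %d fields in line %d, saw %d" message instead of crashing.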

@gfyoung

pep8speaks · 2017-09-07T16:59:54Z

Hello @matthax! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on September 09, 2017 at 21:49 Hours UTC

codecov · 2017-09-07T23:14:06Z

Codecov Report

Merging #17465 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #17465      +/-   ##
==========================================
- Coverage   91.16%   91.14%   -0.02%     
==========================================
  Files         163      163              
  Lines       49590    49590              
==========================================
- Hits        45209    45200       -9     
- Misses       4381     4390       +9

Flag	Coverage Δ
#multiple	`88.93% <100%> (ø)`	⬆️
#single	`40.25% <0%> (-0.07%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/parsers.py	`95.46% <100%> (ø)`	⬆️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.72% <0%> (-0.1%)`	⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ee6185e...ced6fc6.

codecov · 2017-09-07T23:14:20Z

Codecov Report

Merging #17465 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #17465      +/-   ##
==========================================
- Coverage   91.15%   91.13%   -0.02%     
==========================================
  Files         163      163              
  Lines       49534    49534              
==========================================
- Hits        45153    45144       -9     
- Misses       4381     4390       +9

Flag	Coverage Δ
#multiple	`88.92% <100%> (ø)`	⬆️
#single	`40.22% <0%> (-0.07%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/parsers.py	`95.46% <100%> (ø)`	⬆️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.72% <0%> (-0.1%)`	⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fdbc6b8...f151405.

jreback · 2017-09-08T00:52:25Z

pls add your example: #13374 (comment) as a test.

jreback · 2017-09-08T00:52:52Z

doc/source/whatsnew/v0.21.0.txt

@@ -411,6 +411,7 @@ I/O
 - Bug in :func:`read_csv` when called with a single-element list ``header`` would return a ``DataFrame`` of all NaN values (:issue:`7757`)
 - Bug in :func:`read_stata` where value labels could not be read when using an iterator (:issue:`16923`)
 - Bug in :func:`read_html` where import check fails when run in multiple threads (:issue:`16928`)
+- Bug in :func:`read_csv` where automatic delimiter detection caused a `TypeError` to be thrown when a bad line was encountered (:issue:`13374`)


use double-back-ticks around TypeError

add:

rather than the correct error message.

Updated both. Looks like you'll have to restart the Travis build manually, though: something went wrong there and I can't see any log output.

matthax · 2017-09-08T14:09:52Z

@jreback added the test and made the requested changes in whatsnew

jreback · 2017-09-09T15:59:01Z

pandas/tests/io/parser/python_parser_only.py

@@ -218,6 +218,23 @@ def test_multi_char_sep_quotes(self):
                self.read_csv(StringIO(data), sep=',,',
                              quoting=csv.QUOTE_NONE)

+    def test_none_delimiter(self):
+        # see gh-13374 and gh-17465


can you add a 1-line comment about what is happening here.

Is this only in the python parser as well?

Only the Python parser. I actually discovered the issue because I had been using the built-in CSV sniffer with the C parser for a while, but switched to the Python engine with the pandas sniffer because it did notably better on the data files I was using.
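A rough sketch of the sniffing step, using the standard library's `csv.Sniffer` directly. It assumes the separator is guessed from the first line only and from a restricted set of candidate characters; `sniff_from_first_line` and the sample data are hypothetical, and this is not pandas' internal code.

```python
import csv


def sniff_from_first_line(text, candidates=",;\t|"):
    # Hypothetical helper: guess the separator from the first line only.
    # Restricting the candidate set keeps letters and digits from being
    # picked as delimiters.
    first_line = text.split("\n", 1)[0]
    return csv.Sniffer().sniff(first_line, delimiters=candidates).delimiter


print(sniff_from_first_line("a;b;c\n0;1;2\n3;4;5;6\n"))
```

Sniffing only the header line means a malformed data row further down cannot confuse the guess; it only surfaces later, as a bad line.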

jreback · 2017-09-09T15:59:25Z

@gfyoung pls have a look.

matthax

Updated the documentation for the test method. Couldn't find a nice way to one-line it, though.

jreback · 2017-09-09T17:49:48Z

lgtm. @gfyoung pls review and merge.

gfyoung · 2017-09-09T17:53:04Z

pandas/tests/io/parser/python_parser_only.py

+
+        # We expect the third line in the data to be
+        # skipped because it is malformed
+        # but we do not expect any errors to occur


Nits: add a comma after "malformed" + add a period at end of comment.
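The behaviour this test comment describes (the malformed third line is dropped and no exception escapes) can be approximated with the standard library. A sketch assuming the separator is already known; `read_skipping_bad_lines` and the sample data are hypothetical, and only the message format is borrowed from the parser diff in this PR.

```python
import csv
import io


def read_skipping_bad_lines(text, sep=","):
    # Hypothetical helper: keep rows whose field count matches the header and
    # collect a message for each malformed row instead of raising.
    rows = list(csv.reader(io.StringIO(text), delimiter=sep))
    header, body = rows[0], rows[1:]
    kept, messages = [], []
    # File line numbers are 1-based and the header occupies line 1.
    for line_no, row in enumerate(body, start=2):
        if len(row) != len(header):
            messages.append('Expected %d fields in line %d, saw %d'
                            % (len(header), line_no, len(row)))
            continue
        kept.append(row)
    return header, kept, messages


header, kept, messages = read_skipping_bad_lines("a,b,c\n0,1,2\n3,4,5,6\n7,8,9\n")
print(kept)
print(messages)
```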

gfyoung · 2017-09-09T17:53:52Z

pandas/io/parsers.py

@@ -2836,7 +2836,8 @@ def _rows_to_cols(self, content):
            for row_num, actual_len in bad_lines:
                msg = ('Expected %d fields in line %d, saw %d' %
                       (col_len, row_num + 1, actual_len))
-                if len(self.delimiter) > 1 and self.quoting != csv.QUOTE_NONE:
+                if self.delimiter and \
+                   len(self.delimiter) > 1 and self.quoting != csv.QUOTE_NONE:


General practice here is to use parentheses around the conditionals rather than a backslash continuation, which is a little nicer to read, i.e.:

if (self.delimiter and len(self.delimiter) > 1...)
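Applied to the condition in this hunk, the suggested style might look like the sketch below. `bad_line_message` is a hypothetical stand-alone helper (in the PR the logic lives inside `_rows_to_cols`), and the appended hint text is a placeholder rather than pandas' actual wording.

```python
import csv


def bad_line_message(delimiter, quoting, col_len, row_num, actual_len):
    # Hypothetical helper; the format string matches the one in the diff.
    msg = ('Expected %d fields in line %d, saw %d' %
           (col_len, row_num + 1, actual_len))
    # Parenthesized multi-line condition instead of a backslash continuation;
    # the leading truthiness test skips len() when the delimiter is None.
    if (delimiter and
            len(delimiter) > 1 and
            quoting != csv.QUOTE_NONE):
        # Placeholder hint text, not the wording pandas uses.
        msg += '. Quotes may be ignored with a multi-char delimiter.'
    return msg
```

The extra indentation on the continuation lines keeps the condition visually separate from the body of the `if`, which is the PEP 8 recommendation for this case.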

matthax · 2017-09-09T21:51:18Z

Just an FYI: the flake8 check doesn't seem to be working on Windows. I tried altering the commands as suggested, but no dice: it doesn't error, but it doesn't show me any issues either. I switched to pylint and that seems to work OK, but perhaps that's something that needs to be looked into.

Edit: running it through Git Bash shows nothing either.

gfyoung · 2017-09-09T22:50:18Z

Just an FYI: the flake8 check doesn't seem to be working on Windows.

Hmmm...we've been having this issue off-and-on with Windows. Not sure why it wouldn't have been caught on the diff.

gfyoung · 2017-09-10T07:30:55Z

Thanks @matthax !

MBark201 added 2 commits September 7, 2017 12:44

Fixes a bug introduced by #13374

edb9337

BUG: Fixed bug caused by GH13374

8d9ac4b

gfyoung added the Bug and IO CSV (read_csv, to_csv) labels Sep 7, 2017

CLN: PEP8 for GH17465

ced6fc6

jreback requested changes Sep 8, 2017

MBark201 added 2 commits September 8, 2017 08:43

Merge branch 'master' of https://github.com/pandas-dev/pandas into py…

4b3d5de

…thon_parser_patch merge upstream into master

TST: Added test for gh-13374

4745d6d

TST: remove iterator option from test

6896727

jreback reviewed Sep 9, 2017

TST: document test for gh-17465

5a9ee56

matthax commented Sep 9, 2017

jreback added this to the 0.21.0 milestone Sep 9, 2017

jreback approved these changes Sep 9, 2017

gfyoung reviewed Sep 9, 2017

gfyoung self-assigned this Sep 9, 2017

matthax added 3 commits September 9, 2017 17:42

Merge branch 'master' of https://github.com/pandas-dev/pandas into py…

e9e5e97

…thon_parser_patch merge from upstream

CLN: change code and comment to meet guidelines

a24b96f

CLN: fix code format for pep8

f151405

gfyoung approved these changes Sep 9, 2017

gfyoung removed their assignment Sep 10, 2017

gfyoung merged commit 23050dc into pandas-dev:master Sep 10, 2017

matthax deleted the python_parser_patch branch September 10, 2017 12:48

alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017

BUG: Fix TypeError caused by GH13374 (pandas-dev#17465)

486712e

No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017

BUG: Fix TypeError caused by GH13374 (pandas-dev#17465)

addb858
