
BUG: Python Parser skipping over items if BOM present in first element of header #36365


Merged: 6 commits into pandas-dev:master on Sep 19, 2020

Conversation

@asishm (Contributor) commented Sep 14, 2020

@WillAyd (Member) left a comment


Thanks for taking a look. Can you also add a whatsnew note for 1.2?

@@ -2876,7 +2876,7 @@ def _check_for_bom(self, first_row):
             return [new_row] + first_row[1:]

         elif len(first_row_bom) > 1:
-            return [first_row_bom[1:]]
+            return [first_row_bom[1:]] + first_row[1:]
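The one-line fix above covers the case where the BOM-carrying first header element is unquoted: previously `first_row[1:]` was dropped, so every column after the first silently disappeared. A minimal reproduction of the reported behavior (gh-36343), which should now keep all three columns on a fixed pandas (the sample data is illustrative):

```python
import io

import pandas as pd

# Header whose first (unquoted) field starts with a UTF-8 BOM
# (U+FEFF, ZERO WIDTH NO-BREAK SPACE).
data = "\ufeffHead1,Head2,Head3\n1,2,3\n"

# Before this fix, the Python parser returned only ["Head1"] from
# _check_for_bom, silently dropping Head2 and Head3.
df = pd.read_csv(io.StringIO(data), engine="python")
print(list(df.columns))
```

With the fix applied, the printed column list contains all three headers, BOM stripped from the first.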
Hmm, I don't find it very clear why we would do this. Can you try refactoring the code above this to better suit the requirement?

@asishm (Contributor, Author) replied:
From what I understand, the code block above removes the BOM from the first element of the row and, if that element was a quoted string, also removes the quotes; i.e. if it was of the format <BOM><QUOTE>abc<QUOTE>def then it extracts abcdef.

I've tried to refactor it slightly.
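For readers following along, the logic being discussed can be sketched roughly like this. This is a simplified illustration paraphrased from the description above, not the actual pandas implementation; the function name and quote handling here are assumptions:

```python
# Sketch (not pandas source): strip a UTF-8 BOM from the first field of a
# parsed header row, handling the quoted case, while keeping the remaining
# fields -- the "+ first_row[1:]" that the fix restores.
BOM = "\ufeff"

def check_for_bom(first_row, quotechar='"'):
    first_elt = first_row[0]
    if not first_elt.startswith(BOM):
        return first_row
    first_elt = first_elt[len(BOM):]  # drop the BOM
    if first_elt.startswith(quotechar):
        # e.g. <BOM>"abc"def -> abcdef: remove the surrounding quotes
        end = first_elt[1:].index(quotechar) + 1
        new_first = first_elt[1:end] + first_elt[end + 1:]
    else:
        new_first = first_elt
    # Crucially, keep the rest of the row; the bug was returning only
    # the first element here.
    return [new_first] + first_row[1:]
```

For example, `check_for_bom(["\ufeffHead1", "Head2"])` keeps both fields, which is exactly what the buggy branch failed to do for unquoted headers.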

@WillAyd WillAyd added the IO CSV read_csv, to_csv label Sep 15, 2020
@asishm asishm requested a review from WillAyd September 18, 2020 19:10
@jreback jreback added the Bug label Sep 19, 2020
@@ -2127,6 +2127,12 @@ def test_first_row_bom(all_parsers):
     expected = DataFrame(columns=["Head1", "Head2", "Head3"])
     tm.assert_frame_equal(result, expected)

+# see gh-36343
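The added test presumably mirrors the existing `test_first_row_bom` but with an unquoted header. A self-contained sketch along those lines (the test name is illustrative, not necessarily the exact test that was merged):

```python
from io import StringIO

import pandas as pd
import pandas._testing as tm


def test_first_row_bom_unquoted():
    # see gh-36343: BOM followed by an *unquoted* first header element
    data = "\ufeffHead1\tHead2\tHead3"
    result = pd.read_csv(StringIO(data), delimiter="\t", engine="python")
    expected = pd.DataFrame(columns=["Head1", "Head2", "Head3"])
    tm.assert_frame_equal(result, expected)


test_first_row_bom_unquoted()
```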
Can you make this a new test?

@jreback (Contributor) commented Sep 19, 2020

@gfyoung if you'd have a look

@gfyoung (Member) commented Sep 19, 2020

@jreback's comments notwithstanding, this looks good. Nice catch @asishm !

@WillAyd (Member) commented Sep 19, 2020

/azp run

@azure-pipelines commented:

Azure Pipelines successfully started running 1 pipeline(s).

@WillAyd (Member) left a comment

lgtm when green

@jreback jreback added this to the 1.2 milestone Sep 19, 2020
@jreback jreback merged commit a90d559 into pandas-dev:master Sep 19, 2020
@jreback (Contributor) commented Sep 19, 2020

Thanks @asishm, nice. I think we might have some other BOM/EOM issues in the tracker, if you'd have a look (and maybe this will close them).

@asishm (Contributor, Author) commented Sep 19, 2020

@jreback

I searched for 'bom' and only found #31609, which can be closed; it describes the same solution as this PR. Not really sure what else to search for ('eom' doesn't lead to anything).

@jreback (Contributor) commented Sep 19, 2020

OK, thanks @asishm. Yeah, there might be some others, but IIRC those are for the C parser.

@asishm asishm deleted the python_parser_bom branch September 20, 2020 03:35
kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020
Labels: Bug; IO CSV (read_csv, to_csv)
Milestone: 1.2
Successfully merging this pull request may close these issues.

BUG: ZERO WIDTH NO-BREAK SPACE in column name causes a reading failure
4 participants