BUG: Throw a ParserError when header rows have unequal column counts … #43118

quantumalaviya · 2021-08-20T05:28:07Z

(GH43102)

closes BUG: IndexError when header rows have unequal column counts #43102
Fixes an uncaught IndexError raised when the index i in extract(r) inside base_parser.py runs past the end of a header row (that is, when field_count > len(r)).
A ParserError is now raised when the header rows have unequal column counts. The message reports the first row where a mismatch is found.
Added a test that checks for the error being raised.
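
For readers who want to see the failure mode described above, here is a minimal pure-Python sketch. It is not the pandas source: the `extract` function below is a simplified stand-in for the helper named in the description, and its `skip` parameter and the sample header rows are assumptions made for illustration.

```python
def extract(row, field_count, skip=frozenset()):
    """Simplified stand-in: pick the first `field_count` fields of a header row."""
    return tuple(row[i] for i in range(field_count) if i not in skip)


# Two header rows; the second is one column short (illustrative data).
header = [
    ["row11", "row12", "row13"],
    ["row31", "row32"],
]

# field_count comes from the first header row only, as in the diff context.
field_count = len(header[0])

error = None
try:
    # Indexing the short row with i == 2 fails before any columns are built.
    columns = list(zip(*(extract(r, field_count) for r in header)))
except IndexError as exc:
    error = exc
```

Because `field_count` is taken from the first row, any later row that is shorter triggers the `IndexError` unless the lengths are validated first.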

…(GH43102)

pep8speaks · 2021-08-20T05:28:09Z

Hello @quantumalaviya! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-09-05 08:27:17 UTC

… Updated to comply with PEP8 (GH43102)

jreback

pls always start with tests. these should fail w/o the code change and pass after.

quantumalaviya · 2021-08-20T17:21:18Z

@jreback I added a test inside pandas/tests/io/parser/test_header.py. Is this fine?

phofl · 2021-08-20T17:25:53Z

pandas/io/parsers/base_parser.py

@@ -341,6 +341,14 @@ def _extract_multi_indexer_columns(
        # extract the columns
        field_count = len(header[0])

+        # check if header lengths are equal
+        for header_iter in range(len(header)):


You could do this like all(len(x) == len(ls[0]) for x in ls[1:])

has the downside that the element where the error was found is not clear

So you do suggest removing that part of the error? It can simply say "Header rows must have an equal number of columns."

Yeah I would go with that.
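
The trade-off discussed in this thread can be sketched as follows. This is an illustration under stated assumptions, not the merged code: `HeaderLengthError` is a hypothetical stand-in for the ParserError that pandas raises, and both function names are invented for this example.

```python
class HeaderLengthError(ValueError):
    """Hypothetical stand-in for the ParserError raised by pandas."""


def check_header_lengths(header):
    """Compact form from the review: every row must match the first row's length."""
    if not all(len(row) == len(header[0]) for row in header[1:]):
        raise HeaderLengthError("Header rows must have an equal number of columns.")


def first_mismatch(header):
    """Loop form: return the index of the first row whose length differs, else None."""
    expected = len(header[0])
    for idx, row in enumerate(header):
        if len(row) != expected:
            return idx
    return None


good = [["a", "b"], ["c", "d"]]
bad = [["a", "b", "c"], ["d", "e", "f"], ["g", "h"]]

check_header_lengths(good)      # passes silently
mismatch_at = first_mismatch(bad)  # index of the offending row
```

The `all(...)` form is shorter but only says that some row differs; the loop form keeps the position, which is the detail the reviewers agreed to drop from the message.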

phofl · 2021-08-20T17:27:24Z

pandas/tests/io/parser/test_header.py

+        match="Header rows must have equal number of columns. "
+        "Mismatch found at header 1.",
+    ):
+        parser.read_csv(StringIO(case), sep=",", header=[0, 2])


sep is the default

True, I will get rid of it, thanks

phofl

Please add whatsnew

quantumalaviya · 2021-08-20T17:36:47Z

I had added "A Parser error is raised when the header row had unequal columns. Outputs the first row where a mismatch is found." as a substitute for whatsnew. Do you want it to be more detailed or am I missing something?

phofl · 2021-08-20T17:49:43Z

Please add something about the uncaught error before to avoid the impression that a new error is raised

quantumalaviya · 2021-08-21T18:28:18Z

@jreback @phofl Does this look good?

phofl · 2021-08-21T21:55:49Z

Please be patient, we are all volunteers and will look when we have time.

Whatsnew is still missing

quantumalaviya · 2021-08-22T05:53:12Z

Thanks for the clarification, I was confused as to what the next step is.

EDIT: Changed whatsnew.

phofl · 2021-08-22T18:45:41Z

doc/source/whatsnew/v1.4.0.rst

@@ -319,7 +319,7 @@ I/O
 - Bug in :func:`json_normalize` where ``errors=ignore`` could fail to ignore missing values of ``meta`` when ``record_path`` has a length greater than one (:issue:`41876`)
 - Bug in :func:`read_csv` with multi-header input and arguments referencing column names as tuples (:issue:`42446`)
 - Bug in :func:`Series.to_json` and :func:`DataFrame.to_json` where some attributes were skipped when serialising plain Python objects to JSON (:issue:`42768`, :issue:`33043`)
-
+- Bug in :func:`read_csv` where reading multi-header input with unequal lengths incorrectly raises an ``IndexError`` (:issue:`43102`)


Bug in :func:`read_csv` where reading multi-header input with unequal lengths incorrectly raised an uncontrolled ``IndexError`` (:issue:`43102`)

phofl · 2021-08-22T18:46:28Z

Yes the whatsnews are our release notes. Small comment, otherwise lgtm

phofl

lgtm

lithomas1

Can you merge master and skip the test for the pyarrow engine?

lithomas1

Thanks for the PR. This LGTM.

jreback · 2021-09-05T15:36:33Z

@lithomas1 merge away

lithomas1 · 2021-09-05T16:15:18Z

thanks @quantumalaviya

quantumalaviya · 2021-09-05T17:39:58Z

My pleasure. Thanks for guiding me through it!

pandas-dev#43118)
* BUG: Throw a ParserError when header rows have unequal column counts (GH43102)
* BUG: Throw a ParserError when header rows have unequal column counts. Updated to comply with PEP8 (GH43102)
* Added Test. (GH43102)
* Added Test. (GH43102)
* Added Test. (GH43102)
* Added Changes. (GH43102)
* Added whatsnew
* Added whatsnew
* Test without whatsnew
* Add whatsnew again
* Update v1.4.0.rst
* Merge upstream
* Skipping test on PyArrow

BUG: Throw a ParserError when header rows have unequal column counts …

53a1768

…(GH43102)

BUG: Throw a ParserError when header rows have unequal column counts.…

128b4e3

… Updated to comply with PEP8 (GH43102)

quantumalaviya closed this Aug 20, 2021

quantumalaviya reopened this Aug 20, 2021

jreback requested changes Aug 20, 2021


jreback added the IO CSV read_csv, to_csv label Aug 20, 2021

quantumalaviya added 5 commits August 20, 2021 19:15

Merge remote-tracking branch 'upstream/master' into b43102

6483df7

Merge remote-tracking branch 'upstream/master' into b43102

bec4d00

Added Test. (GH43102)

95bac98

Added Test. (GH43102)

10422a8

Added Test. (GH43102)

658c291

quantumalaviya closed this Aug 20, 2021

quantumalaviya reopened this Aug 20, 2021

quantumalaviya closed this Aug 20, 2021

quantumalaviya reopened this Aug 20, 2021

quantumalaviya requested a review from jreback August 20, 2021 17:18

phofl reviewed Aug 20, 2021


phofl requested changes Aug 20, 2021


Added Changes. (GH43102)

a02d476

quantumalaviya requested a review from phofl August 20, 2021 18:43

quantumalaviya added 2 commits August 21, 2021 15:23

Merge remote-tracking branch 'upstream/master' into b43102

87bf9a2

Merge remote-tracking branch 'upstream/master' into b43102

f1b1a89

phofl reviewed Aug 22, 2021


quantumalaviya closed this Aug 22, 2021

quantumalaviya reopened this Aug 22, 2021

Added whatsnew

5239ece

quantumalaviya requested a review from phofl August 22, 2021 20:48

phofl approved these changes Aug 22, 2021


Merge remote-tracking branch 'upstream/master' into b43102

0dbc0cc

quantumalaviya closed this Aug 23, 2021

quantumalaviya reopened this Aug 23, 2021

quantumalaviya added 2 commits August 24, 2021 00:16

Test without whatsnew

863e996

Add whatsnew again

532e6cb

jreback added the Error Reporting Incorrect or improved errors from pandas label Aug 31, 2021

jreback added this to the 1.4 milestone Aug 31, 2021

jreback approved these changes Aug 31, 2021


Merge branch 'master' into b43102

2533bd1

lithomas1 requested changes Sep 5, 2021


quantumalaviya and others added 5 commits September 5, 2021 09:28

Merged upstream

7d01f0a

Update v1.4.0.rst

1caf42d

Merged upstream

921b57d

Merge upstream

2ca6ccf

Skipping test on PyArrow

3f1fb39

quantumalaviya requested a review from lithomas1 September 5, 2021 11:09

lithomas1 approved these changes Sep 5, 2021


jreback approved these changes Sep 5, 2021


lithomas1 merged commit 343ac2a into pandas-dev:master Sep 5, 2021
