BUG: Throw a ParserError when header rows have unequal column counts … #43118
Hello @quantumalaviya! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2021-09-05 08:27:17 UTC |
… Updated to comply with PEP8 (GH43102)
pls always start with tests. these should fail w/o the code change and pass after.
@jreback I added a test inside pandas/tests/io/parser/test_header.py. Is this fine? |
pandas/io/parsers/base_parser.py
@@ -341,6 +341,14 @@ def _extract_multi_indexer_columns(
# extract the columns
field_count = len(header[0])

# check if header lengths are equal
for header_iter in range(len(header)):
You could do this like all(len(x) == len(ls[0]) for x in ls[1:]), but that has the downside that the element where the error was found is not clear.
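The trade-off in this comment can be sketched in plain Python. This is a minimal illustration and not the pandas implementation: the function names and the sample `header` list are assumptions made up for the example, and only the `all(...)` expression comes from the review comment.

```python
# Hypothetical helper names, used only to contrast the two approaches.

def rows_have_equal_length(rows):
    """Return True when every row has as many fields as the first row."""
    return all(len(row) == len(rows[0]) for row in rows[1:])


def first_mismatched_row(rows):
    """Return the index of the first row whose length differs from row 0, or None."""
    for index, row in enumerate(rows[1:], start=1):
        if len(row) != len(rows[0]):
            return index
    return None


# Made-up header rows: the last one is one field short.
header = [["a", "b", "c"], ["d", "e", "f"], ["g", "h"]]

equal = rows_have_equal_length(header)      # concise, but gives no location
mismatch_at = first_mismatched_row(header)  # longer, but reports where it failed
print(equal, mismatch_at)
```

The one-liner is enough when the error message only says that the header rows differ in length; the explicit loop is needed only if the message should name the offending row.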
So you do suggest removing that part of the error? It can simply say "Header rows must have an equal number of columns."
Yeah I would go with that.
        match="Header rows must have equal number of columns. "
        "Mismatch found at header 1.",
    ):
        parser.read_csv(StringIO(case), sep=",", header=[0, 2])
sep is the default
True, I will get rid of it, thanks
Please add whatsnew
I had added "A Parser error is raised when the header row had unequal columns. Outputs the first row where a mismatch is found." as a substitute for whatsnew. Do you want it to be more detailed or am I missing something? |
Please add something about the uncaught error before to avoid the impression that a new error is raised |
Please be patient, we are all volunteers and will look when we have time. Whatsnew is still missing |
Thanks for the clarification, I was confused as to what the next step is. EDIT: Changed whatsnew. |
doc/source/whatsnew/v1.4.0.rst
@@ -319,7 +319,7 @@ I/O
- Bug in :func:`json_normalize` where ``errors=ignore`` could fail to ignore missing values of ``meta`` when ``record_path`` has a length greater than one (:issue:`41876`)
- Bug in :func:`read_csv` with multi-header input and arguments referencing column names as tuples (:issue:`42446`)
- Bug in :func:`Series.to_json` and :func:`DataFrame.to_json` where some attributes were skipped when serialising plain Python objects to JSON (:issue:`42768`, :issue:`33043`)
-
- Bug in :func:`read_csv` where reading multi-header input with unequal lengths incorrectly raises an ``IndexError`` (:issue:`43102`)
Bug in :func:`read_csv` where reading multi-header input with unequal lengths incorrectly raising uncontrolled ``IndexError`` (:issue:`43102`)
Yes the whatsnews are our release notes. Small comment, otherwise lgtm |
lgtm
Can you merge master and skip the test for the pyarrow engine?
Thanks for the PR. This LGTM.
@lithomas1 merge away |
thanks @quantumalaviya |
My pleasure. Thanks for guiding me through it! |
pandas-dev#43118)
* BUG: Throw a ParserError when header rows have unequal column counts (GH43102)
* BUG: Throw a ParserError when header rows have unequal column counts. Updated to comply with PEP8 (GH43102)
* Added Test. (GH43102)
* Added Test. (GH43102)
* Added Test. (GH43102)
* Added Changes. (GH43102)
* Added whatsnew
* Added whatsnew
* Test without whatsnew
* Add whatsnew again
* Update v1.4.0.rst
* Merge upstream
* Skipping test on PyArrow (GH43102)
The index i in extract(r) inside base_parser.py exceeds the length of a header row (when field_count > len(r)).
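As a rough illustration of the failure described above, the following sketch mimics the indexing pattern in plain Python. It is a simplification under stated assumptions: `extract` and `field_count` are named in the description, but the function body shown here, the sample `header`, the helper `checked_extract_all`, and the `ValueError` standing in for pandas' `ParserError` are all made up for the example.

```python
# Made-up header rows: the second row is one field short.
header = [["a", "b", "c"], ["d", "e"]]
field_count = len(header[0])


def extract(r):
    # Indexes r up to field_count, so a shorter row runs out of elements.
    return tuple(r[i] for i in range(field_count))


def checked_extract_all(rows):
    # Hypothetical guard: validate lengths first and raise a controlled error.
    # ValueError is a stand-in here for pandas' ParserError.
    if not all(len(r) == field_count for r in rows[1:]):
        raise ValueError("Header rows must have an equal number of columns.")
    return [extract(r) for r in rows]


# Without the guard, the short row surfaces as a bare IndexError.
try:
    extract(header[1])
    uncontrolled = None
except IndexError as exc:
    uncontrolled = type(exc).__name__

# With the guard, the caller gets a descriptive message instead.
try:
    checked_extract_all(header)
    controlled = None
except ValueError as exc:
    controlled = str(exc)

print(uncontrolled, "|", controlled)
```

Checking the lengths before indexing is what turns the uncontrolled `IndexError` into an error message that tells the user what is wrong with the input.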