Commit 343ac2a

BUG: Throw a ParserError when header rows have unequal column counts … (#43118)
* BUG: Throw a ParserError when header rows have unequal column counts (GH43102)
* BUG: Throw a ParserError when header rows have unequal column counts. Updated to comply with PEP8 (GH43102)
* Added Test. (GH43102)
* Added Test. (GH43102)
* Added Test. (GH43102)
* Added Changes. (GH43102)
* Added whatsnew
* Added whatsnew
* Test without whatsnew
* Add whatsnew again
* Update v1.4.0.rst
* Merge upstream
* Skipping test on PyArrow
1 parent 35d52ff commit 343ac2a

3 files changed: +21 -1 lines changed

doc/source/whatsnew/v1.4.0.rst

+1 -1
@@ -384,7 +384,7 @@ I/O
 - Bug in :func:`Series.to_json` and :func:`DataFrame.to_json` where some attributes were skipped when serialising plain Python objects to JSON (:issue:`42768`, :issue:`33043`)
 - Column headers are dropped when constructing a :class:`DataFrame` from a sqlalchemy's ``Row`` object (:issue:`40682`)
 - Bug in unpickling a :class:`Index` with object dtype incorrectly inferring numeric dtypes (:issue:`43188`)
--
+- Bug in :func:`read_csv` where reading multi-header input with unequal lengths incorrectly raising uncontrolled ``IndexError`` (:issue:`43102`)
 
 Period
 ^^^^^^

pandas/io/parsers/base_parser.py

+4
@@ -343,6 +343,10 @@ def _extract_multi_indexer_columns(
         # extract the columns
         field_count = len(header[0])
 
+        # check if header lengths are equal
+        if not all(len(header_iter) == field_count for header_iter in header[1:]):
+            raise ParserError("Header rows must have an equal number of columns.")
+
         def extract(r):
             return tuple(r[i] for i in range(field_count) if i not in sic)
 
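The check added in this hunk can be tried in isolation. The following is a minimal sketch, not the pandas implementation: `check_header_lengths` is a hypothetical helper name, and the local `ParserError` class is a stand-in for `pandas.errors.ParserError` so the snippet runs without pandas installed. The unchanged `extract` helper indexes every row up to `field_count`, which is presumably where a shorter header row produced the `IndexError` mentioned in the whatsnew entry.

```python
class ParserError(ValueError):
    """Stand-in for pandas.errors.ParserError (assumption: keeps the sketch stdlib-only)."""


def check_header_lengths(header):
    """Hypothetical helper mirroring the added check.

    `header` is a list of header rows, each a list of field names.
    Returns the common field count, or raises ParserError if any
    later row differs in length from the first row.
    """
    field_count = len(header[0])
    if not all(len(row) == field_count for row in header[1:]):
        raise ParserError("Header rows must have an equal number of columns.")
    return field_count


# Two header rows of equal length pass the check.
equal = [["a", "b", "c"], ["x", "y", "z"]]
field_count = check_header_lengths(equal)

# A shorter second header row is rejected with the new message.
unequal = [["row11", "row12", "row13"], ["row31", "row32"]]
try:
    check_header_lengths(unequal)
    message = None
except ParserError as err:
    message = str(err)
```

Comparing every later row against the first row's length keeps the check a single pass over the header rows and fails before any per-row indexing happens.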

pandas/tests/io/parser/test_header.py

+16
@@ -604,3 +604,19 @@ def test_read_csv_multiindex_columns(all_parsers):
     tm.assert_frame_equal(df1, expected.iloc[:1])
     df2 = parser.read_csv(StringIO(s2), header=[0, 1])
     tm.assert_frame_equal(df2, expected)
+
+
+@skip_pyarrow
+def test_read_csv_multi_header_length_check(all_parsers):
+    # GH#43102
+    parser = all_parsers
+
+    case = """row11,row12,row13
+row21,row22, row23
+row31,row32
+"""
+
+    with pytest.raises(
+        ParserError, match="Header rows must have an equal number of columns."
+    ):
+        parser.read_csv(StringIO(case), header=[0, 2])
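To see why this input triggers the new error, it helps to look at which rows `header=[0, 2]` selects. The sketch below is a stdlib approximation that does not call pandas: it splits the same test string with `csv.reader` and picks rows 0 and 2 by position. The variable names `header_positions`, `selected`, and `field_counts` are illustrative only.

```python
import csv
from io import StringIO

# Same input string as in the test above.
case = """row11,row12,row13
row21,row22, row23
row31,row32
"""

rows = list(csv.reader(StringIO(case)))

# Mirrors header=[0, 2]: the first and third rows act as header rows,
# and the row between them is not part of the header.
header_positions = [0, 2]
selected = [rows[i] for i in header_positions]

# Number of fields in each selected header row.
field_counts = [len(row) for row in selected]
```

The first header row has three fields and the second has two, so the equal-length check fails and the test expects `ParserError` with the message shown in the diff.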
