Commit 428d442

phofl authored and yehoshuadimarsky committed
BUG: read_csv ignoring non existing header rows for python engine (pandas-dev#47493)
* BUG: read_csv ignoring non existing header rows for python engine
* Rename variable
1 parent cf62427 commit 428d442


3 files changed, +22 -3 lines changed


doc/source/whatsnew/v1.5.0.rst

+1
@@ -877,6 +877,7 @@ I/O
 - Bug in :func:`read_csv` interpreting second row as :class:`Index` names even when ``index_col=False`` (:issue:`46569`)
 - Bug in :func:`read_parquet` when ``engine="pyarrow"`` which caused partial write to disk when column of unsupported datatype was passed (:issue:`44914`)
 - Bug in :func:`DataFrame.to_excel` and :class:`ExcelWriter` would raise when writing an empty DataFrame to a ``.ods`` file (:issue:`45793`)
+- Bug in :func:`read_csv` ignoring non-existing header row for ``engine="python"`` (:issue:`47400`)
 - Bug in :func:`read_excel` raising uncontrolled ``IndexError`` when ``header`` references non-existing rows (:issue:`43143`)
 - Bug in :func:`read_html` where elements surrounding ``<br>`` were joined without a space between them (:issue:`29528`)
 - Bug in :func:`read_csv` when data is longer than header leading to issues with callables in ``usecols`` expecting strings (:issue:`46997`)
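A minimal usage sketch of the behavior this whatsnew entry describes, assuming pandas 1.5 or later is installed. The two-line input, `header=[0, 1, 2]` and the expected message are taken from the test added in this commit; `header_error` is a hypothetical helper for illustration only. Before the fix, the python engine ignored the non-existing header row instead of raising.

```python
import re
from io import StringIO

import pandas as pd

# Two lines of data, but header=[0, 1, 2] asks for three header rows.
DATA = "a,b\n1,2\n"
# Regex copied from the test added in this commit.
EXPECTED = r"Passed header=\[0,1,2\], len of 3, but only 2 lines in file"


def header_error(engine: str) -> str:
    """Hypothetical helper: return the error text read_csv raises, or "" if none."""
    try:
        pd.read_csv(StringIO(DATA), header=[0, 1, 2], engine=engine)
    except ValueError as err:  # pandas' ParserError subclasses ValueError
        return str(err)
    return ""  # no exception was raised


for engine in ("c", "python"):
    message = header_error(engine)
    print(f"{engine}: {message!r}")
    print("matches test regex:", bool(re.search(EXPECTED, message)))
```

The commit's test runs this check against every parser fixture except pyarrow, so after the fix the python engine is expected to report the same message as the C engine.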

pandas/io/parsers/python_parser.py

+9 -3
@@ -379,10 +379,16 @@ def _infer_columns(
                         line = self._next_line()
 
                 except StopIteration as err:
-                    if self.line_pos < hr:
+                    if 0 < self.line_pos <= hr and (
+                        not have_mi_columns or hr != header[-1]
+                    ):
+                        # If no rows we want to raise a different message and if
+                        # we have mi columns, the last line is not part of the header
+                        joi = list(map(str, header[:-1] if have_mi_columns else header))
+                        msg = f"[{','.join(joi)}], len of {len(joi)}, "
                         raise ValueError(
-                            f"Passed header={hr} but only {self.line_pos + 1} lines in "
-                            "file"
+                            f"Passed header={msg}"
+                            f"but only {self.line_pos} lines in file"
                         ) from err
 
                     # We have an empty file, so check
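Reading the diff, the old guard `self.line_pos < hr` did not fire when the requested header row was exactly the line after the end of the file (`line_pos == hr`). It also fired on empty input, and its message reported `line_pos + 1` lines. The sketch below is a hypothetical, pandas-free model of the new guard. `check_header_rows`, `n_lines` and `old_guard_fires` are illustrative names, not pandas API, and `line_pos` is assumed to count the lines consumed so far. Following the diff's comment that with MultiIndex columns "the last line is not part of the header", it also assumes that `header` then carries one extra trailing entry, modelled here as `header[-1] + 1`.

```python
from typing import List, Sequence


def check_header_rows(header: Sequence[int], n_lines: int) -> None:
    """Hypothetical model of the header check from this commit (not pandas API).

    Raises ValueError when a requested header row lies past the end of a
    non-empty input that has ``n_lines`` lines.
    """
    rows: List[int] = list(header)
    have_mi_columns = len(rows) > 1
    if have_mi_columns:
        # Assumption: with MultiIndex columns one extra row after the last
        # header row is also read; that entry was not passed by the user.
        rows.append(rows[-1] + 1)

    line_pos = 0  # assumed: number of lines consumed so far
    for hr in rows:
        while line_pos <= hr:
            if line_pos >= n_lines:
                # Stands in for StopIteration from self._next_line().
                if 0 < line_pos <= hr and (not have_mi_columns or hr != rows[-1]):
                    joi = list(map(str, rows[:-1] if have_mi_columns else rows))
                    msg = f"[{','.join(joi)}], len of {len(joi)}, "
                    raise ValueError(
                        f"Passed header={msg}but only {line_pos} lines in file"
                    )
                # Empty input, or only the extra MultiIndex row is missing:
                # the real parser handles these cases after this check.
                return
            line_pos += 1


def old_guard_fires(hr: int, line_pos: int) -> bool:
    """The pre-fix condition, kept for comparison."""
    return line_pos < hr


try:
    check_header_rows([0, 1, 2], n_lines=2)
except ValueError as err:
    print(err)
```

With `header=[0, 1, 2]` and two lines, the model fails while looking for row 2 with `line_pos == 2`, which is exactly the case where `old_guard_fires(2, 2)` is false.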

pandas/tests/io/parser/test_header.py

+12
@@ -666,3 +666,15 @@ def test_header_none_and_on_bad_lines_skip(all_parsers):
     )
     expected = DataFrame({"a": ["x", "z"], "b": [1, 3]})
     tm.assert_frame_equal(result, expected)
+
+
+@skip_pyarrow
+def test_header_missing_rows(all_parsers):
+    # GH#47400
+    parser = all_parsers
+    data = """a,b
+1,2
+"""
+    msg = r"Passed header=\[0,1,2\], len of 3, but only 2 lines in file"
+    with pytest.raises(ValueError, match=msg):
+        parser.read_csv(StringIO(data), header=[0, 1, 2])
