Commit d827c83

BUG: read_csv raising error for python engine when names longer than header but equal to data (#44654)
1 parent be83376 commit d827c83

3 files changed: +23 -1 lines changed


doc/source/whatsnew/v1.4.0.rst (+1)

@@ -668,6 +668,7 @@ I/O
 - Bug in :func:`read_csv` with :code:`float_precision="round_trip"` which did not skip initial/trailing whitespace (:issue:`43713`)
 - Bug in :func:`read_csv` not applying dtype for ``index_col`` (:issue:`9435`)
 - Bug in dumping/loading a :class:`DataFrame` with ``yaml.dump(frame)`` (:issue:`42748`)
+- Bug in :func:`read_csv` raising ``ValueError`` when names was longer than header but equal to data rows for ``engine="python"`` (:issue:`38453`)
 - Bug in :class:`ExcelWriter`, where ``engine_kwargs`` were not passed through to all engines (:issue:`43442`)
 - Bug in :func:`read_csv` raising ``ValueError`` when ``parse_dates`` was used with ``MultiIndex`` columns (:issue:`8991`)
 - Bug in :func:`read_csv` converting columns to numeric after date parsing failed (:issue:`11019`)

pandas/io/parsers/python_parser.py (+9 -1)

@@ -448,7 +448,15 @@ def _infer_columns(self):
                 self._clear_buffer()

             if names is not None:
-                if len(names) > len(columns[0]):
+                # Read first row after header to check if data are longer
+                try:
+                    first_line = self._next_line()
+                except StopIteration:
+                    first_line = None
+
+                len_first_data_row = 0 if first_line is None else len(first_line)
+
+                if len(names) > len(columns[0]) and len(names) > len_first_data_row:
                     raise ValueError(
                         "Number of passed names did not match "
                         "number of header fields in the file"
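
The patched condition can be illustrated outside pandas. The following is a minimal sketch, not the parser's actual code path: `check_names_against_file` is a hypothetical helper, and the standard-library `csv` module stands in for the parser's internal `_next_line()`. It raises only when `names` is longer than both the header row and the first data row, which is the rule the hunk above introduces.

```python
import csv
from io import StringIO


def check_names_against_file(names, text):
    """Hypothetical helper mirroring the patched length check."""
    reader = csv.reader(StringIO(text))
    header = next(reader)

    # Peek at the first data row; a file with only a header yields None.
    first_line = next(reader, None)
    len_first_data_row = 0 if first_line is None else len(first_line)

    # Raise only if names exceeds both the header and the first data row.
    if len(names) > len(header) and len(names) > len_first_data_row:
        raise ValueError(
            "Number of passed names did not match "
            "number of header fields in the file"
        )
    return len(header), len_first_data_row


# Header has 2 fields, data rows have 3, names has 3: accepted.
print(check_names_against_file(["A", "B", "C"], "a, b\n1,2,3\n5,6,4\n"))
```

Before the change, only the header length was compared, so the same input would have been rejected even though every data row had a value for each name.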

pandas/tests/io/parser/test_header.py (+13)

@@ -574,6 +574,19 @@ def test_multi_index_unnamed(all_parsers, index_col, columns):
     tm.assert_frame_equal(result, expected)


+@skip_pyarrow
+def test_names_longer_than_header_but_equal_with_data_rows(all_parsers):
+    # GH#38453
+    parser = all_parsers
+    data = """a, b
+1,2,3
+5,6,4
+"""
+    result = parser.read_csv(StringIO(data), header=0, names=["A", "B", "C"])
+    expected = DataFrame({"A": [1, 5], "B": [2, 6], "C": [3, 4]})
+    tm.assert_frame_equal(result, expected)
+
+
 @skip_pyarrow
 def test_read_csv_multiindex_columns(all_parsers):
     # GH#6051
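
For readers who want to try the scenario from the new test outside the pandas test suite, here is a small standalone reproduction. It assumes pandas 1.4.0 or later is installed (the release this fix is documented for) and passes `engine="python"` explicitly, since that is the engine the whatsnew entry names.

```python
from io import StringIO

import pandas as pd

# Header row has 2 fields; each data row has 3.
data = "a, b\n1,2,3\n5,6,4\n"

# names has 3 entries: longer than the header, equal to the data rows.
result = pd.read_csv(
    StringIO(data), header=0, names=["A", "B", "C"], engine="python"
)
print(result)
```

With the fix in place the call returns a three-column frame, with the file's own header row replaced by the passed names, rather than raising `ValueError`.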
