Bug: names of multiindex columns not set correctly when index col is not first column #44931

Merged on Dec 18, 2021 (1 commit)

doc/source/whatsnew/v1.4.0.rst (1 change: 1 addition & 0 deletions)
@@ -754,6 +754,7 @@ I/O
 - Bug in :func:`read_csv` raising ``AttributeError`` when attempting to read a .csv file and infer index column dtype from a nullable integer type (:issue:`44079`)
 - :meth:`DataFrame.to_csv` and :meth:`Series.to_csv` with ``compression`` set to ``'zip'`` no longer create a zip file containing a file ending with ".zip". Instead, they try to infer the inner file name more intelligently. (:issue:`39465`)
 - Bug in :func:`read_csv` where the parser passed in ``date_parser`` was still called when ``parse_dates=False`` was passed as well (:issue:`44366`)
+- Bug in :func:`read_csv` not setting the names of :class:`MultiIndex` columns correctly when ``index_col`` is not the first column (:issue:`38549`)
 - Bug in :func:`read_csv` silently ignoring errors when failing to create a memory-mapped file (:issue:`44766`)
 - Bug in :func:`read_csv` when passing a ``tempfile.SpooledTemporaryFile`` opened in binary mode (:issue:`44748`)
 -

pandas/io/parsers/base_parser.py (8 changes: 6 additions & 2 deletions)
@@ -391,7 +391,9 @@ def extract(r):
             return tuple(r[i] for i in range(field_count) if i not in sic)
 
         columns = list(zip(*(extract(r) for r in header)))
-        names = ic + columns
+        names = columns.copy()
+        for single_ic in sorted(ic):
+            names.insert(single_ic, single_ic)
 
         # If we find unnamed columns all in a single
         # level, then our header was too long.
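
A minimal pure-Python sketch of this hunk, using the header rows and `index_col=1` from the regression test added in `test_index_col.py`. The helpers `names_before_fix` and `names_after_fix` are hypothetical stand-ins for the removed and added lines, not pandas functions; `extract` mirrors the context line of the hunk. With the old line, the placeholder for the index column lands at position 0 even though the index column is column 1.

```python
# Hypothetical helpers re-creating the ``names`` line before and after the fix.


def names_before_fix(ic, columns):
    # Old behaviour: prepend every index-column placeholder, which is only
    # correct when the index columns are the leading columns.
    return ic + columns


def names_after_fix(ic, columns):
    # New behaviour: insert each placeholder at its index column's position.
    # Inserting in ascending order places every placeholder at its original
    # column position in the final list.
    names = columns.copy()
    for single_ic in sorted(ic):
        names.insert(single_ic, single_ic)
    return names


# Header rows and index column from the regression test (index_col=1).
header = [["a", "b", "c", "d"], ["e", "f", "g", "h"]]
ic = [1]
sic = set(ic)
field_count = len(header[0])


def extract(r):
    # Same as the context line in the diff: drop index columns from a header row.
    return tuple(r[i] for i in range(field_count) if i not in sic)


columns = list(zip(*(extract(r) for r in header)))
print(columns)                        # [('a', 'e'), ('c', 'g'), ('d', 'h')]
print(names_before_fix(ic, columns))  # [1, ('a', 'e'), ('c', 'g'), ('d', 'h')]
print(names_after_fix(ic, columns))   # [('a', 'e'), 1, ('c', 'g'), ('d', 'h')]
```

The `sorted(ic)` call matters once there are several index columns: with `ic = [2, 0]` inserted unsorted into `["x", "y", "z"]`, the `2` would end up at position 3 instead of 2.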
@@ -406,7 +408,9 @@ def extract(r):
         # Clean the column names (if we have an index_col).
         if len(ic):
             col_names = [
-                r[0] if ((r[0] is not None) and r[0] not in self.unnamed_cols) else None
+                r[ic[0]]
+                if ((r[ic[0]] is not None) and r[ic[0]] not in self.unnamed_cols)
+                else None
                 for r in header
             ]
         else:
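
The second hunk changes which header entry becomes each column level's name. A sketch of the old and new comprehension follows; the helper names are hypothetical, and a plain `unnamed_cols` set stands in for the parser's `self.unnamed_cols`.

```python
# Hypothetical helpers re-creating the ``col_names`` comprehension from the
# diff; ``unnamed_cols`` stands in for the parser's ``self.unnamed_cols``.


def col_names_before_fix(header, unnamed_cols):
    # Old behaviour: always take the header entry of column 0.
    return [
        r[0] if (r[0] is not None and r[0] not in unnamed_cols) else None
        for r in header
    ]


def col_names_after_fix(header, ic, unnamed_cols):
    # New behaviour: take the header entry of the first index column.
    first = ic[0]
    return [
        r[first] if (r[first] is not None and r[first] not in unnamed_cols) else None
        for r in header
    ]


# Header rows from the regression test, with index_col=1.
header = [["a", "b", "c", "d"], ["e", "f", "g", "h"]]
print(col_names_before_fix(header, set()))      # ['a', 'e']
print(col_names_after_fix(header, [1], set()))  # ['b', 'f']
# An entry listed as unnamed becomes None.
print(col_names_after_fix(header, [1], {"f"}))  # ['b', None]
```

With `index_col=1`, the old code reports the header entries of column 0 (a data column) as the level names; the new code reports `['b', 'f']`, matching the `names=["b", "f"]` expected by the regression test.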

pandas/tests/io/parser/test_index_col.py (20 changes: 20 additions & 0 deletions)
@@ -332,3 +332,23 @@ def test_specify_dtype_for_index_col(all_parsers, dtype, val):
     result = parser.read_csv(StringIO(data), index_col="a", dtype={"a": dtype})
     expected = DataFrame({"b": [2]}, index=Index([val], name="a"))
     tm.assert_frame_equal(result, expected)
+
+
+@skip_pyarrow
+def test_multiindex_columns_not_leading_index_col(all_parsers):
+    # GH#38549
+    parser = all_parsers
+    data = """a,b,c,d
+e,f,g,h
+x,y,1,2
+"""
+    result = parser.read_csv(
+        StringIO(data),
+        header=[0, 1],
+        index_col=1,
+    )
+    cols = MultiIndex.from_tuples(
+        [("a", "e"), ("c", "g"), ("d", "h")], names=["b", "f"]
+    )
+    expected = DataFrame([["x", 1, 2]], columns=cols, index=["y"])
+    tm.assert_frame_equal(result, expected)
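
In the test, `all_parsers`, `skip_pyarrow` and `tm` come from pandas' own test fixtures and helpers. Outside the test suite, roughly the same check can be sketched with the public API. This assumes a pandas version that includes this fix (1.4.0 or later) and loops over only the `"c"` and `"python"` engines rather than every parser the fixture provides.

```python
from io import StringIO

import pandas as pd

data = """a,b,c,d
e,f,g,h
x,y,1,2
"""

# Same expectation as the regression test: column 1 ("b"/"f") is the index,
# and its header entries become the names of the column levels.
cols = pd.MultiIndex.from_tuples(
    [("a", "e"), ("c", "g"), ("d", "h")], names=["b", "f"]
)
expected = pd.DataFrame([["x", 1, 2]], columns=cols, index=["y"])

# Only the C and Python engines are checked here (the test skips pyarrow).
for engine in ("c", "python"):
    result = pd.read_csv(StringIO(data), header=[0, 1], index_col=1, engine=engine)
    pd.testing.assert_frame_equal(result, expected)
    print(engine, list(result.columns.names))
```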