Bug: names of multiindex columns not set correctly when index col is not first column (#44931)

phofl · web-flow · commit 50adb83b4ccb · 2021-12-17T19:00:57.000-05:00
diff --git a/doc/source/whatsnew/v1.4.0.rst b/doc/source/whatsnew/v1.4.0.rst
@@ -761,6 +761,7 @@ I/O
 - :meth:`DataFrame.to_csv` and :meth:`Series.to_csv` with ``compression`` set to ``'zip'`` no longer create a zip file containing a file ending with ".zip". Instead, they try to infer the inner file name more smartly. (:issue:`39465`)
 - Bug in :func:`read_csv` where reading a mixed column of booleans and missing values to a float type results in the missing values becoming 1.0 rather than NaN (:issue:`42808`, :issue:`34120`)
 - Bug in :func:`read_csv` when passing simultaneously a parser in ``date_parser`` and ``parse_dates=False``, the parsing was still called (:issue:`44366`)
+- Bug in :func:`read_csv` not setting name of :class:`MultiIndex` columns correctly when ``index_col`` is not the first column (:issue:`38549`)
 - Bug in :func:`read_csv` silently ignoring errors when failling to create a memory-mapped file (:issue:`44766`)
 - Bug in :func:`read_csv` when passing a ``tempfile.SpooledTemporaryFile`` opened in binary mode (:issue:`44748`)
 -
diff --git a/pandas/io/parsers/base_parser.py b/pandas/io/parsers/base_parser.py
@@ -391,7 +391,9 @@ def extract(r):
             return tuple(r[i] for i in range(field_count) if i not in sic)
 
         columns = list(zip(*(extract(r) for r in header)))
-        names = ic + columns
+        names = columns.copy()
+        for single_ic in sorted(ic):
+            names.insert(single_ic, single_ic)
 
         # If we find unnamed columns all in a single
         # level, then our header was too long.
@@ -406,7 +408,9 @@ def extract(r):
         # Clean the column names (if we have an index_col).
         if len(ic):
             col_names = [
-                r[0] if ((r[0] is not None) and r[0] not in self.unnamed_cols) else None
+                r[ic[0]]
+                if ((r[ic[0]] is not None) and r[ic[0]] not in self.unnamed_cols)
+                else None
                 for r in header
             ]
         else:
diff --git a/pandas/tests/io/parser/test_index_col.py b/pandas/tests/io/parser/test_index_col.py
@@ -332,3 +332,23 @@ def test_specify_dtype_for_index_col(all_parsers, dtype, val):
     result = parser.read_csv(StringIO(data), index_col="a", dtype={"a": dtype})
     expected = DataFrame({"b": [2]}, index=Index([val], name="a"))
     tm.assert_frame_equal(result, expected)
+
+
+@skip_pyarrow
+def test_multiindex_columns_not_leading_index_col(all_parsers):
+    # GH#38549
+    parser = all_parsers
+    data = """a,b,c,d
+e,f,g,h
+x,y,1,2
+"""
+    result = parser.read_csv(
+        StringIO(data),
+        header=[0, 1],
+        index_col=1,
+    )
+    cols = MultiIndex.from_tuples(
+        [("a", "e"), ("c", "g"), ("d", "h")], names=["b", "f"]
+    )
+    expected = DataFrame([["x", 1, 2]], columns=cols, index=["y"])
+    tm.assert_frame_equal(result, expected)
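
The two hunks in base_parser.py can be illustrated outside pandas with a minimal, simplified sketch. It reuses the header rows and `index_col=1` from the test above. The helper names `build_names_old` and `build_names_new` are hypothetical and exist only for this illustration, and the `unnamed_cols` check from the real code is left out.

```python
# Simplified sketch of the name handling changed in this commit.
# build_names_old / build_names_new are hypothetical helpers, not pandas API.

header = [["a", "b", "c", "d"], ["e", "f", "g", "h"]]  # the two header rows
ic = [1]                                               # index_col=1
sic = set(ic)
field_count = len(header[0])


def extract(r):
    # Drop the index column positions from one header row.
    return tuple(r[i] for i in range(field_count) if i not in sic)


# One tuple per data column: [("a", "e"), ("c", "g"), ("d", "h")]
columns = list(zip(*(extract(r) for r in header)))


def build_names_old(ic, columns):
    # Before the fix: index placeholders were always prepended,
    # which is only right when the index columns come first.
    return list(ic) + list(columns)


def build_names_new(ic, columns):
    # After the fix: each placeholder is inserted at its own position.
    names = list(columns)
    for single_ic in sorted(ic):
        names.insert(single_ic, single_ic)
    return names


old_names = build_names_old(ic, columns)
new_names = build_names_new(ic, columns)

# Column-level names: before the fix they were read from position 0,
# after the fix from the position of the first index column.
col_names_old = [r[0] for r in header]
col_names_new = [r[ic[0]] for r in header]

print(old_names)
print(new_names)
print(col_names_old, col_names_new)
```

With this input the old logic puts the placeholder in front and takes the level names from the first column, while the new logic keeps the placeholder at position 1 and yields the level names `["b", "f"]` that the test expects.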
