Commit 5962f0e

phofl and m-richards authored
Backport PR #51895 on branch 2.0.x (BUG: Fix getitem dtype preservation with multiindexes) (#53121)
* BUG: Fix getitem dtype preservation with multiindexes (#51895)
* BUG/TST fix dtype preservation with multindex
* lint
* Update pandas/tests/indexing/multiindex/test_multiindex.py
Co-authored-by: Joris Van den Bossche <[email protected]>
* cleanups
* switch to iloc, reindex fails in some cases
* suggestions from code review
* address code review comments
Co-Authored-By: Matthew Roeschke <[email protected]>
---------
Co-authored-by: Joris Van den Bossche <[email protected]>
Co-authored-by: Matthew Roeschke <[email protected]>
(cherry picked from commit 194b6bb)
* Add whatsnew
---------
Co-authored-by: Matt Richards <[email protected]>
1 parent 291acfb commit 5962f0e

3 files changed, +23 -12 lines changed
doc/source/whatsnew/v2.0.2.rst

+1
@@ -30,6 +30,7 @@ Bug fixes
 - Bug in :meth:`DataFrame.convert_dtypes` ignores ``convert_*`` keywords when set to False ``dtype_backend="pyarrow"`` (:issue:`52872`)
 - Bug in :meth:`Series.describe` treating pyarrow-backed timestamps and timedeltas as categorical data (:issue:`53001`)
 - Bug in :meth:`pd.array` raising for ``NumPy`` array and ``pa.large_string`` or ``pa.large_binary`` (:issue:`52590`)
+- Bug in :meth:`DataFrame.__getitem__` not preserving dtypes for :class:`MultiIndex` partial keys (:issue:`51895`)
 -
 
 .. ---------------------------------------------------------------------------

pandas/core/frame.py

+2 -12
@@ -3816,18 +3816,8 @@ def _getitem_multilevel(self, key):
         if isinstance(loc, (slice, np.ndarray)):
             new_columns = self.columns[loc]
             result_columns = maybe_droplevels(new_columns, key)
-            if self._is_mixed_type:
-                result = self.reindex(columns=new_columns)
-                result.columns = result_columns
-            else:
-                new_values = self._values[:, loc]
-                result = self._constructor(
-                    new_values, index=self.index, columns=result_columns, copy=False
-                )
-                if using_copy_on_write() and isinstance(loc, slice):
-                    result._mgr.add_references(self._mgr)  # type: ignore[arg-type]
-
-            result = result.__finalize__(self)
+            result = self.iloc[:, loc]
+            result.columns = result_columns
 
             # If there is only one column being returned, and its name is
             # either an empty string, or a tuple with an empty string as its
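
To illustrate why the removed non-mixed branch could lose dtypes, here is a minimal sketch that uses only public pandas API. It assumes `DataFrame.to_numpy()` as a stand-in for the private `_values` access in the removed code, so it approximates the old behaviour rather than reproducing the internal code path. The names `rebuilt` and `via_iloc` are hypothetical.

```python
import pandas as pd

# Single categorical column under a two-level column index (same setup as the new test).
columns = pd.MultiIndex.from_tuples([("A", "B")], names=["lvl1", "lvl2"])
df = pd.DataFrame(["value"], columns=columns).astype("category")

# Positional location of the partial key "A" within the columns.
loc = df.columns.get_loc("A")

# Approximation of the removed branch: go through a plain NumPy array and
# rebuild a frame. The categorical dtype does not survive the round trip.
values = df.to_numpy()[:, loc]
rebuilt = pd.DataFrame(values, index=df.index, columns=df.columns[loc])

# Approach adopted in this commit: positional selection with iloc, which
# keeps each column's extension dtype.
via_iloc = df.iloc[:, loc]

print(rebuilt.dtypes.iloc[0])
print(via_iloc.dtypes.iloc[0])
```

Selecting with `iloc` avoids materialising a single NumPy array for all columns, which is the step where extension dtypes such as `category` or `boolean` are dropped in this sketch.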

pandas/tests/indexing/multiindex/test_multiindex.py

+20
@@ -6,12 +6,14 @@
 
 import pandas as pd
 from pandas import (
+    CategoricalDtype,
     DataFrame,
     Index,
     MultiIndex,
     Series,
 )
 import pandas._testing as tm
+from pandas.core.arrays.boolean import BooleanDtype
 
 
 class TestMultiIndexBasic:
@@ -206,3 +208,21 @@ def test_multiindex_with_na_missing_key(self):
         )
         with pytest.raises(KeyError, match="missing_key"):
             df[[("missing_key",)]]
+
+    def test_multiindex_dtype_preservation(self):
+        # GH51261
+        columns = MultiIndex.from_tuples([("A", "B")], names=["lvl1", "lvl2"])
+        df = DataFrame(["value"], columns=columns).astype("category")
+        df_no_multiindex = df["A"]
+        assert isinstance(df_no_multiindex["B"].dtype, CategoricalDtype)
+
+        # geopandas 1763 analogue
+        df = DataFrame(
+            [[1, 0], [0, 1]],
+            columns=[
+                ["foo", "foo"],
+                ["location", "location"],
+                ["x", "y"],
+            ],
+        ).assign(bools=Series([True, False], dtype="boolean"))
+        assert isinstance(df["bools"].dtype, BooleanDtype)
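
The behaviour the new test guards can also be seen with a partial key that matches several columns. The sketch below is a hypothetical usage example, not part of the commit. It assumes a pandas version that includes this fix (2.0.2 or later), and the column labels and the name `sub` are made up for illustration.

```python
import pandas as pd

# Two-level columns where the partial key "A" matches two sub-columns.
columns = pd.MultiIndex.from_tuples([("A", "x"), ("A", "y"), ("B", "x")])
df = pd.DataFrame([["u", "v", "w"]], columns=columns).astype("category")

# Partial-key selection drops the matched level and returns the sub-frame.
sub = df["A"]

print(list(sub.columns))
print(sub.dtypes)
```

With the fix applied, every column of `sub` is expected to keep its categorical dtype rather than falling back to a generic dtype.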
