Skip to content

BUG: ensure reindex / getitem to select columns properly copies data for extension dtypes #51197

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.0.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1395,6 +1395,7 @@ ExtensionArray
- Bug in :meth:`api.types.is_integer_dtype`, :meth:`api.types.is_unsigned_integer_dtype`, :meth:`api.types.is_signed_integer_dtype`, :meth:`api.types.is_float_dtype` where a custom :class:`ExtensionDtype` would not return ``True`` if ``kind`` returned the corresponding NumPy type (:issue:`50667`)
- Bug in :class:`Series` constructor unnecessarily overflowing for nullable unsigned integer dtypes (:issue:`38798`, :issue:`25880`)
- Bug in setting non-string value into ``StringArray`` raising ``ValueError`` instead of ``TypeError`` (:issue:`49632`)
- Bug in :meth:`DataFrame.reindex` not honoring the default ``copy=True`` keyword in case of columns with ExtensionDtype (and as a result also selecting multiple columns with getitem (``[]``) didn't correctly result in a copy) (:issue:`51197`)

Styler
^^^^^^
Expand Down
2 changes: 1 addition & 1 deletion pandas/core/internals/managers.py
Original file line number Diff line number Diff line change
Expand Up @@ -830,7 +830,7 @@ def _slice_take_blocks_ax0(
# only one item and each mgr loc is a copy of that single
# item.
for mgr_loc in mgr_locs:
newblk = blk.copy(deep=False)
newblk = blk.copy(deep=not only_slice)
newblk.mgr_locs = BlockPlacement(slice(mgr_loc, mgr_loc + 1))
blocks.append(newblk)

Expand Down
15 changes: 15 additions & 0 deletions pandas/tests/frame/methods/test_reindex.py
Original file line number Diff line number Diff line change
Expand Up @@ -134,6 +134,21 @@ def test_reindex_copies(self):
result2 = df.reindex(columns=cols, index=df.index, copy=True)
assert not np.shares_memory(result2[0]._values, df[0]._values)

def test_reindex_copies_ea(self):
# https://github.com/pandas-dev/pandas/pull/51197
# also ensure to honor copy keyword for ExtensionDtypes
N = 10
df = DataFrame(np.random.randn(N * 10, N), dtype="Float64")
cols = np.arange(N)
np.random.shuffle(cols)

result = df.reindex(columns=cols, copy=True)
assert not np.shares_memory(result[0].array._data, df[0].array._data)

# pass both columns and index
result2 = df.reindex(columns=cols, index=df.index, copy=True)
assert not np.shares_memory(result2[0].array._data, df[0].array._data)

@td.skip_array_manager_not_yet_implemented
def test_reindex_date_fill_value(self):
# passing date to dt64 is deprecated; enforced in 2.0 to cast to object
Expand Down