Skip to content

REGR: properly update DataFrame cache in Series.__setitem__ #48215

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.4.4.rst
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@ Fixed regressions
- Fixed regression in :meth:`DataFrame.loc` setting a length-1 array like value to a single value in the DataFrame (:issue:`46268`)
- Fixed regression when slicing with :meth:`DataFrame.loc` with :class:`DateOffset`-index (:issue:`46671`)
- Fixed regression in setting ``None`` or non-string value into a ``string``-dtype Series using a mask (:issue:`47628`)
- Fixed regression in updating a DataFrame column through Series ``__setitem__`` (using chained assignment) not updating column values inplace and using too much memory (:issue:`47172`)
- Fixed regression in :meth:`DataFrame.select_dtypes` returning a view on the original DataFrame (:issue:`48090`)
- Fixed regression using custom Index subclasses (for example, used in xarray) with :meth:`~DataFrame.reset_index` or :meth:`Index.insert` (:issue:`47071`)
- Fixed regression in :meth:`DatetimeIndex.intersection` when the :class:`DatetimeIndex` has dates crossing daylight savings time (:issue:`46702`)
Expand Down
2 changes: 1 addition & 1 deletion pandas/core/series.py
Original file line number Diff line number Diff line change
Expand Up @@ -1169,7 +1169,7 @@ def __setitem__(self, key, value) -> None:
self._set_with(key, value)

if cacher_needs_updating:
self._maybe_update_cacher()
self._maybe_update_cacher(inplace=True)

def _set_with_engine(self, key, value) -> None:
loc = self.index.get_loc(key)
Expand Down
18 changes: 18 additions & 0 deletions pandas/tests/frame/indexing/test_setitem.py
Original file line number Diff line number Diff line change
Expand Up @@ -1235,3 +1235,21 @@ def test_setitem_not_operating_inplace(self, value, set_value, indexer):
view = df[:]
df[indexer] = set_value
tm.assert_frame_equal(view, expected)

@td.skip_array_manager_invalid_test
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this skips whereas skip_array_manager_not_yet_implemented xfails.

Now this test fails with AttributeError: 'ArrayManager' object has no attribute 'blocks' since we are explicitly testing the BlockManager values.

so it does make sense to just skip.

but are we likely to want a similar test for array manger?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it indeed makes sense to skip this test instead of xfail, since the way is written is not intended to ever work for ArrayManager. But so we could indeed write a separate test that checks for the the ArrayManager case (but I would say it's maybe not a super high priority, I can only take a look in the weekend).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but I would say it's maybe not a super high priority

no problem. not a blocker.

def test_setitem_column_update_inplace(self, using_copy_on_write):
# https://github.com/pandas-dev/pandas/issues/47172

labels = [f"c{i}" for i in range(10)]
df = DataFrame({col: np.zeros(len(labels)) for col in labels}, index=labels)
values = df._mgr.blocks[0].values

for label in df.columns:
df[label][label] = 1

if not using_copy_on_write:
# diagonal values all updated
assert np.all(values[np.arange(10), np.arange(10)] == 1)
else:
# original dataframe not updated
assert np.all(values[np.arange(10), np.arange(10)] == 0)