Skip to content

REGR: Fix inplace updates on column to set correct values #35936

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Aug 31, 2020
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.1.2.rst
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@ including other versions of pandas.
Fixed regressions
~~~~~~~~~~~~~~~~~
- Regression in :meth:`DatetimeIndex.intersection` incorrectly raising ``AssertionError`` when intersecting against a list (:issue:`35876`)
- Fix regression in updating a column inplace (e.g. using ``df['col'].fillna(.., inplace=True)``) (:issue:`35731`)
- Performance regression for :meth:`RangeIndex.format` (:issue:`35712`)
-

Expand Down
2 changes: 2 additions & 0 deletions pandas/core/internals/managers.py
Original file line number Diff line number Diff line change
Expand Up @@ -1027,6 +1027,8 @@ def iset(self, loc: Union[int, slice, np.ndarray], value):
Set new item in-place. Does not consolidate. Adds new Block if not
contained in the current set of items
"""
if isinstance(value, ABCSeries):
value = value._values
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extract_array to handle Index too?

Copy link
Member Author

@jorisvandenbossche jorisvandenbossche Aug 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel I suppose ideally we ensure that iset always already gets the correct values, so never a Series. And so I could also unpack (or maybe use extract_array?) where iset is called in this case:

pandas/pandas/core/generic.py

Lines 3273 to 3278 in a1f6056

def _maybe_cache_changed(self, item, value) -> None:
"""
The object has called back to us saying maybe it has changed.
"""
loc = self._info_axis.get_loc(item)
self._mgr.iset(loc, value)

But: 1) I am not fully sure in which case this _maybe_cache_changed is used, and so value can also be something else as a Series (so DataFrame)? and 2) iset is being used in other places as well, where the same issue might occur

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re 2: I think iset is only called from 2 places in NDFrame. shouldnt be that hard to check unwrapping there, annotate iset

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, iset itself is indeed only additionally used in _iset_item, but that is then used in _set_item and in indexing code, ... (and checking in the indexing code what values can be passed is already a bit less trivial ;-))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I might do a follow-up to try to push the check earlier

# FIXME: refactor, clearly separate broadcasting & zip-like assignment
# can prob also fix the various if tests for sparse/categorical
if self._blklocs is None and self.ndim > 1:
Expand Down
14 changes: 14 additions & 0 deletions pandas/tests/frame/test_block_internals.py
Original file line number Diff line number Diff line change
Expand Up @@ -644,3 +644,17 @@ def test_to_dict_of_blocks_item_cache():
assert df.loc[0, "b"] == "foo"

assert df["b"] is ser


def test_update_inplace_sets_valid_block_values():
# https://github.com/pandas-dev/pandas/issues/33457
df = pd.DataFrame({"a": pd.Series([1, 2, None], dtype="category")})

# inplace update of a single column
df["a"].fillna(1, inplace=True)

# check we havent put a Series into any block.values
assert isinstance(df._mgr.blocks[0].values, pd.Categorical)

# smoketest for OP bug from GH#35731
assert df.isnull().sum().sum() == 0