Backport PR #45287 on branch 1.4.x (BUG: frame[x].loc[y] inconsistent with frame.at[x, y]) (#45287) #47921

Merged 4 commits on Aug 3, 2022
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.4.4.rst
@@ -16,6 +16,7 @@ Fixed regressions
 ~~~~~~~~~~~~~~~~~
 - Fixed regression in :func:`concat` materializing :class:`Index` during sorting even if :class:`Index` was already sorted (:issue:`47501`)
 - Fixed regression in setting ``None`` or non-string value into a ``string``-dtype Series using a mask (:issue:`47628`)
+- Fixed regression in :meth:`loc.__getitem__` with a list of keys causing an internal inconsistency that could lead to a disconnect between ``frame.at[x, y]`` vs ``frame[y].loc[x]`` (:issue:`22372`)
Member

I would probably move this to bug fix and put reference to the new issue in the regression section

Member Author

Done

Member

I would also add a release note for the #47867 issue in the regression section (in addition to the bugfix release note)

or are you doing this when you add the test?

Member Author

Wanted to do this with the test, but no strong preference

Member

sure. merge when ready.

 -

 .. ---------------------------------------------------------------------------
7 changes: 4 additions & 3 deletions pandas/core/internals/managers.py
@@ -671,9 +671,6 @@ def reindex_indexer(
             result.axes[axis] = new_axis
             return result

-        if consolidate:
-            self._consolidate_inplace()
-
         # some axes don't allow reindexing with dups
         if not allow_dups:
             self.axes[axis]._validate_can_reindex(indexer)
@@ -1681,6 +1678,10 @@ def _consolidate_check(self) -> None:
         self._known_consolidated = True

     def _consolidate_inplace(self) -> None:
+        # In general, _consolidate_inplace should only be called via
+        # DataFrame._consolidate_inplace, otherwise we will fail to invalidate
+        # the DataFrame's _item_cache. The exception is for newly-created
+        # BlockManager objects not yet attached to a DataFrame.
         if not self.is_consolidated():
             self.blocks = tuple(_consolidate(self.blocks))
             self._is_consolidated = True
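The comment added above is the core of the fix: consolidating at the manager level swaps out the underlying storage without telling the frame, so anything the frame has cached keeps pointing at the old data. A minimal sketch of that failure mode follows. `ToyManager`, `ToyFrame`, `set_at`, and `stale_read` are hypothetical stand-ins invented for illustration, not pandas classes or APIs, and the model ignores dtypes and blocks entirely.

```python
# Toy model only: these classes are hypothetical stand-ins, not pandas internals.


class ToyManager:
    """Holds column data as a dict of column name -> list."""

    def __init__(self, columns):
        self.blocks = dict(columns)

    def consolidate_inplace(self):
        # Consolidation rebuilds the storage, so references to the old
        # per-column lists no longer alias the manager's data.
        self.blocks = {name: list(values) for name, values in self.blocks.items()}


class ToyFrame:
    """Caches the column objects handed out by __getitem__."""

    def __init__(self, columns):
        self._mgr = ToyManager(columns)
        self._item_cache = {}

    def __getitem__(self, name):
        if name not in self._item_cache:
            self._item_cache[name] = self._mgr.blocks[name]
        return self._item_cache[name]

    def set_at(self, row, name, value):
        # Writes go straight to the manager's current storage.
        self._mgr.blocks[name][row] = value

    def consolidate_inplace(self):
        # Frame-level entry point: consolidate, then drop stale cache entries.
        self._mgr.consolidate_inplace()
        self._item_cache.clear()


def stale_read(consolidate_via_frame):
    frame = ToyFrame({"x": [1], "cost": [2]})
    frame["cost"]  # populate the cache
    if consolidate_via_frame:
        frame.consolidate_inplace()
    else:
        frame._mgr.consolidate_inplace()  # bypasses cache invalidation
    frame.set_at(0, "cost", 789)
    return frame["cost"][0]


print(stale_read(consolidate_via_frame=False))  # read served from the stale cache
print(stale_read(consolidate_via_frame=True))   # read sees the value written by set_at
```

In this sketch the manager-level call leaves the cached column disconnected from later writes, which mirrors the `frame.at[x, y]` versus `frame[y].loc[x]` disconnect described in the release note.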
23 changes: 23 additions & 0 deletions pandas/tests/indexing/test_at.py
@@ -48,6 +48,29 @@ def test_selection_methods_of_assigned_col():


 class TestAtSetItem:
+    def test_at_setitem_item_cache_cleared(self):
+        # GH#22372 Note the multi-step construction is necessary to trigger
+        # the original bug. pandas/issues/22372#issuecomment-413345309
+        df = DataFrame(index=[0])
+        df["x"] = 1
+        df["cost"] = 2
+
+        # accessing df["cost"] adds "cost" to the _item_cache
+        df["cost"]
+
+        # This loc[[0]] lookup used to call _consolidate_inplace at the
+        # BlockManager level, which failed to clear the _item_cache
+        df.loc[[0]]
+
+        df.at[0, "x"] = 4
+        df.at[0, "cost"] = 789
+
+        expected = DataFrame({"x": [4], "cost": 789}, index=[0])
+        tm.assert_frame_equal(df, expected)
+
+        # And in particular, check that the _item_cache has updated correctly.
+        tm.assert_series_equal(df["cost"], expected["cost"])
+
     def test_at_setitem_mixed_index_assignment(self):
         # GH#19860
         ser = Series([1, 2, 3, 4, 5], index=["a", "b", "c", 1, 2])
5 changes: 4 additions & 1 deletion pandas/tests/internals/test_internals.py
@@ -718,7 +718,10 @@ def test_reindex_items(self):
         mgr = create_mgr("a: f8; b: i8; c: f8; d: i8; e: f8; f: bool; g: f8-2")

         reindexed = mgr.reindex_axis(["g", "c", "a", "d"], axis=0)
-        assert reindexed.nblocks == 2
+        # reindex_axis does not consolidate_inplace, as that risks failing to
+        # invalidate _item_cache
+        assert not reindexed.is_consolidated()
+
         tm.assert_index_equal(reindexed.items, Index(["g", "c", "a", "d"]))
         tm.assert_almost_equal(
             mgr.iget(6).internal_values(), reindexed.iget(0).internal_values()
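To see why the assertion changes from `nblocks == 2` to `not reindexed.is_consolidated()`, it may help to model blocks as plain `(dtype, column names)` pairs. The sketch below is a toy, not pandas code: `is_consolidated`, `consolidate`, and `take_columns` are hypothetical helpers, and the rule that selection yields one block per run of columns from the same source block is an assumed simplification.

```python
# Toy model only: "blocks" are (dtype, column names) pairs, not pandas objects.
from itertools import groupby


def is_consolidated(blocks):
    # Consolidated means no dtype is spread across more than one block.
    dtypes = [dtype for dtype, _ in blocks]
    return len(dtypes) == len(set(dtypes))


def consolidate(blocks):
    # Merge all blocks sharing a dtype into a single block per dtype.
    merged = {}
    for dtype, names in blocks:
        merged.setdefault(dtype, []).extend(names)
    return list(merged.items())


def take_columns(blocks, wanted):
    # Assumed simplification: each run of consecutive wanted columns that
    # come from the same source block becomes one block in the result.
    owner = {name: i for i, (_, names) in enumerate(blocks) for name in names}
    return [
        (blocks[block_id][0], list(run))
        for block_id, run in groupby(wanted, key=owner.__getitem__)
    ]


# Mirrors the layout in the test: "g" is float64 but lives in its own block.
source = [("f8", ["a", "c", "e"]), ("i8", ["b", "d"]), ("bool", ["f"]), ("f8", ["g"])]

reindexed = take_columns(source, ["g", "c", "a", "d"])
print(reindexed)
print(is_consolidated(reindexed))
print(len(consolidate(reindexed)))
```

Under these assumptions the reindexed result carries two separate float64 blocks, so it is not consolidated; merging them afterwards gives the two blocks the old assertion expected.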