BUG: DataFrame.update not operating in-place for datetime64[ns, UTC] dtype #56228

Merged 3 commits on Nov 29, 2023
Changes from 1 commit
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v2.2.0.rst
@@ -520,7 +520,7 @@ Indexing

Missing
^^^^^^^
-
- Bug in :meth:`DataFrame.update` wasn't updating in-place for datetime64 dtypes (:issue:`56227`)
-

MultiIndex
18 changes: 8 additions & 10 deletions pandas/core/frame.py
@@ -8838,14 +8838,14 @@ def update(
in the original dataframe.

>>> df = pd.DataFrame({'A': [1, 2, 3],
-... 'B': [400, 500, 600]})
+... 'B': [400., 500., 600.]})
Comment on lines 8840 to +8841
Member Author

I changed these to floats because otherwise we'd get:

<ipython-input-3-3ae759fa2c19>:1: FutureWarning: Downcasting behavior in Series and DataFrame methods 'where', 'mask', and 'clip' is deprecated. In a future version this will not infer object dtypes or cast all-round floats to integers. Instead call result.infer_objects(copy=False) for object inference, or cast round floats explicitly. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`

The warning looks correct, because the example would be updating an int column, df['B'], with a float one, new_df['B'].

>>> new_df = pd.DataFrame({'B': [4, np.nan, 6]})
>>> df.update(new_df)
>>> df
-   A    B
-0  1    4
-1  2  500
-2  3    6
+   A      B
+0  1    4.0
+1  2  500.0
+2  3    6.0
"""
if not PYPY and using_copy_on_write():
if sys.getrefcount(self) <= REF_COUNT:
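Editor's note on the docstring change in this hunk: a minimal sketch of the revised example, assuming pandas and NumPy are installed. It is not part of the PR. The variable names follow the docstring. Because column B starts out as floats, updating it from new_df['B'] (float, since it contains NaN) needs no downcast back to integers.

```python
import numpy as np
import pandas as pd

# Column B is created as floats, matching the revised docstring.
df = pd.DataFrame({"A": [1, 2, 3], "B": [400.0, 500.0, 600.0]})

# The NaN forces new_df["B"] to be a float column as well.
new_df = pd.DataFrame({"B": [4, np.nan, 6]})

# update() modifies df and returns None; NaN entries in new_df are skipped.
df.update(new_df)

print(df)
```

Row 1 keeps its original value because the corresponding entry in new_df is NaN.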
@@ -8862,8 +8862,6 @@ def update(
stacklevel=2,
)

-from pandas.core.computation import expressions
-
# TODO: Support other joins
if join != "left": # pragma: no cover
raise NotImplementedError("Only left join is supported")
@@ -8876,8 +8874,8 @@ def update(
other = other.reindex(self.index)

for col in self.columns.intersection(other.columns):
-this = self[col]._values
-that = other[col]._values
+this = self[col]
Member

This change is responsible. Is it necessary?

Member Author

nice one, thanks! 🙏

+that = other[col]

if filter_func is not None:
mask = ~filter_func(this) | isna(that)
@@ -8897,7 +8895,7 @@ def update(
if mask.all():
continue

-self.loc[:, col] = expressions.where(mask, this, that)
Member

So I suppose numexpr generally will prevent inplace operations?

Member

Being in-place here is generally not a good idea, since non-in-place methods are faster if you replace the whole column.

+self.loc[:, col] = this.where(mask, that)

# ----------------------------------------------------------------------
# Data reshaping
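Editor's note on the replacement line this.where(mask, that): a rough sketch, not taken from the PR, of what that call does for a timezone-aware column. The sample values are invented, and the mask is assumed to be isna(that), which is what the default path (no filter_func, overwrite=True) is understood to use. Series.where keeps the existing value where the mask is True and takes the value from that elsewhere, and the result stays a timezone-aware datetime Series.

```python
import pandas as pd

# Hypothetical sample data: an existing tz-aware column.
this = pd.Series(
    [pd.Timestamp("2019", tz="UTC"), pd.Timestamp("2020", tz="UTC")]
)

# Replacement values; NaT means "no update for this row".
that = pd.Series([pd.Timestamp("2019-01-02", tz="UTC"), pd.NaT])

# Assumed default-path mask: True where there is nothing to update with.
mask = that.isna()

# Keep `this` where mask is True, take `that` where mask is False.
updated = this.where(mask, that)

print(updated)
print(updated.dtype)
```

The first row is replaced and the second is kept, because its replacement value is missing.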
13 changes: 13 additions & 0 deletions pandas/tests/frame/methods/test_update.py
@@ -140,6 +140,19 @@ def test_update_datetime_tz(self):
expected = DataFrame([pd.Timestamp("2019", tz="UTC")])
tm.assert_frame_equal(result, expected)

+    def test_update_datetime_tz_in_place(self, using_copy_on_write):
+        # https://github.com/pandas-dev/pandas/issues/56227
+        result = DataFrame([pd.Timestamp("2019", tz="UTC")])
+        orig = result.copy()
+        view = result[:]
+        result.update(result + pd.Timedelta(days=1))
+        expected = DataFrame([pd.Timestamp("2019-01-02", tz="UTC")])
+        tm.assert_frame_equal(result, expected)
+        if not using_copy_on_write:
+            tm.assert_frame_equal(view, expected)
+        else:
+            tm.assert_frame_equal(view, orig)

def test_update_with_different_dtype(self, using_copy_on_write):
# GH#3217
df = DataFrame({"a": [1, 3], "b": [np.nan, 2]})
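Editor's note on test_update_datetime_tz_in_place: a sketch of the same check outside pytest, assuming only that pandas is installed. The using_copy_on_write fixture is not available here, so instead of asserting one outcome for the view, the script reports which one occurred. The names view_followed and view_unchanged are illustrative and do not appear in the PR.

```python
import pandas as pd

# Same setup as the test: a single tz-aware timestamp.
result = pd.DataFrame([pd.Timestamp("2019", tz="UTC")])
orig = result.copy()  # independent copy of the starting state
view = result[:]      # shares data with `result` before the update

# Update every value with itself shifted by one day.
result.update(result + pd.Timedelta(days=1))

old_value = pd.Timestamp("2019", tz="UTC")
new_value = pd.Timestamp("2019-01-02", tz="UTC")

# `result` itself should hold the new value in either mode.
assert result.iloc[0, 0] == new_value

# Which branch of the test applies depends on the copy-on-write mode
# (and on whether the pandas version includes this fix).
view_followed = view.iloc[0, 0] == new_value
view_unchanged = view.iloc[0, 0] == old_value
print("view reflects the update:", view_followed)
print("view still holds the original value:", view_unchanged)
```

Without copy-on-write, the test expects the view to reflect the update; with copy-on-write, it expects the view to keep the original value.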