-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: Fix dataframe.update not updating dtype #55509 v2 #57637
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 6 commits
93ea131
eb963df
d5a1085
02c1b77
5169022
d7f63a1
1fd2be9
d948e0a
dbdff5c
b2c915f
f9d601a
6de3fce
a62684f
8a28fb0
f08e92e
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -8692,6 +8692,10 @@ def update( | |||||||||||
dict.update : Similar method for dictionaries. | ||||||||||||
DataFrame.merge : For column(s)-on-column(s) operations. | ||||||||||||
|
||||||||||||
Notes | ||||||||||||
-------- | ||||||||||||
1. Duplicate indices on `other` are not supported and raises `ValueError`. | ||||||||||||
|
||||||||||||
Examples | ||||||||||||
-------- | ||||||||||||
>>> df = pd.DataFrame({"A": [1, 2, 3], "B": [400, 500, 600]}) | ||||||||||||
|
@@ -8764,11 +8768,22 @@ def update( | |||||||||||
if not isinstance(other, DataFrame): | ||||||||||||
other = DataFrame(other) | ||||||||||||
|
||||||||||||
other = other.reindex(self.index) | ||||||||||||
if other.index.has_duplicates: | ||||||||||||
raise ValueError("Update not allowed with duplicate indexes on other.") | ||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you add a note that this is not supported to the docstring. There isn't currently a notes section, something like this: Lines 1450 to 1454 in 737d390
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @rhshadrach Done.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Slight preference on keeping this PR just modifying update, but no strong objection to modifying the reindex docs. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok. It can be done in another PR. |
||||||||||||
|
||||||||||||
rows = other.index.intersection(self.index) | ||||||||||||
if rows.empty: | ||||||||||||
raise ValueError( | ||||||||||||
"Can't update dataframe when other has no index in common with " | ||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No big deal, but in the previous error message you say "update not allowed" and here "can't update dataframe" to mean the same thing. Not sure how we usually write error messages, but I think it'd be better to be consistent. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agree with keeping "update not allowed" because it states that it is possible but not allowed. Done, |
||||||||||||
"this dataframe." | ||||||||||||
) | ||||||||||||
|
||||||||||||
other = other.reindex(rows) | ||||||||||||
this_data = self.loc[rows] | ||||||||||||
|
||||||||||||
for col in self.columns.intersection(other.columns): | ||||||||||||
this = self[col]._values | ||||||||||||
that = other[col]._values | ||||||||||||
this = this_data[col] | ||||||||||||
that = other[col] | ||||||||||||
|
||||||||||||
if filter_func is not None: | ||||||||||||
mask = ~filter_func(this) | isna(that) | ||||||||||||
|
@@ -8788,7 +8803,7 @@ def update( | |||||||||||
if mask.all(): | ||||||||||||
continue | ||||||||||||
|
||||||||||||
self.loc[:, col] = self[col].where(mask, that) | ||||||||||||
self.loc[rows, col] = this.where(mask, that) | ||||||||||||
|
||||||||||||
# ---------------------------------------------------------------------- | ||||||||||||
# Data reshaping | ||||||||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -184,3 +184,47 @@ def test_update_dt_column_with_NaT_create_column(self): | |
{"A": [1.0, 3.0], "B": [pd.NaT, pd.to_datetime("2016-01-01")]} | ||
) | ||
tm.assert_frame_equal(df, expected) | ||
|
||
@pytest.mark.parametrize( | ||
"value_df, value_other, dtype", | ||
[ | ||
(True, False, bool), | ||
(1, 2, int), | ||
(np.uint64(1), np.uint(2), np.dtype("uint64")), | ||
(1.0, 2.0, float), | ||
(1.0 + 1j, 2.0 + 2j, complex), | ||
("a", "b", pd.StringDtype()), | ||
( | ||
pd.to_timedelta("1 ms"), | ||
pd.to_timedelta("2 ms"), | ||
np.dtype("timedelta64[ns]"), | ||
), | ||
( | ||
np.datetime64("2000-01-01T00:00:00"), | ||
np.datetime64("2000-01-02T00:00:00"), | ||
np.dtype("datetime64[ns]"), | ||
), | ||
], | ||
) | ||
def test_update_preserve_dtype(self, value_df, value_other, dtype): | ||
# GH#55509 | ||
df = DataFrame({"a": [value_df] * 2}, index=[1, 2]) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You don't seem to use the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for the review and suggestions @datapythonista . I will address all points as soon as possible (hope to finish by tomorrow). Regarding the tests, it's an interesting that it is not there! I will take a more detailed look on them. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My question is, why do you have a parameter for the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @datapythonista My tests were originally base on the tests proposed by @MichaelTiemannOSC. I tried to turn them simpler and similar to other tests in the same test set, but since they were failing on main and passing this PR I forgot about them. This is addressed in the last two commits, which make two of the tests that I propose strictier. Nothing changes but those tests are better now! Thanks for the guidance on this point. EDIT: I worked on this first because it looked more potentially problematic. As soon as possible the other points will be addressed. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Complementing, I think it is better to let the |
||
other = DataFrame({"a": [value_other]}, index=[1]) | ||
expected = DataFrame({"a": [value_other, value_df]}, index=[1, 2]) | ||
df.update(other) | ||
tm.assert_frame_equal(df, expected) | ||
|
||
def test_update_raises_on_duplicate_argument_index(self): | ||
# GH#55509 | ||
df = DataFrame({"a": [1, 1]}, index=[1, 2]) | ||
other = DataFrame({"a": [2, 3]}, index=[1, 1]) | ||
with pytest.raises(ValueError, match="duplicate index"): | ||
df.update(other) | ||
|
||
def test_update_on_duplicate_frame_unique_argument_index(self): | ||
# GH#55509 | ||
df = DataFrame({"a": [1, 1, 1]}, index=[1, 1, 2]) | ||
other = DataFrame({"a": [2, 3]}, index=[1, 2]) | ||
expected = DataFrame({"a": [2, 2, 3]}, index=[1, 1, 2]) | ||
df.update(other) | ||
tm.assert_frame_equal(df, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having the
-------
longer than the title used to make the CI crash. Not sure why the CI is happy now, but maybe worth makingNotes
and it the same length in case it fails again in the future.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now it has the same size.