-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: df.replace with numeric values and str to_replace #36093
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: df.replace with numeric values and str to_replace #36093
Conversation
if isinstance(mask, ExtensionArray): | ||
if isinstance(mask, BooleanArray): | ||
mask = mask.to_numpy(dtype=bool, na_value=False) | ||
elif isinstance(mask, ExtensionArray): | ||
# We could have BooleanArray, Sparse[bool], ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think need to update this comment now though - so is there no way to keep this in the same branch as the ExtensionArray check? Would be nice to stay as generic as possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll see if we can use to_numpy in the general case
return np.zeros(a.shape, dtype=bool) | ||
|
||
elif is_datetimelike_v_numeric(a, b) or is_numeric_v_string_like(a, b): | ||
# GH#29553 avoid deprecation warnings from numpy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, though there is also a mistake here (the second condition has been refactored to a few lines up, so this line should just be elif is_datetimelike_v_numeric(a, b):
In master this is where we incorrectly raise instead of just consider string==numeric not-equal
Looks like we have both a doctest and prose in missing_data.rst saying the current behavior (which this PR calls a bug) is intentional:
Under this PR, the example in the doctest returns
which strike me as better behavior. |
I don't think the documented behavior is desirable, and I read it as more of a "hey this is a limitation of the replace implementation". So I'm OK with changing behavior here as a "bugfix with behavior changing implications". |
very nice @jbrockmendel +1 on adding array_algos and using them from the blocks |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
We also avoid copies by not calling
self.as_array
and instead moving the mask-finding to the block level.