-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
TST: Tests for replace method when column contains pd.NA (#47480) #49805
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Tests fail for pd.Series and pd.DataFrame when dtype is not declared.
Looks like the added tests are failing in the CI. Are you interested in contributing a fix? |
pandas/core/missing.py
Outdated
@@ -94,6 +94,11 @@ def mask_missing(arr: ArrayLike, values_to_mask) -> npt.NDArray[np.bool_]: | |||
pass | |||
else: | |||
new_mask = arr == x | |||
# GH#47480 | |||
if isinstance(new_mask, bool): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you fix this without trying the failing conversion first? You can check for dtype object and if arr has any nans and use a different logic in that case.
I would suggest using another mask that applies the equality comparison only for values that are not nan
@@ -89,15 +88,11 @@ def mask_missing(arr: ArrayLike, values_to_mask) -> npt.NDArray[np.bool_]: | |||
# GH 21977 | |||
mask = np.zeros(arr.shape, dtype=bool) | |||
for x in nonna: | |||
if is_numeric_v_string_like(arr, x): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls keep this check. it'll avoid some warnings and will be a fastpath
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer if you pre-compute a mask in case we have object dtype before entering the loop.
Then you can add another if clause to only update the values that are not na
new_mask[~object_na_mask] = arr[~object_na_mask] == x
Last commit seems to pass all tests. |
pandas/core/missing.py
Outdated
@@ -88,12 +88,15 @@ def mask_missing(arr: ArrayLike, values_to_mask) -> npt.NDArray[np.bool_]: | |||
|
|||
# GH 21977 | |||
mask = np.zeros(arr.shape, dtype=bool) | |||
arr_na_mask = ~isna(arr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we avoid doing this when not necessary? I think we only need this with object dtype
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vs-araujo can you do this
…nto tests-replace
…nto tests-replace
@phofl Now it only creates the NA mask if Saw a problem after commiting:
Should change to |
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this. |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.Added tests to show BUG from issue (#47480).
Original issue was about pd.Series method (
df["A"].replace
), but pushing tests for both DataFrame and Series.Test failing when DataFrame/Series has dtype
object
(infered whendtype=None
on constructor).