Skip to content

BUG: df.replace with numeric values and str to_replace #36093

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Sep 5, 2020

Conversation

jbrockmendel
Copy link
Member

We also avoid copies by not calling self.as_array and instead moving the mask-finding to the block level.

if isinstance(mask, ExtensionArray):
if isinstance(mask, BooleanArray):
mask = mask.to_numpy(dtype=bool, na_value=False)
elif isinstance(mask, ExtensionArray):
# We could have BooleanArray, Sparse[bool], ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think need to update this comment now though - so is there no way to keep this in the same branch as the ExtensionArray check? Would be nice to stay as generic as possible

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll see if we can use to_numpy in the general case

return np.zeros(a.shape, dtype=bool)

elif is_datetimelike_v_numeric(a, b) or is_numeric_v_string_like(a, b):
# GH#29553 avoid deprecation warnings from numpy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to this PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, though there is also a mistake here (the second condition has been refactored to a few lines up, so this line should just be elif is_datetimelike_v_numeric(a, b):

In master this is where we incorrectly raise instead of just consider string==numeric not-equal

@WillAyd WillAyd added the Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff label Sep 3, 2020
@jbrockmendel
Copy link
Member Author

Looks like we have both a doctest and prose in missing_data.rst saying the current behavior (which this PR calls a bug) is intentional:

        Note that when replacing multiple ``bool`` or ``datetime64`` objects,
        the data types in the `to_replace` parameter must match the data
        type of the value being replaced:

        >>> df = pd.DataFrame({'A': [True, False, True], 'B': [False, True, False]})
        >>> df.replace({'a string': 'new value', True: False})  # raises
        Traceback (most recent call last):
            ...
        TypeError: Cannot compare types 'ndarray(dtype=bool)' and 'str'

        This raises a ``TypeError`` because one of the ``dict`` keys is not of
        the correct type for replacement.

Under this PR, the example in the doctest returns

       A      B
0  False  False
1  False  False
2  False  False

which strike me as better behavior.

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Sep 4, 2020

I don't think the documented behavior is desirable, and I read it as more of a "hey this is a limitation of the replace implementation".

So I'm OK with changing behavior here as a "bugfix with behavior changing implications".

@jreback jreback added this to the 1.2 milestone Sep 5, 2020
@jreback jreback merged commit 3967131 into pandas-dev:master Sep 5, 2020
@jreback
Copy link
Contributor

jreback commented Sep 5, 2020

very nice @jbrockmendel +1 on adding array_algos and using them from the blocks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Replace raises TypeError if to_replace is Dict with numeric DataFrame and key of Dict is String
4 participants