Skip to content

BUG: Fix replacing in string series with NA (pandas-dev#32621) #32890

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Apr 10, 2020
Merged
Show file tree
Hide file tree
Changes from 7 commits
Commits
Show all changes
20 commits
Select commit Hold shift + click to select a range
47f6676
BUG: Fix replacing in `string` series with NA (pandas-dev#32621)
chrispe Mar 21, 2020
2b53200
BUG: Fix replacing in `string` series with NA (pandas-dev#32621)
chrispe Mar 21, 2020
7678495
BUG: Fix replacing in `string` series with NA (pandas-dev#32621)
chrispe Mar 21, 2020
719369d
BUG: Fix replacing in `string` series with NA (pandas-dev#32621)
chrispe Mar 21, 2020
e98c7c9
BUG: Fix replacing in `string` series with NA (pandas-dev#32621)
chrispe Mar 22, 2020
fb8d143
BUG: Fix replacing in `string` series with NA (pandas-dev#32621)
chrispe Mar 22, 2020
0405f6a
Merge remote-tracking branch 'upstream/master' into fix-issue-32621
chrispe Mar 22, 2020
23da71c
Merge remote-tracking branch 'upstream/master' into fix-issue-32621
chrispe Mar 29, 2020
ca81cb0
BUG: Fix replacing in `string` series with NA (pandas-dev#32621)
chrispe Mar 29, 2020
8b6d224
Merge remote-tracking branch 'upstream/master' into fix-issue-32621
chrispe Apr 8, 2020
c32a2cc
BUG: Fix replacing in `string` series with NA (#32621)
chrispe Apr 8, 2020
0a76844
BUG: Fix replacing in `string` series with NA (#32621)
chrispe Apr 8, 2020
b62ad89
BUG: Fix replacing in `string` series with NA (pandas-dev#32621)
chrispe Apr 8, 2020
f5b994a
Merge remote-tracking branch 'upstream/master' into fix-issue-32621
chrispe Apr 8, 2020
a73e2eb
BUG: Fix replacing in `string` series with NA (pandas-dev#32621)
chrispe Apr 8, 2020
949accc
BUG: Fix replacing in `string` series with NA (pandas-dev#32621)
chrispe Apr 8, 2020
e7b37a0
Merge remote-tracking branch 'upstream/master' into fix-issue-32621
chrispe Apr 8, 2020
37da655
Merge remote-tracking branch 'upstream/master' into fix-issue-32621
chrispe Apr 9, 2020
df5bc39
Added description to the 1.1 bug fixes section
chrispe Apr 9, 2020
606aeb8
Merge remote-tracking branch 'upstream/master' into fix-issue-32621
chrispe Apr 10, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 12 additions & 1 deletion pandas/core/internals/managers.py
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@
from pandas.core.dtypes.concat import concat_compat
from pandas.core.dtypes.dtypes import ExtensionDtype
from pandas.core.dtypes.generic import ABCDataFrame, ABCSeries
from pandas.core.dtypes.missing import isna
from pandas.core.dtypes.missing import isna, na_value_for_dtype

import pandas.core.algorithms as algos
from pandas.core.arrays.sparse import SparseDtype
Expand Down Expand Up @@ -1945,11 +1945,22 @@ def _compare_or_regex_search(a, b, regex=False):
is_a_array = isinstance(a, np.ndarray)
is_b_array = isinstance(b, np.ndarray)

# GH#32621 replace all pd.NAs to avoid failure of element-wise comparison
mask = isna(a) | isna(b)
if is_a_array:
a = np.where(mask, na_value_for_dtype(a.dtype, compat=False), a)
if is_b_array:
b = np.where(mask, na_value_for_dtype(b.dtype, compat=False), b)

if is_datetimelike_v_numeric(a, b) or is_numeric_v_string_like(a, b):
# GH#29553 avoid deprecation warnings from numpy
result = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's just return here (you will need move the check on like 194 into a function ,e.g.

if is_datetimeliek......:
     return check(False)

.....

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you update for this

Copy link
Contributor Author

@chrispe chrispe Apr 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jreback, I'm not sure if I completely understood the change you requested to make here. But I assume you wanted to place the entire block that checks the result's value into a function instead? If so, I did that and made some additional changes (c32a2cc) in order to have it working. Can you please confirm? Thanks

else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't need to have an else here any longer

result = op(a)
if isinstance(result, np.ndarray):
result[mask] = na_value_for_dtype(result.dtype, compat=False)
elif isna(result):
result = na_value_for_dtype(np.bool, compat=False)

if is_scalar(result) and (is_a_array or is_b_array):
type_names = [type(a).__name__, type(b).__name__]
Expand Down
7 changes: 7 additions & 0 deletions pandas/tests/series/methods/test_replace.py
Original file line number Diff line number Diff line change
Expand Up @@ -241,6 +241,13 @@ def test_replace2(self):
assert (ser[6:10] == -1).all()
assert (ser[20:30] == -1).all()

def test_replace_with_dictlike_and_string_dtype(self):
# GH 32621
s = pd.Series(["one", "two", np.nan], dtype="string")
expected = pd.Series(["1", "2", np.nan])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The expected result here should also be dtype="string" I think

Copy link
Contributor Author

@chrispe chrispe Apr 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently that's not the case (even at the current stable version):

>>> import pandas as pd
>>> pd.__version__
'1.0.3'
>>> s = pd.Series(["one", "two"], dtype="string")
>>> expected = pd.Series(["1", "2"], dtype="string")
>>> result = s.replace(to_replace={"one": "1", "two": "2"})
>>> expected
0    1
1    2
dtype: string
>>> result
0    1
1    2
dtype: object

I'm not sure if that behaviour is to be expected or it should be tackled within a new issue. What do you think? This is not related to containing NA values.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say that it is certainly expected for replace to again return a string dtype.

Although of course, there are not guarantees that your replacement value is a string ...
And given that it is also not working on master, not sure it needs to be fixed here.

For some cases of replace it does work fine:

In [33]: s = pd.Series(["one", "two"], dtype="string")  

In [34]: s.replace("one", "1")   
Out[34]: 
0      1
1    two
dtype: string

Copy link
Contributor Author

@chrispe chrispe Apr 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good observation. When not using the argument to_replace (i.e. s.replace("one", "1")) then the trace is different. It doesn't use the same function to apply the replacement in comparison to s.replace(to_replace={"one": "1"}).

For more details check here: https://github.com/pandas-dev/pandas/blob/master/pandas/core/generic.py#L6462

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chrispe92 can you open an issue for this

result = s.replace({"one": "1", "two": "2"})
tm.assert_series_equal(expected, result)

def test_replace_with_empty_dictlike(self):
# GH 15289
s = pd.Series(list("abcd"))
Expand Down