
BUG: Replace on Series/DataFrame stops replacing after first NA #57865


Merged
merged 6 commits into from
Mar 20, 2024


@asishm (Contributor) commented Mar 16, 2024

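The issue was that the line `a = a[mask]` only ran when `a` was an ndarray, so it was skipped for dtype='string', while the `np.place` logic still ran whenever the result was an ndarray. The full-length result was therefore scattered into only the non-NA positions, which misaligns every value after the first NA.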
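A minimal NumPy reconstruction of this mechanism, assuming the helper computes a boolean match array and scatters it back into the non-NA positions with `np.place`. `matches_m` is a hypothetical stand-in for the regex search and `None` stands in for `pd.NA`; this is a sketch, not pandas' actual code.

```python
import numpy as np

# Hypothetical inputs: `None` plays the role of pd.NA, `mask` marks non-NA slots.
values = np.array(["m", "m", None, "m", "m", "m"], dtype=object)
mask = np.array([v is not None for v in values])


def matches_m(arr):
    # Hypothetical stand-in for the regex op: True where the element is a
    # string containing "m"; NA (None) maps to False.
    return np.array([isinstance(v, str) and "m" in v for v in arr], dtype=bool)


# Buggy path: the op runs on the unmasked array (6 results), but np.place
# only fills the 5 True slots of `mask`, consuming results in order.
buggy = np.zeros(mask.shape, dtype=bool)
np.place(buggy, mask, matches_m(values))

# Fixed path: subset first, so the op yields exactly mask.sum() results.
fixed = np.zeros(mask.shape, dtype=bool)
np.place(fixed, mask, matches_m(values[mask]))

print(buggy.tolist())  # [True, True, False, False, True, True]
print(fixed.tolist())  # [True, True, False, True, True, True]
```

Because `np.place` uses only the first `mask.sum()` values when given a longer sequence, the `False` produced for the NA at index 2 lands in slot 3, which is exactly the unreplaced `'m'` in the first example.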

The old behavior produced results like the following (the exact pattern depends on the number and location of the NAs):

In [2]: s = pd.Series(['m', 'm', pd.NA, 'm', 'm', 'm'], dtype='string')

In [3]: s.replace({'m': 't'}, regex=True)
Out[3]:
0       t
1       t
2    <NA>
3       m
4       t
5       t
dtype: string

In [4]: s = pd.Series(['m', 'm', pd.NA, pd.NA, 'm', 'm', 'm'], dtype='string')

In [5]: s.replace({'m': 't'}, regex=True)
Out[5]:
0       t
1       t
2    <NA>
3    <NA>
4       m
5       m
6       t
dtype: string

In [6]: s = pd.Series(['m', 'm', pd.NA, 'm', 'm', pd.NA, 'm', 'm'], dtype='string')

In [7]: s.replace({'m': 't'}, regex=True)
Out[7]:
0       t
1       t
2    <NA>
3       m
4       t
5    <NA>
6       t
7       m
dtype: string
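The same mechanism reproduces all three outputs above. In this sketch `buggy_replace` is a hypothetical helper, not a pandas function; it uses plain equality in place of the regex match and `None` in place of `pd.NA`.

```python
import numpy as np

NA = None  # stand-in for pd.NA in this sketch


def buggy_replace(items, old="m", new="t"):
    """Hypothetical reconstruction of the pre-fix replace behavior."""
    arr = np.array(items, dtype=object)
    mask = np.array([v is not NA for v in arr])              # non-NA positions
    result = np.array([v == old for v in arr], dtype=bool)   # full length, NAs included
    hits = np.zeros(mask.shape, dtype=bool)
    # np.place fills the True slots with result[0], result[1], ... in order,
    # so the k-th non-NA slot receives the match for the k-th element overall.
    np.place(hits, mask, result)
    return [new if hit else v for v, hit in zip(arr, hits)]


print(buggy_replace(["m", "m", NA, "m", "m", "m"]))
# ['t', 't', None, 'm', 't', 't']
print(buggy_replace(["m", "m", NA, NA, "m", "m", "m"]))
# ['t', 't', None, None, 'm', 'm', 't']
print(buggy_replace(["m", "m", NA, "m", "m", NA, "m", "m"]))
# ['t', 't', None, 'm', 't', None, 't', 'm']
```

Each printed list matches the corresponding `Out[...]` block: every NA shifts the match results one slot to the right for the remaining non-NA positions, so which `'m'` values survive depends on how many NAs precede them.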

@asishm changed the title from "Replace mask bug regex" to "BUG: Replace on Series/DataFrame stops replacing after first NA" Mar 16, 2024
@asishm (Contributor, Author) commented Mar 16, 2024

pre-commit.ci autofix

@mroeschke added the Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate) and replace (replace method) labels Mar 20, 2024
@mroeschke added this to the 3.0 milestone Mar 20, 2024
@mroeschke merged commit 0f7ded2 into pandas-dev:main Mar 20, 2024
51 of 55 checks passed
@mroeschke (Member) commented:

Thanks @asishm

@asishm deleted the replace_mask_bug_regex branch March 24, 2024 07:39
pmhatre1 pushed a commit to pmhatre1/pandas-pmhatre1 that referenced this pull request May 7, 2024
…as-dev#57865)

* update test for GH#56599

* bug: ser/df.replace only replaces first occurrence with NAs

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add whatsnew

* fmt fix

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

BUG: DataFrame.replace with regex on StringDtype column with NA values stops replacing after first NA
2 participants