Skip to content

BUG: string methods with NA #31684

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Mar 10, 2020
Merged

Conversation

prakhar987
Copy link
Contributor

@prakhar987 prakhar987 commented Feb 5, 2020

@prakhar987
Copy link
Contributor Author

Have added _typ to class NAType so it can be identified in is_nan() correctly. That's one way of doing it but probably can be done better. Will update the PR according to suggestions.

@jbrockmendel
Copy link
Member

I don’t think we want is_nan to recognize NA.

@TomAugspurger
Copy link
Contributor

Agreed. To fix #31632 we'll likely want to adjust how the na_map function is called.

@prakhar987
Copy link
Contributor Author

thanks for the inputs...will update this PR or raise a new one

@prakhar987
Copy link
Contributor Author

@TomAugspurger I think vec_binop is called instead of na_map because input is a vector. Should we implement a variant of vec_binop which can handle NA? Or a shorter fix might be checking for NA before calling bytes.__mul__ and str.__mul__

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Feb 6, 2020 via email

@@ -778,6 +778,8 @@ def scalar_rep(x):
else:

def rep(x, r):
if isinstance(x, libmissing.NAType):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need the same change for scalar_rep? (and a test that hits it)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If sequence passed to repeat has None it is handled...have added a test for it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of isinstance, just use if x is NA

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. Have updated.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add pd.NA in tests for the other string methods, this might be a more systematic issue.

@jreback jreback added Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Strings String extension data type and string data labels Feb 9, 2020
@jreback jreback changed the title BUG: is_nan returns True for pd.NA (#31632) BUG: string methods with NA Feb 9, 2020
@prakhar987
Copy link
Contributor Author

prakhar987 commented Feb 10, 2020

@jreback i think other string methods will be fine because most of them use _na_map which handles pd.NA. This specific issue was due to str_repeat using vec_binop for vectored inputs which didn't handle pd.NA. Let me know if you want the tests anyway. Will add tests for NA

@prakhar987
Copy link
Contributor Author

@jreback should i put the pd.NA tests in separate function or just add to functions that are already there?

@TomAugspurger TomAugspurger added this to the 1.0.2 milestone Mar 10, 2020
Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add pd.NA in tests for the other string methods, this might be a more systematic issue.

@jreback this was the only case method using vec_binop. The rest are using _na_map which should handle NA correctly. So I think we're OK.

Thanks @prakhar987!

@TomAugspurger TomAugspurger merged commit 37659d4 into pandas-dev:master Mar 10, 2020
@TomAugspurger
Copy link
Contributor

@meeseeksdev backport to 1.0.x

@lumberbot-app
Copy link

lumberbot-app bot commented Mar 10, 2020

Something went wrong ... Please have a look at my logs.

TomAugspurger pushed a commit that referenced this pull request Mar 10, 2020
SeeminSyed pushed a commit to CSCD01-team01/pandas that referenced this pull request Mar 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Strings String extension data type and string data
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug in Series.str.repeat with string dtype and sequence of repeats
4 participants