Skip to content

BUG: Fix StringArray use_inf_as_na bug #33656

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 19 commits into from
May 10, 2020
Merged

BUG: Fix StringArray use_inf_as_na bug #33656

merged 19 commits into from
May 10, 2020

Conversation

dsaxton
Copy link
Member

@dsaxton dsaxton commented Apr 19, 2020

@dsaxton dsaxton added Bug NA - MaskedArrays Related to pd.NA and nullable extension arrays labels Apr 19, 2020
@dsaxton dsaxton marked this pull request as draft April 19, 2020 16:13
@dsaxton dsaxton changed the title BUG: Fix use_inf_as_na bug for StringArray BUG: Fix StringArray use_inf_as_na bug Apr 19, 2020
@jreback
Copy link
Contributor

jreback commented Apr 19, 2020

you also have #33629 for this same issue?

@dsaxton
Copy link
Member Author

dsaxton commented Apr 19, 2020

you also have #33629 for this same issue?

We were thinking we may want to treat them separately since they're slightly different issues: #33629 (comment). Should both bugs be in the same PR?

@jorisvandenbossche
Copy link
Member

Can you also add a test somewhere for an object dtype Series that has pd.NA in it? (that won't be covered by the string dtype tests)

This shouldn't be "draft" anymore?

@dsaxton
Copy link
Member Author

dsaxton commented Apr 20, 2020

This shouldn't be "draft" anymore?

I was hoping to finish up the PR linked above first because it overlaps somewhat with this one

@jbrockmendel
Copy link
Member

Does this address #33209?

@dsaxton
Copy link
Member Author

dsaxton commented Apr 20, 2020

Does this address #33209?

Yes I think it should

@dsaxton dsaxton marked this pull request as ready for review April 28, 2020 02:38
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add a base extension test for this? (there should already be isna() tests, so you can add a similar one there but which is using use_inf_as_na

@jreback jreback requested a review from jbrockmendel April 29, 2020 22:40
@jreback jreback added this to the 1.1 milestone Apr 29, 2020
@dsaxton
Copy link
Member Author

dsaxton commented Apr 30, 2020

Can you also add a base extension test for this? (there should already be isna() tests, so you can add a similar one there but which is using use_inf_as_na

Added some base extension use_inf_as_na tests, however I'm getting failures from test_sparse.py (dtype-related, I believe), test_json.py and test_decimal.py (the latter two I think because of improper handling of null values; e.g., {} and Decimal("NaN") under the use_inf_as_na context). Should I xfail for these for now and try to address in a follow-up?

@@ -210,7 +210,7 @@ def _isna_ndarraylike(obj, inf_as_na: bool = False):

if is_extension_array_dtype(dtype):
if inf_as_na:
result = values.isna() | (values == -np.inf) | (values == np.inf)
result = libmissing.isnaobj_old(values.to_numpy())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we short-circuit this (and avoid the object-cast of to_numpy) for e.g. PeriodArray?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not in general, but we could short-circuit the known built-in dtypes that will never be able to contain inf (basically all our internal EA dtypes except for categorical ?)

@jorisvandenbossche
Copy link
Member

@dsaxton can you merge master once more?

And what is left for this PR? I think only the comment of @jbrockmendel to not do the expensive libmissing.isnaobj_old(values.to_numpy()) check for certain dtypes where this will never be needed (like Periods) ?

@dsaxton
Copy link
Member Author

dsaxton commented May 9, 2020

And what is left for this PR?

@jorisvandenbossche There was also this issue that I still need to figure out (I think it could potentially be fixed with the short circuiting mentioned above): #33656 (comment)

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. ping on green. I don't think its worth the trouble to deprecate this, possibly could do some more code cleaning, but so far so good (IIRC you de-duplicated things a bit already);

@dsaxton
Copy link
Member Author

dsaxton commented May 10, 2020

@jreback ping

@jreback jreback merged commit 678a9ac into pandas-dev:master May 10, 2020
@jreback
Copy link
Contributor

jreback commented May 10, 2020

thanks @dsaxton

@dsaxton dsaxton deleted the string-use-inf-as-na branch May 10, 2020 17:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug NA - MaskedArrays Related to pd.NA and nullable extension arrays
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: pandas.options.mode.use_inf_as_na disables NA checking for StringArray
4 participants