-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Fix StringArray use_inf_as_na bug #33656
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
you also have #33629 for this same issue? |
We were thinking we may want to treat them separately since they're slightly different issues: #33629 (comment). Should both bugs be in the same PR? |
Can you also add a test somewhere for an object dtype Series that has pd.NA in it? (that won't be covered by the string dtype tests) This shouldn't be "draft" anymore? |
I was hoping to finish up the PR linked above first because it overlaps somewhat with this one |
Does this address #33209? |
Yes I think it should |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add a base extension test for this? (there should already be isna()
tests, so you can add a similar one there but which is using use_inf_as_na
Added some base extension use_inf_as_na tests, however I'm getting failures from test_sparse.py (dtype-related, I believe), test_json.py and test_decimal.py (the latter two I think because of improper handling of null values; e.g., {} and Decimal("NaN") under the use_inf_as_na context). Should I xfail for these for now and try to address in a follow-up? |
@@ -210,7 +210,7 @@ def _isna_ndarraylike(obj, inf_as_na: bool = False): | |||
|
|||
if is_extension_array_dtype(dtype): | |||
if inf_as_na: | |||
result = values.isna() | (values == -np.inf) | (values == np.inf) | |||
result = libmissing.isnaobj_old(values.to_numpy()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we short-circuit this (and avoid the object-cast of to_numpy) for e.g. PeriodArray?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not in general, but we could short-circuit the known built-in dtypes that will never be able to contain inf (basically all our internal EA dtypes except for categorical ?)
@dsaxton can you merge master once more? And what is left for this PR? I think only the comment of @jbrockmendel to not do the expensive |
@jorisvandenbossche There was also this issue that I still need to figure out (I think it could potentially be fixed with the short circuiting mentioned above): #33656 (comment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. ping on green. I don't think its worth the trouble to deprecate this, possibly could do some more code cleaning, but so far so good (IIRC you de-duplicated things a bit already);
@jreback ping |
thanks @dsaxton |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff