-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: recognize Decimal("NaN") in pd.isna #39409
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
# GH 31615 | ||
if isinstance(nulls_fixture, Decimal): | ||
mark = pytest.mark.xfail(reason="not implemented") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you wanted to fix this here it would just require adding the same condition in the ujson code that we have for checking floats.
pandas/pandas/_libs/src/ujson/python/objToJSON.c
Line 1552 in 421fb8d
if (npy_isnan(val) || npy_isinf(val)) { |
The decimal check is only a few branches below that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on this, though worried about perf cost, can you run some benchmarks
pandas/core/dtypes/missing.py
Outdated
if dtype.kind in ["i", "u", "f", "c"]: | ||
# Numeric | ||
return obj is not NaT and not isinstance(obj, (np.datetime64, np.timedelta64)) | ||
|
||
if dtype == np.dtype(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these be if/elif
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
pandas/core/dtypes/missing.py
Outdated
@@ -606,15 +607,19 @@ def is_valid_nat_for_dtype(obj, dtype: DtypeObj) -> bool: | |||
if not lib.is_scalar(obj) or not isna(obj): | |||
return False | |||
if dtype.kind == "M": | |||
return not isinstance(obj, np.timedelta64) | |||
return not isinstance(obj, (np.timedelta64, Decimal)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is really strange that you need to do this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you just test dtype.kind == 'O' first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no bc dtype.kind == "O" includes Period and Interval
@@ -89,6 +91,10 @@ def test_constructor_infer_periodindex(self): | |||
def test_constructor_infer_nat_dt_like( | |||
self, pos, klass, dtype, ctor, nulls_fixture, request | |||
): | |||
if isinstance(nulls_fixture, Decimal): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe should have a nulls_fixture_compatible_datetimelike ?
|
if dtype.kind == "m": | ||
return not isinstance(obj, np.datetime64) | ||
return not isinstance(obj, (np.datetime64, Decimal)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
ok looks fine, can you add a whatsnew note (also prob need to update the docs where we mention missing values e.g. isna & user docs). can do as a followon is ok. |
What's the rationale for supporting decimals? (we don't have special support for it elsewhere, I think) So that something like We don't really have such custom support for anything else in |
we recognize it in lib.is_scalar and infer_dtype
correct |
Wondering: is the fact that The fact that I am personally still hesitant about the "is this the future behaviour we want?" |
@jorisvandenbossche are you -1 here? I think this is ok behavior. its rn a special case, so this seems like a nice cleanup. |
I don't have a strong opinion about it, but we don't support decimals as a first class citizen (only in object dtype as any other Python class), so I don't really see the value in adding a special case for it in our C code. (but so, if others want to see this behaviour, I won't block it) |
thanks @jorisvandenbossche yeah to me this improves the UX a bit and doesn't hurt perf so ok with it. |
whatsnew added + green |
the only reason i can think of to treat Decimal special is bc it is from the stdlib |
merging, this makes the logic a bit simpler and agree its a built in type, so why not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
return ( | ||
val is C_NA | ||
or is_null_datetimelike(val, inat_is_null=False) | ||
or is_decimal_na(val) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the list of what is considered null in docstring maybe could be updated.
also for consistency when using pandas.options.mode.use_inf_as_na, what about checknull_old?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the list of what is considered null in docstring maybe could be updated.
just added this to my next "collected misc" branch
also for consistency when using pandas.options.mode.use_inf_as_na, what about checknull_old?
i guess you're referring to Decimal("inf")
? my inclination is to let that sleeping dog lie
Discussion in #23530 seems ambivalent on whether this is desirable, and I don't have a strong opinion on it in general. BUT tm.assert_foo_equal is incorrect with Decimal("NaN") ATM and id like to see that fixed.
xref #32206