ENH: recognize Decimal("NaN") in pd.isna #39409

jbrockmendel · 2021-01-26T04:53:43Z

closes Support Decimal("NaN") in pandas.isna #23530
tests added / passed
Ensure all linting tests pass, see here for how to run them
whatsnew entry

Discussion in #23530 seems ambivalent on whether this is desirable, and I don't have a strong opinion on it in general. BUT tm.assert_foo_equal currently handles Decimal("NaN") incorrectly, and I'd like to see that fixed.
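Why an equality-based assertion trips over Decimal("NaN") can be seen with plain Python: a Decimal NaN never compares equal to itself, so an element-wise comparison only treats two missing values as matching if it first recognizes both as null. The sketch below is stdlib-only; `is_na_scalar` and `values_match` are hypothetical names for illustration, not pandas API.

```python
from decimal import Decimal
import math


def is_na_scalar(val):
    """Hypothetical scalar null check that also recognizes Decimal("NaN")."""
    if val is None:
        return True
    if isinstance(val, float):
        return math.isnan(val)
    if isinstance(val, Decimal):
        return val.is_nan()  # True for quiet and signaling NaN
    return False


def values_match(left, right):
    """Hypothetical element comparison in the style of an assert_*_equal helper.

    Two missing values match only if they are the same kind of missing value;
    everything else falls back to ordinary equality.
    """
    if is_na_scalar(left) or is_na_scalar(right):
        return is_na_scalar(left) and is_na_scalar(right) and type(left) is type(right)
    return left == right


nan_dec = Decimal("NaN")
print(nan_dec == nan_dec)              # False: NaN never equals itself
print(values_match(nan_dec, nan_dec))  # True: both recognized as missing
```

Whether a Decimal NaN should also match a float NaN is a design choice; this sketch says no.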

xref #32206

jbrockmendel · 2021-01-26T04:54:15Z

pandas/tests/io/json/test_pandas.py

        # GH 31615
+        if isinstance(nulls_fixture, Decimal):
+            mark = pytest.mark.xfail(reason="not implemented")


xref #28609 cc @WillAyd

WillAyd

If you wanted to fix this here, it would just require adding the same condition we have for checking floats to the ujson code.

pandas/pandas/_libs/src/ujson/python/objToJSON.c

Line 1552 in 421fb8d

if (npy_isnan(val) || npy_isinf(val)) {

The decimal check is only a few branches below that.
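A rough Python sketch of what "the same condition" would mean for serialization: NaN and infinite floats become null, and the suggestion is to apply that check to Decimal values as well. This is not the ujson C code; `to_json_scalar` is a hypothetical name, and converting finite Decimals to float is an assumption made so that `json.dumps` can serialize them.

```python
import json
import math
from decimal import Decimal


def to_json_scalar(val):
    """Hypothetical pre-serialization step mirroring the float branch:
    NaN and infinite values become None, which json.dumps writes as null."""
    if isinstance(val, float):
        return None if (math.isnan(val) or math.isinf(val)) else val
    if isinstance(val, Decimal):
        # Same condition, applied via Decimal's own predicates.
        if val.is_nan() or val.is_infinite():
            return None
        return float(val)  # assumption: finite Decimals are emitted as doubles
    return val


values = [1.5, float("nan"), Decimal("NaN"), Decimal("2.2"), Decimal("-Infinity")]
print(json.dumps([to_json_scalar(v) for v in values]))
# [1.5, null, null, 2.2, null]
```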

jreback

+1 on this, though I'm worried about the perf cost. Can you run some benchmarks?

jreback · 2021-01-26T14:17:24Z

pandas/core/dtypes/missing.py

    if dtype.kind in ["i", "u", "f", "c"]:
        # Numeric
        return obj is not NaT and not isinstance(obj, (np.datetime64, np.timedelta64))

+    if dtype == np.dtype(object):


these should be if/elif

jreback · 2021-01-26T14:19:40Z

pandas/core/dtypes/missing.py

@@ -606,15 +607,19 @@ def is_valid_nat_for_dtype(obj, dtype: DtypeObj) -> bool:
    if not lib.is_scalar(obj) or not isna(obj):
        return False
    if dtype.kind == "M":
-        return not isinstance(obj, np.timedelta64)
+        return not isinstance(obj, (np.timedelta64, Decimal))


this is really strange that you need to do this

can you just test dtype.kind == 'O' first?

jbrockmendel

no, because dtype.kind == "O" also includes Period and Interval
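A simplified, stdlib-only model of the if/elif structure under discussion, assuming a dtype is represented by a numpy-style kind character plus a flag that separates a plain object dtype from extension dtypes such as Period and Interval (which also report kind "O"). The function names are hypothetical, and the final fallback branch is an assumption rather than a copy of the pandas code.

```python
from decimal import Decimal
import math


def _is_null(obj):
    # Hypothetical scalar null check (models None, float NaN, Decimal NaN only).
    if obj is None:
        return True
    if isinstance(obj, float):
        return math.isnan(obj)
    if isinstance(obj, Decimal):
        return obj.is_nan()
    return False


def is_valid_na_for_kind(obj, kind, plain_object=False):
    """Hypothetical analogue of is_valid_nat_for_dtype.

    kind: a numpy-style dtype.kind character ("M", "m", "i", "u", "f", "c", "O").
    plain_object: True for a bare object dtype; False for other kind-"O"
    dtypes such as Period or Interval.
    """
    if not _is_null(obj):
        return False
    elif kind in ("M", "m"):
        # Decimal("NaN") is null, but it should not be accepted as NaT.
        return not isinstance(obj, Decimal)
    elif kind in ("i", "u", "f", "c"):
        # Numeric dtypes accept any of the nulls modeled here.
        return True
    elif kind == "O" and plain_object:
        # A bare object dtype can hold any null.
        return True
    # Remaining kind-"O" dtypes (assumption): reject Decimal NaN.
    return not isinstance(obj, Decimal)
```

Testing kind == "O" alone would lump Period and Interval together with plain object dtype, which is why the sketch needs the separate flag.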

jreback · 2021-01-26T14:21:39Z

pandas/tests/indexes/test_index_new.py

@@ -89,6 +91,10 @@ def test_constructor_infer_periodindex(self):
    def test_constructor_infer_nat_dt_like(
        self, pos, klass, dtype, ctor, nulls_fixture, request
    ):
+        if isinstance(nulls_fixture, Decimal):


maybe we should have a nulls_fixture_compatible_datetimelike?

jbrockmendel · 2021-01-27T00:49:31Z

In [4]: %timeit pd.isna(2)
323 ns ± 15.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)  # <-- master
319 ns ± 5.37 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)  # <-- PR

In [5]: %timeit pd.isna(np.nan)
329 ns ± 3.86 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)  # <-- master
330 ns ± 9.32 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)  # <-- PR

In [6]: arr = np.arange(1000).astype(object)
In [7]: arr[500] = Decimal("NAN")

In [8]: %timeit pd.isna(arr)
42.1 µs ± 254 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)  # <-- master
45.3 µs ± 213 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)  # <-- PR

jreback · 2021-01-27T14:16:36Z

pandas/core/dtypes/missing.py

    if dtype.kind == "m":
-        return not isinstance(obj, np.datetime64)
+        return not isinstance(obj, (np.datetime64, Decimal))


jreback · 2021-01-28T03:18:39Z

ok looks fine. Can you add a whatsnew note? (We probably also need to update the docs where we mention missing values, e.g. isna and the user docs.) Doing that as a follow-on is ok.

cc @jorisvandenbossche

jorisvandenbossche · 2021-01-29T14:58:38Z

What's the rationale for supporting decimals? (we don't have special support for it elsewhere, I think)

So that something like pd.Series([decimal.Decimal("NaN"), decimal.Decimal("2.2")]).isna() works correctly? (now it returns [False, False])

We don't really have such custom support for anything else in object dtype, so while I certainly understand the use case, I am not sure why we should do it for decimal and not for something else (although maybe decimal is the only relevant case).
Also, assuming we get proper decimal support in the future (e.g. arrow-backed), the question is whether we would want to keep supporting it like this in object dtype (arrow, for example, also doesn't support "NaN" for decimal).

jbrockmendel · 2021-01-29T17:47:00Z

> we don't have special support for it elsewhere, I think

we recognize it in lib.is_scalar and infer_dtype

> So that something like pd.Series([decimal.Decimal("NaN"), decimal.Decimal("2.2")]).isna() works correctly? (now it returns [False, False])

correct
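A pure-Python model of the behaviour confirmed here, for an object column holding [Decimal("NaN"), Decimal("2.2")]. `is_decimal_na` borrows its name from the checknull diff later in this thread, but its body (a Decimal that is not equal to itself) is an assumption; `checknull` and `isna_object` are illustrative stand-ins, not pandas internals.

```python
from decimal import Decimal
import math


def is_decimal_na(val):
    # Assumed check: a Decimal that is not equal to itself is NaN.
    return isinstance(val, Decimal) and val != val


def checknull(val, recognize_decimal=True):
    """Hypothetical scalar null check for object values."""
    if val is None:
        return True
    if isinstance(val, float) and math.isnan(val):
        return True
    return recognize_decimal and is_decimal_na(val)


def isna_object(values, recognize_decimal=True):
    # Element-wise null mask over a list of Python objects.
    return [checknull(v, recognize_decimal) for v in values]


data = [Decimal("NaN"), Decimal("2.2")]
print(isna_object(data, recognize_decimal=False))  # [False, False]
print(isna_object(data))                           # [True, False]
```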

jorisvandenbossche · 2021-02-03T15:17:11Z

> we recognize it in infer_dtype

Wondering: is the fact that infer_dtype recognizes it actually used somewhere? (I only see it used in the tests)

The fact that is_scalar recognizes it is a good point, but in principle any Python class that follows the number protocol (e.g. implements __int__ or __float__) will be recognized as scalar.

I am personally still hesitant about the "is this the future behaviour we want?" question.

jreback · 2021-02-11T00:17:48Z

@jorisvandenbossche are you -1 here? I think this is ok behavior. It's currently a special case, so this seems like a nice cleanup.

jorisvandenbossche · 2021-02-12T16:36:20Z

I don't have a strong opinion about it, but we don't support decimals as a first-class citizen (only in object dtype, like any other Python class), so I don't really see the value in adding a special case for it in our C code.

(but if others want to see this behaviour, I won't block it)

jreback · 2021-02-12T17:21:17Z

thanks @jorisvandenbossche. Yeah, to me this improves the UX a bit and doesn't hurt perf, so I'm ok with it.

jbrockmendel · 2021-02-19T15:59:39Z

whatsnew added + green

jbrockmendel · 2021-02-23T21:34:32Z

the only reason I can think of to treat Decimal specially is that it is from the stdlib

jreback · 2021-02-25T00:54:49Z

merging. This makes the logic a bit simpler, and I agree it's a built-in type, so why not.

Matausi29

Done

simonjayhawkins · 2021-02-25T20:03:18Z

pandas/_libs/missing.pyx

+    return (
+        val is C_NA
+        or is_null_datetimelike(val, inat_is_null=False)
+        or is_decimal_na(val)


the list of what is considered null in docstring maybe could be updated.

also for consistency when using pandas.options.mode.use_inf_as_na, what about checknull_old?

jbrockmendel · 2021-03-02

> the list of what is considered null in docstring maybe could be updated.

just added this to my next "collected misc" branch

> also for consistency when using pandas.options.mode.use_inf_as_na, what about checknull_old?

I guess you're referring to Decimal("inf")? My inclination is to let that sleeping dog lie.
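For context on the question left open here: with use_inf_as_na enabled, float infinities are treated as missing alongside NaN. Below is a hypothetical pure-Python sketch of such a check, with an `include_decimal` switch showing what also covering Decimal("inf") would change. The function and flag are illustrative only; the sketch does not reproduce checknull_old.

```python
from decimal import Decimal
import math


def checknull_inf_as_na(val, include_decimal=False):
    """Hypothetical null check in the spirit of use_inf_as_na.

    Float NaN and +/-inf count as missing. Decimal NaN always counts;
    Decimal infinities count only when include_decimal is True.
    """
    if val is None:
        return True
    if isinstance(val, float):
        return math.isnan(val) or math.isinf(val)
    if isinstance(val, Decimal):
        if val.is_nan():
            return True
        return include_decimal and val.is_infinite()
    return False


print(checknull_inf_as_na(float("inf")))                               # True
print(checknull_inf_as_na(Decimal("Infinity")))                        # False
print(checknull_inf_as_na(Decimal("Infinity"), include_decimal=True))  # True
```

Leaving `include_decimal` off corresponds to the "let that sleeping dog lie" choice.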

jbrockmendel added 5 commits January 24, 2021 16:38

CI: fix PandasArray test

1a3b757

Merge branch 'master' of https://github.com/pandas-dev/pandas into enh-isna-decimal

cfea335

Merge branch 'master' of https://github.com/pandas-dev/pandas into enh-isna-decimal

f66022c

Merge branch 'master' into enh-isna-decimal

b18a7d3

ENH: recognize Decimal(nan) in pd.isna

7bd6d83

jbrockmendel commented Jan 26, 2021

jreback requested changes Jan 26, 2021

Merge branch 'master' of https://github.com/pandas-dev/pandas into enh-isna-decimal

4dde6df

jreback added the Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate label Jan 27, 2021

jreback reviewed Jan 27, 2021

jbrockmendel added 2 commits January 27, 2021 07:47

Merge branch 'master' of https://github.com/pandas-dev/pandas into enh-isna-decimal

a1b34e5

if->elif

335d599

jbrockmendel added 4 commits February 17, 2021 10:26

Merge branch 'master' into enh-isna-decimal

ee93754

Merge branch 'master' into enh-isna-decimal

15bcccd

Merge branch 'master' into enh-isna-decimal

733e754

update for is_matching_na

d0e2a87

jreback added this to the 1.3 milestone Feb 25, 2021

jreback approved these changes Feb 25, 2021

jreback merged commit 8ec9e0a into pandas-dev:master Feb 25, 2021

Matausi29 reviewed Feb 25, 2021

jbrockmendel deleted the enh-isna-decimal branch February 25, 2021 02:06

simonjayhawkins reviewed Feb 25, 2021

simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this pull request Feb 25, 2021

is_decimal_na pandas-dev#39409

d4415b7
