diff --git a/doc/source/user_guide/missing_data.rst b/doc/source/user_guide/missing_data.rst index 6c36a6470f841..11957cfa265f5 100644 --- a/doc/source/user_guide/missing_data.rst +++ b/doc/source/user_guide/missing_data.rst @@ -12,10 +12,10 @@ pandas. .. note:: The choice of using ``NaN`` internally to denote missing data was largely - for simplicity and performance reasons. It differs from the MaskedArray - approach of, for example, :mod:`scikits.timeseries`. We are hopeful that - NumPy will soon be able to provide a native NA type solution (similar to R) - performant enough to be used in pandas. + for simplicity and performance reasons. + Starting from pandas 1.0, some optional data types start experimenting + with a native ``NA`` scalar using a mask-based approach. See + :ref:`here ` for more. See the :ref:`cookbook` for some advanced strategies. @@ -110,7 +110,7 @@ pandas objects provide compatibility between ``NaT`` and ``NaN``. .. _missing.inserting: Inserting missing data ----------------------- +~~~~~~~~~~~~~~~~~~~~~~ You can insert missing values by simply assigning to containers. The actual missing value used will be chosen based on the dtype. @@ -135,9 +135,10 @@ For object containers, pandas will use the value given: s.loc[1] = np.nan s +.. _missing_data.calculations: Calculations with missing data ------------------------------- +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Missing values propagate naturally through arithmetic operations between pandas objects. @@ -771,3 +772,139 @@ the ``dtype="Int64"``. s See :ref:`integer_na` for more. + + +.. _missing_data.NA: + +Experimental ``NA`` scalar to denote missing values +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. warning:: + + Experimental: the behaviour of ``pd.NA`` can still change without warning. + +.. versionadded:: 1.0.0 + +Starting from pandas 1.0, an experimental ``pd.NA`` value (singleton) is +available to represent scalar missing values. 
At this moment, it is used in +the nullable :doc:`integer `, boolean and +:ref:`dedicated string ` data types as the missing value indicator. + +The goal of ``pd.NA`` is to provide a "missing" indicator that can be used +consistently across data types (instead of ``np.nan``, ``None`` or ``pd.NaT`` +depending on the data type). + +For example, a Series with the nullable integer dtype uses ``pd.NA`` for its +missing values: + +.. ipython:: python + + s = pd.Series([1, 2, None], dtype="Int64") + s + s[2] + s[2] is pd.NA + +Currently, pandas does not yet use those data types by default (when creating +a DataFrame or Series, or when reading in data), so you need to specify +the dtype explicitly. + +Propagation in arithmetic and comparison operations +--------------------------------------------------- + +In general, missing values *propagate* in operations involving ``pd.NA``. When +one of the operands is unknown, the outcome of the operation is also unknown. + +For example, ``pd.NA`` propagates in arithmetic operations, similarly to +``np.nan``: + +.. ipython:: python + + pd.NA + 1 + "a" * pd.NA + +In equality and comparison operations, ``pd.NA`` also propagates. This deviates +from the behaviour of ``np.nan``, where ``==`` and ordering comparisons with +``np.nan`` always return ``False``. + +.. ipython:: python + + pd.NA == 1 + pd.NA == pd.NA + pd.NA < 2.5 + +To check whether a value is ``pd.NA``, the :func:`isna` function can be +used: + +.. ipython:: python + + pd.isna(pd.NA) + +An exception to this basic propagation rule is *reductions* (such as the +mean or the minimum), where pandas defaults to skipping missing values. See +:ref:`above ` for more. + +Logical operations +------------------ + +For logical operations, ``pd.NA`` follows the rules of the +`three-valued logic `__ (or +*Kleene logic*, similarly to R, SQL and Julia). Under this logic, missing +values only propagate when it is logically required. 
+ +For example, for the logical "or" operation (``|``), if one of the operands +is ``True``, we already know the result will be ``True``, regardless of the +other value (so regardless of whether the missing value would be ``True`` or +``False``). In this case, ``pd.NA`` does not propagate: + +.. ipython:: python + + True | False + True | pd.NA + pd.NA | True + +On the other hand, if one of the operands is ``False``, the result depends +on the value of the other operand. Therefore, in this case ``pd.NA`` +propagates: + +.. ipython:: python + + False | True + False | False + False | pd.NA + +The behaviour of the logical "and" operation (``&``) can be derived using +similar logic (where now ``pd.NA`` will not propagate if one of the operands +is already ``False``): + +.. ipython:: python + + False & True + False & False + False & pd.NA + +.. ipython:: python + + True & True + True & False + True & pd.NA + + +``NA`` in a boolean context +--------------------------- + +Since the actual value of an NA is unknown, it is ambiguous to convert NA +to a boolean value. The following raises an error: + +.. ipython:: python + :okexcept: + + bool(pd.NA) + +This also means that ``pd.NA`` cannot be used in a context where it is +evaluated to a boolean, such as ``if condition: ...`` where ``condition`` can +potentially be ``pd.NA``. In such cases, :func:`isna` can be used to check +for ``pd.NA``, or ``condition`` being ``pd.NA`` can be avoided, for example by +filling missing values beforehand. + +A similar situation occurs when using Series or DataFrame objects in ``if`` +statements, see :ref:`gotchas.truth`. diff --git a/doc/source/whatsnew/v1.0.0.rst b/doc/source/whatsnew/v1.0.0.rst index db23bfdc8a5bd..704f240ed32a1 100644 --- a/doc/source/whatsnew/v1.0.0.rst +++ b/doc/source/whatsnew/v1.0.0.rst @@ -102,6 +102,50 @@ String accessor methods returning integers will return a value with :class:`Int6 We recommend explicitly using the ``string`` data type when working with strings. 
See :ref:`text.types` for more. +.. _whatsnew_100.NA: + +Experimental ``NA`` scalar to denote missing values +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +A new ``pd.NA`` value (singleton) is introduced to represent scalar missing +values. Up to now, ``np.nan`` is used for this for float data, ``np.nan`` or +``None`` for object-dtype data and ``pd.NaT`` for datetime-like data. The +goal of ``pd.NA`` is to provide a "missing" indicator that can be used +consistently across data types. For now, the nullable integer and boolean +data types and the new string data type make use of ``pd.NA`` (:issue:`28095`). + +.. warning:: + + Experimental: the behaviour of ``pd.NA`` can still change without warning. + +For example, creating a Series using the nullable integer dtype: + +.. ipython:: python + + s = pd.Series([1, 2, None], dtype="Int64") + s + s[2] + +Compared to ``np.nan``, ``pd.NA`` behaves differently in certain operations. +In addition to arithmetic operations, ``pd.NA`` also propagates as "missing" +or "unknown" in comparison operations: + +.. ipython:: python + + np.nan > 1 + pd.NA > 1 + +For logical operations, ``pd.NA`` follows the rules of the +`three-valued logic `__ (or +*Kleene logic*). For example: + +.. ipython:: python + + pd.NA | True + +For more, see the :ref:`NA section ` in the user guide on missing +data. + .. 
_whatsnew_100.boolean: Boolean data type with missing values support diff --git a/pandas/__init__.py b/pandas/__init__.py index d6f3458b4d604..a60aa08b89f84 100644 --- a/pandas/__init__.py +++ b/pandas/__init__.py @@ -70,6 +70,7 @@ StringDtype, BooleanDtype, # missing + NA, isna, isnull, notna, diff --git a/pandas/_libs/lib.pyx b/pandas/_libs/lib.pyx index aaf6456df8f8e..b52a5c1cb6d3b 100644 --- a/pandas/_libs/lib.pyx +++ b/pandas/_libs/lib.pyx @@ -58,7 +58,7 @@ from pandas._libs.tslibs.timedeltas cimport convert_to_timedelta64 from pandas._libs.tslibs.timezones cimport get_timezone, tz_compare from pandas._libs.missing cimport ( - checknull, isnaobj, is_null_datetime64, is_null_timedelta64, is_null_period + checknull, isnaobj, is_null_datetime64, is_null_timedelta64, is_null_period, C_NA ) @@ -161,6 +161,7 @@ def is_scalar(val: object) -> bool: or PyTime_Check(val) # We differ from numpy, which claims that None is not scalar; # see np.isscalar + or val is C_NA or val is None or isinstance(val, (Fraction, Number)) or util.is_period_object(val) @@ -1502,7 +1503,7 @@ cdef class Validator: f'must define is_value_typed') cdef bint is_valid_null(self, object value) except -1: - return value is None or util.is_nan(value) + return value is None or value is C_NA or util.is_nan(value) cdef bint is_array_typed(self) except -1: return False diff --git a/pandas/_libs/missing.pxd b/pandas/_libs/missing.pxd index d0dd306680ae8..d4303ac28b9a5 100644 --- a/pandas/_libs/missing.pxd +++ b/pandas/_libs/missing.pxd @@ -9,3 +9,8 @@ cpdef ndarray[uint8_t] isnaobj(ndarray arr) cdef bint is_null_datetime64(v) cdef bint is_null_timedelta64(v) cdef bint is_null_period(v) + +cdef class C_NAType: + pass + +cdef C_NAType C_NA diff --git a/pandas/_libs/missing.pyx b/pandas/_libs/missing.pyx index 9568ddb7fe53f..9bf955ad369e7 100644 --- a/pandas/_libs/missing.pyx +++ b/pandas/_libs/missing.pyx @@ -1,6 +1,8 @@ import cython from cython import Py_ssize_t +import numbers + import numpy as np 
cimport numpy as cnp from numpy cimport ndarray, int64_t, uint8_t, float64_t @@ -44,7 +46,7 @@ cpdef bint checknull(object val): The difference between `checknull` and `checknull_old` is that `checknull` does *not* consider INF or NEGINF to be NA. """ - return is_null_datetimelike(val, inat_is_null=False) + return val is C_NA or is_null_datetimelike(val, inat_is_null=False) cpdef bint checknull_old(object val): @@ -278,3 +280,137 @@ cdef inline bint is_null_period(v): # determine if we have a null for a Period (or integer versions), # excluding np.datetime64('nat') and np.timedelta64('nat') return checknull_with_nat(v) + + +# ----------------------------------------------------------------------------- +# Implementation of NA singleton + + +def _create_binary_propagating_op(name, divmod=False): + + def method(self, other): + if other is C_NA or isinstance(other, str) or isinstance(other, numbers.Number): + if divmod: + return NA, NA + else: + return NA + + return NotImplemented + + method.__name__ = name + return method + + +def _create_unary_propagating_op(name): + def method(self): + return NA + + method.__name__ = name + return method + + +cdef class C_NAType: + pass + + +class NAType(C_NAType): + """ + NA ("not available") missing value indicator. + + .. warning:: + + Experimental: the behaviour of NA can still change without warning. + + .. versionadded:: 1.0.0 + + The NA singleton is a missing value indicator defined by pandas. It is + used in certain new extension dtypes (currently the "string" dtype). 
+ """ + + _instance = None + + def __new__(cls, *args, **kwargs): + if NAType._instance is None: + NAType._instance = C_NAType.__new__(cls, *args, **kwargs) + return NAType._instance + + def __repr__(self) -> str: + return "NA" + + def __str__(self) -> str: + return "NA" + + def __bool__(self): + raise TypeError("boolean value of NA is ambiguous") + + def __hash__(self): + return id(self) + + # Binary arithmetic and comparison ops -> propagate + + __add__ = _create_binary_propagating_op("__add__") + __radd__ = _create_binary_propagating_op("__radd__") + __sub__ = _create_binary_propagating_op("__sub__") + __rsub__ = _create_binary_propagating_op("__rsub__") + __mul__ = _create_binary_propagating_op("__mul__") + __rmul__ = _create_binary_propagating_op("__rmul__") + __matmul__ = _create_binary_propagating_op("__matmul__") + __rmatmul__ = _create_binary_propagating_op("__rmatmul__") + __truediv__ = _create_binary_propagating_op("__truediv__") + __rtruediv__ = _create_binary_propagating_op("__rtruediv__") + __floordiv__ = _create_binary_propagating_op("__floordiv__") + __rfloordiv__ = _create_binary_propagating_op("__rfloordiv__") + __mod__ = _create_binary_propagating_op("__mod__") + __rmod__ = _create_binary_propagating_op("__rmod__") + __divmod__ = _create_binary_propagating_op("__divmod__", divmod=True) + __rdivmod__ = _create_binary_propagating_op("__rdivmod__", divmod=True) + __pow__ = _create_binary_propagating_op("__pow__") + __rpow__ = _create_binary_propagating_op("__rpow__") + # __lshift__ and __rshift__ are not implemented + + __eq__ = _create_binary_propagating_op("__eq__") + __ne__ = _create_binary_propagating_op("__ne__") + __le__ = _create_binary_propagating_op("__le__") + __lt__ = _create_binary_propagating_op("__lt__") + __gt__ = _create_binary_propagating_op("__gt__") + __ge__ = _create_binary_propagating_op("__ge__") + + # Unary ops + + __neg__ = _create_unary_propagating_op("__neg__") + __pos__ = _create_unary_propagating_op("__pos__") + __abs__ = 
_create_unary_propagating_op("__abs__") + __invert__ = _create_unary_propagating_op("__invert__") + + # Logical ops using Kleene logic + + def __and__(self, other): + if other is False: + return False + elif other is True or other is C_NA: + return NA + else: + return NotImplemented + + __rand__ = __and__ + + def __or__(self, other): + if other is True: + return True + elif other is False or other is C_NA: + return NA + else: + return NotImplemented + + __ror__ = __or__ + + def __xor__(self, other): + if other is False or other is True or other is C_NA: + return NA + return NotImplemented + + __rxor__ = __xor__ + + +C_NA = NAType() # C-visible +NA = C_NA # Python-visible diff --git a/pandas/_libs/testing.pyx b/pandas/_libs/testing.pyx index 141735a97938a..8b847350cb1ff 100644 --- a/pandas/_libs/testing.pyx +++ b/pandas/_libs/testing.pyx @@ -180,13 +180,15 @@ cpdef assert_almost_equal(a, b, # classes can't be the same, to raise error assert_class_equal(a, b, obj=obj) - if a == b: - # object comparison - return True if isna(a) and isna(b): # TODO: Should require same-dtype NA? 
# nan / None comparison return True + + if a == b: + # object comparison + return True + if is_comparable_as_number(a) and is_comparable_as_number(b): if array_equivalent(a, b, strict_nan=True): # inf comparison diff --git a/pandas/core/api.py b/pandas/core/api.py index 65f0178b19187..bf701c0318874 100644 --- a/pandas/core/api.py +++ b/pandas/core/api.py @@ -55,3 +55,5 @@ # DataFrame needs to be imported after NamedAgg to avoid a circular import from pandas.core.frame import DataFrame # isort:skip + +from pandas._libs.missing import NA diff --git a/pandas/core/arrays/numpy_.py b/pandas/core/arrays/numpy_.py index 6f2bb095a014d..8ba5cd7565850 100644 --- a/pandas/core/arrays/numpy_.py +++ b/pandas/core/arrays/numpy_.py @@ -278,6 +278,9 @@ def fillna(self, value=None, method=None, limit=None): return new_values def take(self, indices, allow_fill=False, fill_value=None): + if fill_value is None: + # Primarily for subclasses + fill_value = self.dtype.na_value result = take( self._ndarray, indices, allow_fill=allow_fill, fill_value=fill_value ) diff --git a/pandas/core/arrays/string_.py b/pandas/core/arrays/string_.py index 8599b5e39f34a..f6af05ab4d9e7 100644 --- a/pandas/core/arrays/string_.py +++ b/pandas/core/arrays/string_.py @@ -1,9 +1,9 @@ import operator -from typing import TYPE_CHECKING, Type +from typing import Type import numpy as np -from pandas._libs import lib +from pandas._libs import lib, missing as libmissing from pandas.core.dtypes.base import ExtensionDtype from pandas.core.dtypes.common import pandas_dtype @@ -17,9 +17,6 @@ from pandas.core.construction import extract_array from pandas.core.missing import isna -if TYPE_CHECKING: - from pandas._typing import Scalar - @register_extension_dtype class StringDtype(ExtensionDtype): @@ -50,16 +47,8 @@ class StringDtype(ExtensionDtype): StringDtype """ - @property - def na_value(self) -> "Scalar": - """ - StringDtype uses :attr:`numpy.nan` as the missing NA value. - - .. 
warning:: - - `na_value` may change in a future release. - """ - return np.nan + #: StringDtype.na_value uses pandas.NA + na_value = libmissing.NA @property def type(self) -> Type: @@ -149,7 +138,7 @@ class StringArray(PandasArray): -------- >>> pd.array(['This is', 'some text', None, 'data.'], dtype="string") - ['This is', 'some text', nan, 'data.'] + ['This is', 'some text', NA, 'data.'] Length: 4, dtype: string Unlike ``object`` dtype arrays, ``StringArray`` doesn't allow non-string @@ -190,10 +179,10 @@ def _from_sequence(cls, scalars, dtype=None, copy=False): if dtype: assert dtype == "string" result = super()._from_sequence(scalars, dtype=object, copy=copy) - # convert None to np.nan + # Standardize all missing-like values to NA # TODO: it would be nice to do this in _validate / lib.is_string_array # We are already doing a scan over the values there. - result[result.isna()] = np.nan + result[result.isna()] = StringDtype.na_value return result @classmethod @@ -210,6 +199,12 @@ def __arrow_array__(self, type=None): type = pa.string() return pa.array(self._ndarray, type=type, from_pandas=True) + def _values_for_factorize(self): + arr = self._ndarray.copy() + mask = self.isna() + arr[mask] = -1 + return arr, -1 + def __setitem__(self, key, value): value = extract_array(value, extract_numpy=True) if isinstance(value, type(self)): @@ -223,9 +218,9 @@ def __setitem__(self, key, value): # validate new items if scalar_value: - if scalar_value is None: - value = np.nan - elif not (isinstance(value, str) or np.isnan(value)): + if isna(value): + value = StringDtype.na_value + elif not isinstance(value, str): raise ValueError( "Cannot set non-string value '{}' into a StringArray.".format(value) ) @@ -283,7 +278,7 @@ def method(self, other): other = other[valid] result = np.empty_like(self._ndarray, dtype="object") - result[mask] = np.nan + result[mask] = StringDtype.na_value result[valid] = op(self._ndarray[valid], other) if op.__name__ in {"add", "radd", "mul", "rmul"}: 
diff --git a/pandas/core/dtypes/missing.py b/pandas/core/dtypes/missing.py index cb4199272f574..efe00d4c8c176 100644 --- a/pandas/core/dtypes/missing.py +++ b/pandas/core/dtypes/missing.py @@ -80,6 +80,9 @@ def isna(obj): >>> pd.isna('dog') False + >>> pd.isna(pd.NA) + True + >>> pd.isna(np.nan) True @@ -327,6 +330,9 @@ def notna(obj): >>> pd.notna('dog') True + >>> pd.notna(pd.NA) + False + >>> pd.notna(np.nan) False @@ -444,6 +450,9 @@ def array_equivalent(left, right, strict_nan: bool = False) -> bool: if left_value is NaT and right_value is not NaT: return False + elif left_value is libmissing.NA and right_value is not libmissing.NA: + return False + elif isinstance(left_value, float) and np.isnan(left_value): if not isinstance(right_value, float) or not np.isnan(right_value): return False @@ -455,6 +464,8 @@ def array_equivalent(left, right, strict_nan: bool = False) -> bool: if "Cannot compare tz-naive" in str(err): # tzawareness compat failure, see GH#28507 return False + elif "boolean value of NA is ambiguous" in str(err): + return False raise return True diff --git a/pandas/io/formats/format.py b/pandas/io/formats/format.py index b18f0db622b3e..bfc0618e35e32 100644 --- a/pandas/io/formats/format.py +++ b/pandas/io/formats/format.py @@ -35,6 +35,7 @@ from pandas._config.config import get_option, set_option from pandas._libs import lib +from pandas._libs.missing import NA from pandas._libs.tslib import format_array_from_datetime from pandas._libs.tslibs import NaT, Timedelta, Timestamp, iNaT from pandas._libs.tslibs.nattype import NaTType @@ -1223,6 +1224,8 @@ def _format(x): # determine na_rep if x is None or NaT-like if x is None: return "None" + elif x is NA: + return "NA" elif x is NaT or np.isnat(x): return "NaT" except (TypeError, ValueError): diff --git a/pandas/tests/api/test_api.py b/pandas/tests/api/test_api.py index 85e38d58a6c57..3c0abd7fca830 100644 --- a/pandas/tests/api/test_api.py +++ b/pandas/tests/api/test_api.py @@ -46,7 +46,7 @@ class 
TestPDApi(Base): deprecated_modules: List[str] = [] # misc - misc = ["IndexSlice", "NaT"] + misc = ["IndexSlice", "NaT", "NA"] # top-level classes classes = [ diff --git a/pandas/tests/arrays/string_/test_string.py b/pandas/tests/arrays/string_/test_string.py index 1ce62d8f8b3d9..0dfd75a2042b0 100644 --- a/pandas/tests/arrays/string_/test_string.py +++ b/pandas/tests/arrays/string_/test_string.py @@ -9,10 +9,20 @@ import pandas.util.testing as tm +def test_repr_with_NA(): + a = pd.array(["a", pd.NA, "b"], dtype="string") + for obj in [a, pd.Series(a), pd.DataFrame({"a": a})]: + assert "NA" in repr(obj) and "NaN" not in repr(obj) + assert "NA" in str(obj) and "NaN" not in str(obj) + if hasattr(obj, "_repr_html_"): + html_repr = obj._repr_html_() + assert "NA" in html_repr and "NaN" not in html_repr + + def test_none_to_nan(): a = pd.arrays.StringArray._from_sequence(["a", None, "b"]) assert a[1] is not None - assert np.isnan(a[1]) + assert a[1] is pd.NA def test_setitem_validates(): @@ -24,6 +34,15 @@ def test_setitem_validates(): a[:] = np.array([1, 2]) +def test_setitem_with_scalar_string(): + # is_float_dtype considers some strings, like 'd', to be floats + # which can cause issues. 
+ arr = pd.array(["a", "c"], dtype="string") + arr[0] = "d" + expected = pd.array(["d", "c"], dtype="string") + tm.assert_extension_array_equal(arr, expected) + + @pytest.mark.parametrize( "input, method", [ diff --git a/pandas/tests/extension/test_string.py b/pandas/tests/extension/test_string.py index 5b872d5b72227..471a1b79d23bc 100644 --- a/pandas/tests/extension/test_string.py +++ b/pandas/tests/extension/test_string.py @@ -25,7 +25,7 @@ def data(): @pytest.fixture def data_missing(): """Length 2 array with [NA, Valid]""" - return StringArray._from_sequence([np.nan, "A"]) + return StringArray._from_sequence([pd.NA, "A"]) @pytest.fixture @@ -35,17 +35,17 @@ def data_for_sorting(): @pytest.fixture def data_missing_for_sorting(): - return StringArray._from_sequence(["B", np.nan, "A"]) + return StringArray._from_sequence(["B", pd.NA, "A"]) @pytest.fixture def na_value(): - return np.nan + return pd.NA @pytest.fixture def data_for_grouping(): - return StringArray._from_sequence(["B", "B", np.nan, np.nan, "A", "A", "B", "C"]) + return StringArray._from_sequence(["B", "B", pd.NA, pd.NA, "A", "A", "B", "C"]) class TestDtype(base.BaseDtypeTests): diff --git a/pandas/tests/scalar/test_na_scalar.py b/pandas/tests/scalar/test_na_scalar.py new file mode 100644 index 0000000000000..e68e49814245f --- /dev/null +++ b/pandas/tests/scalar/test_na_scalar.py @@ -0,0 +1,131 @@ +import numpy as np +import pytest + +from pandas._libs.missing import NA + +from pandas.core.dtypes.common import is_scalar + +import pandas as pd +import pandas.util.testing as tm + + +def test_singleton(): + assert NA is NA + new_NA = type(NA)() + assert new_NA is NA + + +def test_repr(): + assert repr(NA) == "NA" + assert str(NA) == "NA" + + +def test_truthiness(): + with pytest.raises(TypeError): + bool(NA) + + with pytest.raises(TypeError): + not NA + + +def test_hashable(): + assert hash(NA) == hash(NA) + d = {NA: "test"} + assert d[NA] == "test" + + +def test_arithmetic_ops(all_arithmetic_functions): 
+ op = all_arithmetic_functions + + for other in [NA, 1, 1.0, "a", np.int64(1), np.nan]: + if op.__name__ == "rmod" and isinstance(other, str): + continue + if op.__name__ in ("divmod", "rdivmod"): + assert op(NA, other) == (NA, NA) + else: + assert op(NA, other) is NA + + +def test_comparison_ops(): + + for other in [NA, 1, 1.0, "a", np.int64(1), np.nan]: + assert (NA == other) is NA + assert (NA != other) is NA + assert (NA > other) is NA + assert (NA >= other) is NA + assert (NA < other) is NA + assert (NA <= other) is NA + + if isinstance(other, np.int64): + # for numpy scalars we get a deprecation warning and False as result + # for equality or error for larger/lesser than + continue + + assert (other == NA) is NA + assert (other != NA) is NA + assert (other > NA) is NA + assert (other >= NA) is NA + assert (other < NA) is NA + assert (other <= NA) is NA + + +def test_unary_ops(): + assert +NA is NA + assert -NA is NA + assert abs(NA) is NA + assert ~NA is NA + + +def test_logical_and(): + + assert NA & True is NA + assert True & NA is NA + assert NA & False is False + assert False & NA is False + assert NA & NA is NA + + with pytest.raises(TypeError): + NA & 5 + + +def test_logical_or(): + + assert NA | True is True + assert True | NA is True + assert NA | False is NA + assert False | NA is NA + assert NA | NA is NA + + with pytest.raises(TypeError): + NA | 5 + + +def test_logical_xor(): + + assert NA ^ True is NA + assert True ^ NA is NA + assert NA ^ False is NA + assert False ^ NA is NA + assert NA ^ NA is NA + + with pytest.raises(TypeError): + NA ^ 5 + + +def test_logical_not(): + assert ~NA is NA + + +def test_is_scalar(): + assert is_scalar(NA) is True + + +def test_isna(): + assert pd.isna(NA) is True + assert pd.notna(NA) is False + + +def test_series_isna(): + s = pd.Series([1, NA], dtype=object) + expected = pd.Series([False, True]) + tm.assert_series_equal(s.isna(), expected)