Commit 844dc4a

TomAugspurger authored and jreback committed
API: Uses pd.NA in IntegerArray (#29964)
1 parent 9c40e06 commit 844dc4a

7 files changed

+298
-88
lines changed

doc/source/user_guide/integer_na.rst

+28
@@ -15,6 +15,10 @@ Nullable integer data type
    IntegerArray is currently experimental. Its API or implementation may
    change without warning.
 
+.. versionchanged:: 1.0.0
+
+   Now uses :attr:`pandas.NA` as the missing value rather
+   than :attr:`numpy.nan`.
 
 In :ref:`missing_data`, we saw that pandas primarily uses ``NaN`` to represent
 missing data. Because ``NaN`` is a float, this forces an array of integers with
@@ -23,6 +27,9 @@ much. But if your integer column is, say, an identifier, casting to float can
 be problematic. Some integers cannot even be represented as floating point
 numbers.
 
+Construction
+------------
+
 Pandas can represent integer data with possibly missing values using
 :class:`arrays.IntegerArray`. This is an :ref:`extension types <extending.extension-types>`
 implemented within pandas.
@@ -39,6 +46,12 @@ NumPy's ``'int64'`` dtype:
 
    pd.array([1, 2, np.nan], dtype="Int64")
 
+All NA-like values are replaced with :attr:`pandas.NA`.
+
+.. ipython:: python
+
+   pd.array([1, 2, np.nan, None, pd.NA], dtype="Int64")
+
 This array can be stored in a :class:`DataFrame` or :class:`Series` like any
 NumPy array.
 
@@ -78,6 +91,9 @@ with the dtype.
 In the future, we may provide an option for :class:`Series` to infer a
 nullable-integer dtype.
 
+Operations
+----------
+
 Operations involving an integer array will behave similar to NumPy arrays.
 Missing values will be propagated, and the data will be coerced to another
 dtype if needed.
@@ -123,3 +139,15 @@ Reduction and groupby operations such as 'sum' work as well.
 
    df.sum()
    df.groupby('B').A.sum()
+
+Scalar NA Value
+---------------
+
+:class:`arrays.IntegerArray` uses :attr:`pandas.NA` as its scalar
+missing value. Slicing a single element that's missing will return
+:attr:`pandas.NA`
+
+.. ipython:: python
+
+   a = pd.array([1, None], dtype="Int64")
+   a[1]
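
Reviewer sketch (not part of the commit): the construction and scalar behaviour documented in this file can be checked with a few lines. This assumes pandas 1.0 or later is installed; the variable names `arr`, `missing_mask`, and `scalar_is_na` are illustrative only.

```python
import numpy as np
import pandas as pd

# Every NA-like input (np.nan, None, pd.NA) should be stored as pd.NA.
arr = pd.array([1, 2, np.nan, None, pd.NA], dtype="Int64")

# isna() flags the three missing positions.
missing_mask = arr.isna()

# Indexing a single missing element returns the pd.NA singleton,
# not a float NaN as in pandas 0.25.x.
scalar = arr[2]
scalar_is_na = scalar is pd.NA

print(arr)
print(missing_mask, scalar_is_na)
```

Because `pd.NA` is a singleton, an identity check (`is pd.NA`) or `pd.isna(...)` is the safer way to test for it; `scalar == pd.NA` itself evaluates to `pd.NA` rather than a boolean.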

doc/source/whatsnew/v1.0.0.rst

+58
@@ -365,6 +365,64 @@ The following methods now also correctly output values for unobserved categories
 
 As a reminder, you can specify the ``dtype`` to disable all inference.
 
+:class:`arrays.IntegerArray` now uses :attr:`pandas.NA`
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+:class:`arrays.IntegerArray` now uses :attr:`pandas.NA` rather than
+:attr:`numpy.nan` as its missing value marker (:issue:`29964`).
+
+*pandas 0.25.x*
+
+.. code-block:: python
+
+   >>> a = pd.array([1, 2, None], dtype="Int64")
+   >>> a
+   <IntegerArray>
+   [1, 2, NaN]
+   Length: 3, dtype: Int64
+
+   >>> a[2]
+   nan
+
+*pandas 1.0.0*
+
+.. ipython:: python
+
+   a = pd.array([1, 2, None], dtype="Int64")
+   a[2]
+
+See :ref:`missing_data.NA` for more on the differences between :attr:`pandas.NA`
+and :attr:`numpy.nan`.
+
+:class:`arrays.IntegerArray` comparisons return :class:`arrays.BooleanArray`
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Comparison operations on a :class:`arrays.IntegerArray` now returns a
+:class:`arrays.BooleanArray` rather than a NumPy array (:issue:`29964`).
+
+*pandas 0.25.x*
+
+.. code-block:: python
+
+   >>> a = pd.array([1, 2, None], dtype="Int64")
+   >>> a
+   <IntegerArray>
+   [1, 2, NaN]
+   Length: 3, dtype: Int64
+
+   >>> a > 1
+   array([False, True, False])
+
+*pandas 1.0.0*
+
+.. ipython:: python
+
+   a = pd.array([1, 2, None], dtype="Int64")
+   a > 1
+
+Note that missing values now propagate, rather than always comparing unequal
+like :attr:`numpy.nan`. See :ref:`missing_data.NA` for more.
+
 By default :meth:`Categorical.min` now returns the minimum instead of np.nan
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
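
Reviewer sketch (not part of the commit): the comparison change described in this whatsnew entry can be illustrated as follows, assuming pandas 1.0 or later. The names `result` and `filled` are illustrative, and the `fillna(False)` step is just one possible way to recover the 0.25.x-style result.

```python
import pandas as pd

a = pd.array([1, 2, None], dtype="Int64")

# The comparison now yields a BooleanArray; the missing element
# propagates as pd.NA instead of comparing as False.
result = a > 1
result_dtype = str(result.dtype)

# Code that relied on the old "missing compares False" behaviour
# can fill the NA explicitly before using the mask.
filled = result.fillna(False)

print(result)
print(filled)
```

Making the fill explicit keeps the decision about how to treat missing comparisons visible at the call site rather than hidden in the array's semantics.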

pandas/core/arrays/boolean.py

+4 -2
@@ -730,7 +730,6 @@ def all(self, skipna: bool = True, **kwargs):
     @classmethod
     def _create_logical_method(cls, op):
         def logical_method(self, other):
-
             if isinstance(other, (ABCDataFrame, ABCSeries, ABCIndexClass)):
                 # Rely on pandas to unbox and dispatch to us.
                 return NotImplemented
@@ -777,8 +776,11 @@ def logical_method(self, other):
     @classmethod
     def _create_comparison_method(cls, op):
         def cmp_method(self, other):
+            from pandas.arrays import IntegerArray
 
-            if isinstance(other, (ABCDataFrame, ABCSeries, ABCIndexClass)):
+            if isinstance(
+                other, (ABCDataFrame, ABCSeries, ABCIndexClass, IntegerArray)
+            ):
                 # Rely on pandas to unbox and dispatch to us.
                 return NotImplemented
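
Reviewer sketch (not part of the commit): the hunk above appears to rely on Python's standard operator protocol, where returning `NotImplemented` from a comparison makes the interpreter retry with the other operand's reflected method. The pure-Python classes below, `IntLike` and `BoolLike`, are hypothetical stand-ins for `IntegerArray` and `BooleanArray`, with `None` playing the role of a missing value. They illustrate the dispatch mechanism only, not the pandas implementation.

```python
class IntLike:
    """Hypothetical stand-in for IntegerArray; None marks a missing value."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        # This class knows how to compare itself against BoolLike operands.
        other_values = other.values if isinstance(other, (IntLike, BoolLike)) else other
        return [
            None if (x is None or y is None) else x == y
            for x, y in zip(self.values, other_values)
        ]


class BoolLike:
    """Hypothetical stand-in for BooleanArray."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        if isinstance(other, IntLike):
            # Defer: Python retries with the reflected IntLike.__eq__.
            return NotImplemented
        other_values = other.values if isinstance(other, BoolLike) else other
        return [
            None if (x is None or y is None) else x == y
            for x, y in zip(self.values, other_values)
        ]


# BoolLike.__eq__ returns NotImplemented, so IntLike.__eq__ handles the call.
result = BoolLike([True, None, False]) == IntLike([1, 1, None])
print(result)
```

Deferring this way keeps the mixed-type comparison logic in one class, so both operand orders produce the same result.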
