Commit aee2914

ENH: add integer-na support via an ExtensionArray
closes pandas-dev#20700 closes pandas-dev#20747
1 parent 486bfe8 commit aee2914

21 files changed: +1386 -67 lines changed

doc/source/whatsnew/v0.24.0.txt

+57
@@ -8,6 +8,7 @@ v0.24.0 (Month XX, 2018)
 New features
 ~~~~~~~~~~~~
 
+
 - ``ExcelWriter`` now accepts ``mode`` as a keyword argument, enabling append to existing workbooks when using the ``openpyxl`` engine (:issue:`3441`)
 
 .. _whatsnew_0240.enhancements.extension_array_operators:
@@ -26,6 +27,61 @@ See the :ref:`ExtensionArray Operator Support
 <extending.extension.operator>` documentation section for details on both
 ways of adding operator support.
 
+.. _whatsnew_0240.enhancements.intna:
+
+Integer NA Support
+^^^^^^^^^^^^^^^^^^
+
+Pandas has gained the ability to hold integer dtypes with missing values. This long-requested feature is enabled through the use of ``ExtensionTypes``. Here is an example of the usage.
+
+We can construct a ``Series`` with the specified dtype. The dtype string ``Int64`` is a pandas ``ExtensionDtype``. Specifying a list or array using the traditional missing value
+marker of ``np.nan`` will infer to integer dtype. The display of the ``Series`` will also use ``NaN`` to indicate missing values in string outputs. (:issue:`20700`, :issue:`20747`)
+
+.. ipython:: python
+
+   s = pd.Series([1, 2, np.nan], dtype='Int64')
+   s
+
+
+Operations on these dtypes will propagate ``NaN`` as other pandas operations do.
+
+.. ipython:: python
+
+   # arithmetic
+   s + 1
+
+   # comparison
+   s == 1
+
+   # indexing
+   s.iloc[1:3]
+
+   # operate with other dtypes
+   s + s.iloc[1:3]
+
+   # coerce when needed
+   s + 0.01
+
+These dtypes can operate as part of ``DataFrames``.
+
+.. ipython:: python
+
+   df = pd.DataFrame({'A': s, 'B': [1, 1, 3], 'C': list('aab')})
+   df
+   df.dtypes
+
+
+These dtypes can be merged, reshaped, and cast.
+
+.. ipython:: python
+
+   pd.concat([df[['A']], df[['B', 'C']]], axis=1).dtypes
+   df['A'].astype(float)
+
+.. warning::
+
+   The Integer NA support currently uses the capitalized dtype version, e.g. ``Int8`` as compared to the traditional ``int8``. This may be changed at a future date.
+
 .. _whatsnew_0240.enhancements.read_html:
 
 ``read_html`` Enhancements
@@ -182,6 +238,7 @@ Previous Behavior:
 ExtensionType Changes
 ^^^^^^^^^^^^^^^^^^^^^
 
+- ``ExtensionArray`` has gained the abstract method ``.dropna()`` (:issue:`21185`)
 - ``ExtensionDtype`` has gained the ability to instantiate from string dtypes, e.g. ``decimal`` would instantiate a registered ``DecimalDtype``; furthermore
   the ``ExtensionDtype`` has gained the method ``construct_array_type`` (:issue:`21185`)
 - The ``ExtensionArray`` constructor, ``_from_sequence`` now takes the keyword arg ``copy=False`` (:issue:`21185`)
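
One way to picture how an integer column can carry missing values without falling back to float is to keep the integers next to a boolean mask. The sketch below is a plain-Python illustration under that assumption; ``MaskedInts`` and its methods are hypothetical names, not pandas API, and ``None`` stands in for the missing-value marker. The hunks shown on this page do not include the ``IntegerArray`` implementation, so this is not a copy of it.

```python
class MaskedInts:
    """Toy integer container: mask[i] is True where values[i] is missing."""

    def __init__(self, values, mask):
        if len(values) != len(mask):
            raise ValueError("values and mask must have the same length")
        self.values = list(values)
        self.mask = list(mask)

    @classmethod
    def from_sequence(cls, scalars):
        # None plays the role of the missing-value marker in this sketch.
        mask = [x is None for x in scalars]
        # Masked slots get a placeholder 0; the mask says to ignore it.
        values = [0 if missing else int(x) for x, missing in zip(scalars, mask)]
        return cls(values, mask)

    def __add__(self, other):
        # Missing entries stay missing; present entries stay integers.
        return MaskedInts([v + other for v in self.values], self.mask)

    def to_list(self):
        return [None if m else v for v, m in zip(self.values, self.mask)]


s = MaskedInts.from_sequence([1, 2, None])
result = (s + 1).to_list()
print(result)
```

The point of the design is that arithmetic never has to convert the stored integers to floats just to represent a gap; the gap lives in the mask.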

pandas/core/arrays/__init__.py

+3
@@ -1,6 +1,9 @@
 from .base import (ExtensionArray,  # noqa
+                   ExtensionOpsMixin,
                    ExtensionScalarOpsMixin)
 from .categorical import Categorical  # noqa
 from .datetimes import DatetimeArrayMixin  # noqa
 from .period import PeriodArrayMixin  # noqa
 from .timedelta import TimedeltaArrayMixin  # noqa
+from .integer import (  # noqa
+    IntegerArray, to_integer_array)

pandas/core/arrays/base.py

+7-5
@@ -12,8 +12,8 @@
 from pandas.errors import AbstractMethodError
 from pandas.compat.numpy import function as nv
 from pandas.compat import set_function_name, PY3
-from pandas.core.dtypes.common import is_list_like
 from pandas.core import ops
+from pandas.core.dtypes.common import is_list_like
 
 _not_implemented_message = "{} does not implement {}."
 
@@ -88,14 +88,16 @@ class ExtensionArray(object):
     # Constructors
     # ------------------------------------------------------------------------
     @classmethod
-    def _from_sequence(cls, scalars, copy=False):
+    def _from_sequence(cls, scalars, dtype=None, copy=False):
         """Construct a new ExtensionArray from a sequence of scalars.
 
         Parameters
         ----------
         scalars : Sequence
             Each element will be an instance of the scalar type for this
             array, ``cls.dtype.type``.
+        dtype : Dtype, optional
+            construct for this particular dtype
         copy : boolean, default False
             if True, copy the underlying data
         Returns
@@ -378,7 +380,7 @@ def fillna(self, value=None, method=None, limit=None):
                 func = pad_1d if method == 'pad' else backfill_1d
                 new_values = func(self.astype(object), limit=limit,
                                   mask=mask)
-                new_values = self._from_sequence(new_values)
+                new_values = self._from_sequence(new_values, dtype=self.dtype)
             else:
                 # fill with value
                 new_values = self.copy()
@@ -407,7 +409,7 @@ def unique(self):
         from pandas import unique
 
         uniques = unique(self.astype(object))
-        return self._from_sequence(uniques)
+        return self._from_sequence(uniques, dtype=self.dtype)
 
     def _values_for_factorize(self):
         # type: () -> Tuple[ndarray, Any]
@@ -559,7 +561,7 @@ def take(self, indices, allow_fill=False, fill_value=None):
 
                result = take(data, indices, fill_value=fill_value,
                              allow_fill=allow_fill)
-               return self._from_sequence(result)
+               return self._from_sequence(result, dtype=self.dtype)
         """
         # Implementer note: The `fill_value` parameter should be a user-facing
         # value, an instance of self.dtype.type. When passed `fill_value=None`,
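
The hunks above forward ``dtype=self.dtype`` wherever an array is rebuilt from scalars. A likely motivation, stated here as an assumption rather than taken from the commit message, is that a dtype with a parameter (such as a bit width) cannot be recovered from the scalars alone. The following minimal sketch uses a hypothetical ``ToyIntArray`` class, not pandas code, to show what is lost when the dtype is not passed along.

```python
class ToyIntArray:
    """Hypothetical array whose dtype carries a parameter (the bit width)."""

    default_dtype = "Int64"

    def __init__(self, data, dtype):
        self.data = data
        self.dtype = dtype

    @classmethod
    def _from_sequence(cls, scalars, dtype=None, copy=False):
        # Without an explicit dtype the class can only fall back to a default.
        if dtype is None:
            dtype = cls.default_dtype
        data = list(scalars) if copy else scalars
        return cls(data, dtype)

    def unique_without_dtype(self):
        # Mirrors the old call shape: the dtype is not forwarded.
        return self._from_sequence(sorted(set(self.data)))

    def unique(self):
        # Mirrors the new call shape: the result keeps the caller's dtype.
        return self._from_sequence(sorted(set(self.data)), dtype=self.dtype)


arr = ToyIntArray([3, 1, 3], dtype="Int8")
lost = arr.unique_without_dtype().dtype
kept = arr.unique().dtype
print(lost, kept)
```

In this toy, the old call shape silently widens ``Int8`` to the default, while the new one round-trips the dtype.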

pandas/core/arrays/categorical.py

+2-2
@@ -487,8 +487,8 @@ def _constructor(self):
         return Categorical
 
     @classmethod
-    def _from_sequence(cls, scalars):
-        return Categorical(scalars)
+    def _from_sequence(cls, scalars, dtype=None, copy=False):
+        return Categorical(scalars, dtype=dtype)
 
     def copy(self):
         """ Copy constructor. """
