Commit 9b8564f

REF: IntervalIndex[IntervalArray]
Closes pandas-dev#19209
1 parent 6eda77e commit 9b8564f

21 files changed: +1295 / -582 lines changed

doc/source/basics.rst

Lines changed: 17 additions & 5 deletions
@@ -1925,11 +1925,23 @@ untouched. If the data is modified, it is because you did so explicitly.
 dtypes
 ------
 
-The main types stored in pandas objects are ``float``, ``int``, ``bool``,
-``datetime64[ns]`` and ``datetime64[ns, tz]``, ``timedelta[ns]``,
-``category`` and ``object``. In addition these dtypes have item sizes, e.g.
-``int64`` and ``int32``. See :ref:`Series with TZ <timeseries.timezone_series>`
-for more detail on ``datetime64[ns, tz]`` dtypes.
+For the most part, pandas uses NumPy arrays and dtypes for Series or individual
+columns of a DataFrame. The main types allowed in pandas objects are ``float``,
+``int``, ``bool``, and ``datetime64[ns]`` (note that NumPy does not support
+timezone-aware datetimes).
+
+In addition to NumPy's types, pandas :ref:`extends <extending.extension-types>`
+NumPy's type-system for a few cases.
+
+* :ref:`Categorical <categorical>`
+* :ref:`Datetime with Timezone <timeseries.timezone_series>`
+* Interval
+
+Pandas uses the ``object`` dtype for storing strings.
+
+Finally, arbitrary objects may be stored using the ``object`` dtype, but should
+be avoided to the extent possible (for performance and interoperability with
+other libraries and methods. See :ref:`basics.object_conversion`).
 
 A convenient :attr:`~DataFrame.dtypes` attribute for DataFrame returns a Series
 with the data type of each column.
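A quick way to see the dtype families listed in this paragraph is to build a small DataFrame with one column per family and inspect ``df.dtypes``. This is a sketch against a recent pandas release (an assumption: the top-level names ``pd.CategoricalDtype``, ``pd.DatetimeTZDtype`` and ``pd.IntervalDtype`` postdate this commit), and the column names and data are made up for illustration. The ``strings`` column is forced to ``object`` because newer pandas versions can also store strings in a dedicated string dtype.

```python
import pandas as pd

# One column per dtype family discussed above; names and data are illustrative.
df = pd.DataFrame({
    "floats": [1.5, 2.5, 3.5],                                 # NumPy float
    "ints": [1, 2, 3],                                         # NumPy integer
    "bools": [True, False, True],                              # NumPy bool
    "naive_dt": pd.date_range("2018-01-01", periods=3),        # NumPy datetime64
    "aware_dt": pd.date_range("2018-01-01", periods=3, tz="UTC"),  # pandas extension
    "cats": pd.Categorical(["a", "b", "a"]),                   # pandas extension
    "intervals": pd.Series(pd.interval_range(0, 3)),           # pandas extension
    "strings": pd.Series(["x", "y", "z"], dtype=object),       # object dtype
})

print(df.dtypes)

# Extension dtypes are instances of pandas dtype classes, not NumPy dtypes.
for name in ["aware_dt", "cats", "intervals"]:
    print(name, type(df[name].dtype).__name__)
```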

doc/source/whatsnew/v0.23.0.txt

Lines changed: 71 additions & 0 deletions
@@ -299,6 +299,41 @@ Supplying a ``CategoricalDtype`` will make the categories in each column consist
    df['A'].dtype
    df['B'].dtype
 
+.. _whatsnew_023.enhancements.interval:
+
+Storing Interval Data in Series and DataFrame
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Interval data may now be stored in a Series or DataFrame, in addition to an
+:class:`IntervalIndex` like before.
+
+.. ipython:: python
+
+   ser = pd.Series(pd.interval_range(0, 5))
+   ser
+   ser.dtype
+
+Previously, these would be cast to a NumPy array of Interval objects. In general,
+this should result in better performance when storing an array of intervals in
+a Series.
+
+Note that the ``.values`` of a Series containing intervals is no longer a NumPy
+array. Rather, it's an ``ExtensionArray``, composed of two arrays ``left`` and
+``right``.
+
+.. ipython:: python
+
+   ser.values
+
+To recover the NumPy array of Interval objects, use :func:`numpy.asarray`:
+
+.. ipython:: python
+
+   np.asarray(ser.values)
+
+This is the same behavior as ``Series.values`` for categorical data. See
+:ref:`whatsnew_0230.api_breaking.interval_values` for more.
+
 .. _whatsnew_023.enhancements.extension:
 
 Extending Pandas with Custom Types
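The entry above says the interval ``ExtensionArray`` is composed of two arrays, ``left`` and ``right``, and that :func:`numpy.asarray` recovers an array of Interval objects. The sketch below illustrates that storage idea with a deliberately simplified, hypothetical class: ``TwoArrayIntervals`` and ``Span`` are invented names, not pandas API. It keeps the endpoints in two NumPy arrays, answers a vectorised membership query without creating one object per interval, and only builds Python objects when NumPy requests an array through ``__array__``.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class Span:
    """Hypothetical scalar standing in for pandas' Interval."""
    left: float
    right: float
    closed: str


class TwoArrayIntervals:
    """Hypothetical, simplified interval container (not pandas' IntervalArray).

    Endpoints live in two parallel NumPy arrays instead of one object array.
    Only "right"- and "left"-closed intervals are supported in this sketch.
    """

    def __init__(self, left, right, closed="right"):
        self.left = np.asarray(left)
        self.right = np.asarray(right)
        if self.left.shape != self.right.shape:
            raise ValueError("left and right must have the same shape")
        if np.any(self.left > self.right):
            raise ValueError("each left endpoint must be <= its right endpoint")
        if closed not in ("right", "left"):
            raise ValueError("closed must be 'right' or 'left' in this sketch")
        self.closed = closed

    def __len__(self):
        return len(self.left)

    def contains(self, value):
        # Vectorised test against every interval; no per-interval objects.
        if self.closed == "right":
            return (self.left < value) & (value <= self.right)
        return (self.left <= value) & (value < self.right)

    def __array__(self, dtype=None, copy=None):
        # A new object array is always built, so a no-copy request cannot be met.
        if copy is False:
            raise ValueError("converting to an object array always copies")
        out = np.empty(len(self), dtype=object)
        for i, (lo, hi) in enumerate(zip(self.left, self.right)):
            out[i] = Span(lo.item(), hi.item(), self.closed)
        return out if dtype is None else out.astype(dtype)


ia = TwoArrayIntervals([0, 1, 2], [1, 2, 3])
print(ia.contains(1.5))   # only the middle interval (1, 2] contains 1.5
print(np.asarray(ia))     # object array of Span instances
```

The trade-off mirrors the one described in the entry: two numeric arrays keep queries fast and memory compact, while the object-array view is still available on demand.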
@@ -479,6 +514,42 @@ If you wish to retain the old behavior while using Python >= 3.6, you can use
               'Taxes': -200,
               'Net result': 300}).sort_index()
 
+.. _whatsnew_0230.api_breaking.interval_values:
+
+``IntervalIndex.values`` is now an ``IntervalArray``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``.values`` attribute of an :class:`IntervalIndex` now returns an
+``IntervalArray``, rather than a NumPy array of :class:`Interval` objects.
+
+Previous Behavior:
+
+.. code-block:: ipython
+
+   In [1]: idx = pd.interval_range(0, 4)
+
+   In [2]: idx.values
+   Out[2]:
+   array([Interval(0, 1, closed='right'), Interval(1, 2, closed='right'),
+          Interval(2, 3, closed='right'), Interval(3, 4, closed='right')],
+         dtype=object)
+
+New Behavior:
+
+.. ipython:: python
+
+   idx = pd.interval_range(0, 4)
+   idx.values
+
+This mirrors ``CategoricalIndex.values``, which returns a ``Categorical``.
+
+For situations where you need an ``ndarray`` of Interval objects, use
+:func:`numpy.asarray` or ``idx.astype(object)``.
+
+.. ipython:: python
+
+   idx.values.astype(object)
+
 .. _whatsnew_0230.api_breaking.deprecate_panel:
 
 Deprecate Panel
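The ``IntervalIndex.values`` change can be checked directly. The sketch below assumes a pandas release in which, as the entry describes, ``IntervalIndex.values`` returns an ``IntervalArray`` (exposed publicly as ``pd.arrays.IntervalArray`` in later versions; that public path is an assumption relative to this commit). It contrasts the extension array with the two conversions to an object ``ndarray`` used in the entry, and shows that the array exposes its ``left`` and ``right`` endpoints.

```python
import numpy as np
import pandas as pd

idx = pd.interval_range(0, 4)   # right-closed intervals (0, 1] ... (3, 4]

arr = idx.values                # an extension array, not an ndarray
print(type(arr).__name__)       # IntervalArray

# Two ways back to an object ndarray of Interval scalars.
via_asarray = np.asarray(arr)
via_astype = arr.astype(object)
print(type(via_asarray).__name__, via_asarray.dtype)
print(type(via_astype).__name__, via_astype.dtype)

# The endpoints are stored separately, as the entry describes.
print(arr.left.tolist(), arr.right.tolist())
```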

pandas/core/arrays/__init__.py

Lines changed: 10 additions & 2 deletions
@@ -1,2 +1,10 @@
-from .base import ExtensionArray  # noqa
-from .categorical import Categorical  # noqa
+from .base import ExtensionArray
+from .categorical import Categorical
+from .interval import IntervalArray
+
+
+__all__ = [
+    'Categorical',
+    'ExtensionArray',
+    'IntervalArray',
+]
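This hunk drops the ``# noqa`` markers and lists the re-exported names in ``__all__`` instead. ``__all__`` defines what ``from pandas.core.arrays import *`` binds, and pyflakes-based linters such as flake8 treat names listed there as used, which is presumably why the ``# noqa`` comments were no longer needed. The self-contained sketch below uses an in-memory stand-in module (``demo_arrays`` and its contents are hypothetical, not pandas code) to show the star-import effect.

```python
import sys
import types

# Hypothetical stand-in for a package __init__ that re-exports three classes
# and also defines names it does not want to export.
SOURCE = """
class ExtensionArray:
    pass

class Categorical(ExtensionArray):
    pass

class IntervalArray(ExtensionArray):
    pass

def _private_helper():
    pass

def public_but_unlisted():
    pass

__all__ = ['Categorical', 'ExtensionArray', 'IntervalArray']
"""

module = types.ModuleType("demo_arrays")
exec(SOURCE, module.__dict__)
sys.modules["demo_arrays"] = module   # make it importable by name

namespace = {}
exec("from demo_arrays import *", namespace)

# Only the names in __all__ are bound; dunder entries such as __builtins__
# (added by exec) are filtered out.
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)   # ['Categorical', 'ExtensionArray', 'IntervalArray']
```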

pandas/core/arrays/categorical.py

Lines changed: 3 additions & 0 deletions
@@ -19,6 +19,7 @@
     _ensure_int64,
     _ensure_object,
     _ensure_platform_int,
+    is_extension_array_dtype,
     is_dtype_equal,
     is_datetimelike,
     is_datetime64_dtype,
@@ -1218,6 +1219,8 @@ def __array__(self, dtype=None):
         ret = take_1d(self.categories.values, self._codes)
         if dtype and not is_dtype_equal(dtype, self.categories.dtype):
             return np.asarray(ret, dtype)
+        if is_extension_array_dtype(ret):
+            ret = np.asarray(ret)
         return ret
 
     def __setstate__(self, state):
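NumPy requires an ``__array__`` method to return an actual ``ndarray``; if it returns some other object, ``np.asarray`` raises a ``TypeError``. The added check in ``Categorical.__array__`` presumably guards against the taken values being an extension array (for example, interval categories now backed by an ``IntervalArray``). Below is a hedged, self-contained sketch of the same pattern: ``ExtensionLike`` and ``CategoricalLike`` are hypothetical toy classes, and the ``isinstance(ret, np.ndarray)`` test stands in for pandas' ``is_extension_array_dtype``.

```python
import numpy as np


class ExtensionLike:
    """Hypothetical array-like that is not an ndarray (plays IntervalArray's role)."""

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def take(self, indices):
        # Positional take that returns another ExtensionLike, not an ndarray.
        return ExtensionLike(self._values[i] for i in indices)

    def __array__(self, dtype=None, copy=None):
        if copy is False:
            raise ValueError("conversion always creates a new array")
        out = np.empty(len(self._values), dtype=object)
        out[:] = self._values
        return out if dtype is None else out.astype(dtype)


class CategoricalLike:
    """Hypothetical categorical: integer codes pointing into categories."""

    def __init__(self, codes, categories):
        self.codes = np.asarray(codes, dtype=np.intp)
        self.categories = categories

    def __array__(self, dtype=None, copy=None):
        if copy is False:
            raise ValueError("conversion always creates a new array")
        ret = self.categories.take(self.codes)
        if not isinstance(ret, np.ndarray):
            # Same idea as the patch: densify extension-style results,
            # because NumPy rejects a non-ndarray return value here.
            ret = np.asarray(ret)
        if dtype is not None:
            ret = ret.astype(dtype)
        return ret


cat = CategoricalLike([2, 0, 0, 1], ExtensionLike(["low", "mid", "high"]))
print(np.asarray(cat))   # the decoded values as an object ndarray
```

Without the ``isinstance`` branch, ``CategoricalLike.__array__`` would hand NumPy an ``ExtensionLike`` instance, which is the failure mode the patch avoids.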
