Commit 610a19a

REF: IntervalIndex[IntervalArray] (#20611)
Co-authored-by: Jeremy Schendel
1 parent 365eac4 commit 610a19a

26 files changed, +1643 -612 lines changed

doc/source/basics.rst

+18 -5
Original file line number | Diff line number | Diff line change
@@ -1924,11 +1924,24 @@ untouched. If the data is modified, it is because you did so explicitly.
19241924
dtypes
19251925
------
19261926

1927-
The main types stored in pandas objects are ``float``, ``int``, ``bool``,
1928-
``datetime64[ns]`` and ``datetime64[ns, tz]``, ``timedelta[ns]``,
1929-
``category`` and ``object``. In addition these dtypes have item sizes, e.g.
1930-
``int64`` and ``int32``. See :ref:`Series with TZ <timeseries.timezone_series>`
1931-
for more detail on ``datetime64[ns, tz]`` dtypes.
1927+
For the most part, pandas uses NumPy arrays and dtypes for Series or individual
1928+
columns of a DataFrame. The main types allowed in pandas objects are ``float``,
1929+
``int``, ``bool``, and ``datetime64[ns]`` (note that NumPy does not support
1930+
timezone-aware datetimes).
1931+
1932+
In addition to NumPy's types, pandas :ref:`extends <extending.extension-types>`
1933+
NumPy's type-system for a few cases.
1934+
1935+
* :ref:`Categorical <categorical>`
1936+
* :ref:`Datetime with Timezone <timeseries.timezone_series>`
1937+
* :ref:`Period <timeseries.periods>`
1938+
* :ref:`Interval <advanced.indexing.intervallindex>`
1939+
1940+
Pandas uses the ``object`` dtype for storing strings.
1941+
1942+
Finally, arbitrary objects may be stored using the ``object`` dtype, but should
1943+
be avoided to the extent possible (for performance and interoperability with
1944+
other libraries and methods). See :ref:`basics.object_conversion`.
19321945

19331946
A convenient :attr:`~DataFrame.dtypes` attribute for DataFrame returns a Series
19341947
with the data type of each column.
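
Review note: the extension dtypes listed in the new ``dtypes`` text can be inspected through ``DataFrame.dtypes``. The following is a minimal sketch, assuming a recent pandas release; the column names are made up for illustration and do not come from the commit.

```python
import pandas as pd

# Hypothetical column names; each column uses one of the pandas extension
# types listed in the documentation hunk above.
df = pd.DataFrame({
    "cat": pd.Categorical(["a", "b", "a"]),
    "tz": pd.date_range("2018-01-01", periods=3, tz="UTC"),
    "period": pd.period_range("2018-01", periods=3, freq="M"),
    "interval": pd.interval_range(0, 3),
})

# DataFrame.dtypes returns a Series mapping column name -> dtype.
dtypes = df.dtypes
print(dtypes)
```

The exact string form of each dtype varies between pandas versions, so the sketch prints the result rather than asserting a particular repr.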

doc/source/whatsnew/v0.24.0.txt

+70
@@ -66,6 +66,36 @@ Current Behavior:
6666

6767
result
6868

69+
70+
.. _whatsnew_0240.enhancements.interval:
71+
72+
Storing Interval Data in Series and DataFrame
73+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
74+
75+
Interval data may now be stored in a ``Series`` or ``DataFrame``, in addition to an
76+
:class:`IntervalIndex` as before (:issue:`19453`).
77+
78+
.. ipython:: python
79+
80+
ser = pd.Series(pd.interval_range(0, 5))
81+
ser
82+
ser.dtype
83+
84+
Previously, these would be cast to a NumPy array of ``Interval`` objects. In general,
85+
this should result in better performance when storing an array of intervals in
86+
a :class:`Series`.
87+
88+
Note that the ``.values`` of a ``Series`` containing intervals is no longer a NumPy
89+
array, but rather an ``ExtensionArray``:
90+
91+
.. ipython:: python
92+
93+
ser.values
94+
95+
This is the same behavior as ``Series.values`` for categorical data. See
96+
:ref:`whatsnew_0240.api_breaking.interval_values` for more.
97+
98+
6999
.. _whatsnew_0240.enhancements.other:
70100

71101
Other Enhancements
@@ -91,6 +121,45 @@ Other Enhancements
91121
Backwards incompatible API changes
92122
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
93123

124+
125+
.. _whatsnew_0240.api_breaking.interval_values:
126+
127+
``IntervalIndex.values`` is now an ``IntervalArray``
128+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
129+
130+
The :attr:`~IntervalIndex.values` attribute of an :class:`IntervalIndex` now returns an
131+
``IntervalArray``, rather than a NumPy array of :class:`Interval` objects (:issue:`19453`).
132+
133+
Previous Behavior:
134+
135+
.. code-block:: ipython
136+
137+
In [1]: idx = pd.interval_range(0, 4)
138+
139+
In [2]: idx.values
140+
Out[2]:
141+
array([Interval(0, 1, closed='right'), Interval(1, 2, closed='right'),
142+
Interval(2, 3, closed='right'), Interval(3, 4, closed='right')],
143+
dtype=object)
144+
145+
New Behavior:
146+
147+
.. ipython:: python
148+
149+
idx = pd.interval_range(0, 4)
150+
idx.values
151+
152+
This mirrors ``CategoricalIndex.values``, which returns a ``Categorical``.
153+
154+
For situations where you need an ``ndarray`` of ``Interval`` objects, use
155+
:func:`numpy.asarray` or ``idx.values.astype(object)``.
156+
157+
.. ipython:: python
158+
159+
np.asarray(idx)
160+
idx.values.astype(object)
161+
162+
94163
.. _whatsnew_0240.api.datetimelike.normalize:
95164

96165
Tick DateOffset Normalize Restrictions
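
Review note: the API change above can be exercised as follows. This is a sketch assuming pandas 0.24 or later; it shows that both conversions named in the whatsnew text yield an object-dtype ``ndarray`` of ``Interval`` objects, while ``idx.values`` itself stays an ``IntervalArray``.

```python
import numpy as np
import pandas as pd

idx = pd.interval_range(0, 4)

# New behavior: an IntervalArray instead of an object ndarray.
arr = idx.values

# Two ways to get back to an ndarray of Interval objects.
objs_a = np.asarray(idx)
objs_b = idx.values.astype(object)

# The IntervalArray still exposes the interval metadata directly.
closed = arr.closed
lefts = list(arr.left)
```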
@@ -350,6 +419,7 @@ Interval
350419
^^^^^^^^
351420

352421
- Bug in the :class:`IntervalIndex` constructor where the ``closed`` parameter did not always override the inferred ``closed`` (:issue:`19370`)
422+
- Bug in the ``IntervalIndex`` repr where a trailing comma was missing after the list of intervals (:issue:`20611`)
353423
-
354424
-
355425

pandas/_libs/interval.pyx

+20
@@ -98,6 +98,26 @@ cdef class IntervalMixin(object):
9898
msg = 'cannot compute length between {left!r} and {right!r}'
9999
raise TypeError(msg.format(left=self.left, right=self.right))
100100

101+
def _check_closed_matches(self, other, name='other'):
102+
"""Check if the closed attribute of `other` matches.
103+
104+
Note that 'left' and 'right' are considered different from 'both'.
105+
106+
Parameters
107+
----------
108+
other : Interval, IntervalIndex, IntervalArray
109+
name : str
110+
Name to use for 'other' in the error message.
111+
112+
Raises
113+
------
114+
ValueError
115+
When `other` is not closed exactly the same as self.
116+
"""
117+
if self.closed != other.closed:
118+
msg = "'{}.closed' is '{}', expected '{}'."
119+
raise ValueError(msg.format(name, other.closed, self.closed))
120+
101121

102122
cdef _interval_like(other):
103123
return (hasattr(other, 'left')
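
Review note: ``_check_closed_matches`` is a private method, so the sketch below re-implements the same comparison as a free function instead of calling it. The name ``check_closed_matches`` is hypothetical and not part of pandas; only the message format is copied from the hunk above.

```python
import pandas as pd


def check_closed_matches(this, other, name="other"):
    """Hypothetical stand-alone copy of the check added in the diff."""
    if this.closed != other.closed:
        msg = "'{}.closed' is '{}', expected '{}'."
        raise ValueError(msg.format(name, other.closed, this.closed))


left = pd.Interval(0, 1, closed="left")
both = pd.Interval(0, 1, closed="both")

# 'left' and 'both' are treated as different, so this raises.
try:
    check_closed_matches(left, both)
    error_message = None
except ValueError as err:
    error_message = str(err)

print(error_message)
```

Two intervals closed on the same side pass the check silently, regardless of their endpoints.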

pandas/core/arrays/__init__.py

+1
@@ -2,5 +2,6 @@
22
ExtensionScalarOpsMixin)
33
from .categorical import Categorical # noqa
44
from .datetimes import DatetimeArrayMixin # noqa
5+
from .interval import IntervalArray # noqa
56
from .period import PeriodArrayMixin # noqa
67
from .timedelta import TimedeltaArrayMixin # noqa

pandas/core/arrays/categorical.py

+6
@@ -20,6 +20,7 @@
2020
_ensure_int64,
2121
_ensure_object,
2222
_ensure_platform_int,
23+
is_extension_array_dtype,
2324
is_dtype_equal,
2425
is_datetimelike,
2526
is_datetime64_dtype,
@@ -1243,6 +1244,11 @@ def __array__(self, dtype=None):
12431244
ret = take_1d(self.categories.values, self._codes)
12441245
if dtype and not is_dtype_equal(dtype, self.categories.dtype):
12451246
return np.asarray(ret, dtype)
1247+
if is_extension_array_dtype(ret):
1248+
# When we're a Categorical[ExtensionArray], like Interval,
1249+
# we need to ensure __array__ gets all the way to an
1250+
# ndarray.
1251+
ret = np.asarray(ret)
12461252
return ret
12471253

12481254
def __setstate__(self, state):
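
Review note: the ``__array__`` change matters for a ``Categorical`` whose categories are intervals, which is what ``pd.cut`` produces. A minimal sketch, assuming a recent pandas release; the bin edges and input values are made up for illustration.

```python
import numpy as np
import pandas as pd

# pd.cut returns a Categorical whose categories are an IntervalIndex.
cat = pd.cut([0.5, 1.5, 2.5], bins=[0, 1, 2, 3])

# np.asarray should go all the way to a plain ndarray of Interval objects,
# not stop at an IntervalArray.
arr = np.asarray(cat)
print(type(arr).__name__, arr.dtype)
```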
