Commit c407b73

jschendel authored and jreback committed
API: Implement new indexing behavior for intervals (#27100)
1 parent 8507170 commit c407b73

16 files changed: +494 -626 lines changed

ci/code_checks.sh

+2 -2
@@ -245,10 +245,10 @@ if [[ -z "$CHECK" || "$CHECK" == "doctests" ]]; then
     RET=$(($RET + $?)) ; echo $MSG "DONE"
 
     MSG='Doctests interval classes' ; echo $MSG
-    pytest --doctest-modules -v \
+    pytest -q --doctest-modules \
         pandas/core/indexes/interval.py \
         pandas/core/arrays/interval.py \
-        -k"-from_arrays -from_breaks -from_intervals -from_tuples -get_loc -set_closed -to_tuples -interval_range"
+        -k"-from_arrays -from_breaks -from_intervals -from_tuples -set_closed -to_tuples -interval_range"
     RET=$(($RET + $?)) ; echo $MSG "DONE"
 
 fi

doc/source/user_guide/advanced.rst

+28 -3
@@ -938,9 +938,8 @@ for interval notation.
 The ``IntervalIndex`` allows some unique indexing and is also used as a
 return type for the categories in :func:`cut` and :func:`qcut`.
 
-.. warning::
-
-   These indexing behaviors are provisional and may change in a future version of pandas.
+Indexing with an ``IntervalIndex``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 An ``IntervalIndex`` can be used in ``Series`` and in ``DataFrame`` as the index.
 
@@ -965,6 +964,32 @@ If you select a label *contained* within an interval, this will also select the
    df.loc[2.5]
    df.loc[[2.5, 3.5]]
 
+Selecting using an ``Interval`` will only return exact matches (starting from pandas 0.25.0).
+
+.. ipython:: python
+
+   df.loc[pd.Interval(1, 2)]
+
+Trying to select an ``Interval`` that is not exactly contained in the ``IntervalIndex`` will raise a ``KeyError``.
+
+.. code-block:: python
+
+   In [7]: df.loc[pd.Interval(0.5, 2.5)]
+   ---------------------------------------------------------------------------
+   KeyError: Interval(0.5, 2.5, closed='right')
+
+Selecting all ``Intervals`` that overlap a given ``Interval`` can be performed using the
+:meth:`~IntervalIndex.overlaps` method to create a boolean indexer.
+
+.. ipython:: python
+
+   idxr = df.index.overlaps(pd.Interval(0.5, 2.5))
+   idxr
+   df[idxr]
+
+Binning data with ``cut`` and ``qcut``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
 :func:`cut` and :func:`qcut` both return a ``Categorical`` object, and the bins they
 create are stored as an ``IntervalIndex`` in its ``.categories`` attribute.

doc/source/whatsnew/v0.25.0.rst

+139 -1
@@ -484,6 +484,142 @@ This change is backward compatible for direct usage of Pandas, but if you subcla
 Pandas objects *and* give your subclasses specific ``__str__``/``__repr__`` methods,
 you may have to adjust your ``__str__``/``__repr__`` methods (:issue:`26495`).
 
+.. _whatsnew_0250.api_breaking.interval_indexing:
+
+
+Indexing an ``IntervalIndex`` with ``Interval`` objects
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Indexing methods for :class:`IntervalIndex` have been modified to require exact matches only for :class:`Interval` queries.
+``IntervalIndex`` methods previously matched on any overlapping ``Interval``. Behavior with scalar points, e.g. querying
+with an integer, is unchanged (:issue:`16316`).
+
+.. ipython:: python
+
+   ii = pd.IntervalIndex.from_tuples([(0, 4), (1, 5), (5, 8)])
+   ii
+
+The ``in`` operator (``__contains__``) now only returns ``True`` for exact matches to ``Intervals`` in the ``IntervalIndex``, whereas
+this would previously return ``True`` for any ``Interval`` overlapping an ``Interval`` in the ``IntervalIndex``.
+
+*Previous behavior*:
+
+.. code-block:: python
+
+   In [4]: pd.Interval(1, 2, closed='neither') in ii
+   Out[4]: True
+
+   In [5]: pd.Interval(-10, 10, closed='both') in ii
+   Out[5]: True
+
+*New behavior*:
+
+.. ipython:: python
+
+   pd.Interval(1, 2, closed='neither') in ii
+   pd.Interval(-10, 10, closed='both') in ii
+
+The :meth:`~IntervalIndex.get_loc` method now only returns locations for exact matches to ``Interval`` queries, as opposed to the previous behavior of
+returning locations for overlapping matches. A ``KeyError`` will be raised if an exact match is not found.
+
+*Previous behavior*:
+
+.. code-block:: python
+
+   In [6]: ii.get_loc(pd.Interval(1, 5))
+   Out[6]: array([0, 1])
+
+   In [7]: ii.get_loc(pd.Interval(2, 6))
+   Out[7]: array([0, 1, 2])
+
+*New behavior*:
+
+.. code-block:: python
+
+   In [6]: ii.get_loc(pd.Interval(1, 5))
+   Out[6]: 1
+
+   In [7]: ii.get_loc(pd.Interval(2, 6))
+   ---------------------------------------------------------------------------
+   KeyError: Interval(2, 6, closed='right')
+
+Likewise, :meth:`~IntervalIndex.get_indexer` and :meth:`~IntervalIndex.get_indexer_non_unique` will also only return locations for exact matches
+to ``Interval`` queries, with ``-1`` denoting that an exact match was not found.
+
+These indexing changes extend to querying a :class:`Series` or :class:`DataFrame` with an ``IntervalIndex`` index.
+
+.. ipython:: python
+
+   s = pd.Series(list('abc'), index=ii)
+   s
+
+Selecting from a ``Series`` or ``DataFrame`` using ``[]`` (``__getitem__``) or ``loc`` now only returns exact matches for ``Interval`` queries.
+
+*Previous behavior*:
+
+.. code-block:: python
+
+   In [8]: s[pd.Interval(1, 5)]
+   Out[8]:
+   (0, 4]    a
+   (1, 5]    b
+   dtype: object
+
+   In [9]: s.loc[pd.Interval(1, 5)]
+   Out[9]:
+   (0, 4]    a
+   (1, 5]    b
+   dtype: object
+
+*New behavior*:
+
+.. ipython:: python
+
+   s[pd.Interval(1, 5)]
+   s.loc[pd.Interval(1, 5)]
+
+Similarly, a ``KeyError`` will be raised for non-exact matches instead of returning overlapping matches.
+
+*Previous behavior*:
+
+.. code-block:: python
+
+   In [9]: s[pd.Interval(2, 3)]
+   Out[9]:
+   (0, 4]    a
+   (1, 5]    b
+   dtype: object
+
+   In [10]: s.loc[pd.Interval(2, 3)]
+   Out[10]:
+   (0, 4]    a
+   (1, 5]    b
+   dtype: object
+
+*New behavior*:
+
+.. code-block:: python
+
+   In [6]: s[pd.Interval(2, 3)]
+   ---------------------------------------------------------------------------
+   KeyError: Interval(2, 3, closed='right')
+
+   In [7]: s.loc[pd.Interval(2, 3)]
+   ---------------------------------------------------------------------------
+   KeyError: Interval(2, 3, closed='right')
+
+The :meth:`~IntervalIndex.overlaps` method can be used to create a boolean indexer that replicates the
+previous behavior of returning overlapping matches.
+
+*New behavior*:
+
+.. ipython:: python
+
+   idxr = s.index.overlaps(pd.Interval(2, 3))
+   idxr
+   s[idxr]
+   s.loc[idxr]
+
 .. _whatsnew_0250.api_breaking.deps:
 
 Increased minimum versions for dependencies
@@ -686,7 +822,7 @@ Categorical
 
 - Bug in :func:`DataFrame.at` and :func:`Series.at` that would raise exception if the index was a :class:`CategoricalIndex` (:issue:`20629`)
 - Fixed bug in comparison of ordered :class:`Categorical` that contained missing values with a scalar which sometimes incorrectly resulted in ``True`` (:issue:`26504`)
--
+- Bug in :meth:`DataFrame.dropna` when the :class:`DataFrame` has a :class:`CategoricalIndex` containing :class:`Interval` objects incorrectly raised a ``TypeError`` (:issue:`25087`)
 
 Datetimelike
 ^^^^^^^^^^^^
@@ -764,6 +900,7 @@ Interval
 
 - Construction of :class:`Interval` is restricted to numeric, :class:`Timestamp` and :class:`Timedelta` endpoints (:issue:`23013`)
 - Fixed bug in :class:`Series`/:class:`DataFrame` not displaying ``NaN`` in :class:`IntervalIndex` with missing values (:issue:`25984`)
+- Bug in :meth:`IntervalIndex.get_loc` where a ``KeyError`` would be incorrectly raised for a decreasing :class:`IntervalIndex` (:issue:`25860`)
 - Bug in :class:`Index` constructor where passing mixed closed :class:`Interval` objects would result in a ``ValueError`` instead of an ``object`` dtype ``Index`` (:issue:`27172`)
 
 Indexing
@@ -778,6 +915,7 @@ Indexing
 - Fixed bug where assigning a :class:`arrays.PandasArray` to a :class:`pandas.core.frame.DataFrame` would raise error (:issue:`26390`)
 - Allow keyword arguments for callable local reference used in the :meth:`DataFrame.query` string (:issue:`26426`)
 - Bug which produced ``AttributeError`` on partial matching :class:`Timestamp` in a :class:`MultiIndex` (:issue:`26944`)
+- Bug in :class:`Categorical` and :class:`CategoricalIndex` with :class:`Interval` values when using the ``in`` operator (``__contains``) with objects that are not comparable to the values in the ``Interval`` (:issue:`23705`)
 - Bug in :meth:`DataFrame.loc` and :meth:`DataFrame.iloc` on a :class:`DataFrame` with a single timezone-aware datetime64[ns] column incorrectly returning a scalar instead of a :class:`Series` (:issue:`27110`)
 -
 
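
Reviewer note: the release note in this file says that ``get_indexer`` marks ``Interval`` queries without an exact match with ``-1``, but it shows no example. The following is a minimal sketch of that behavior alongside ``get_loc`` and ``overlaps``, assuming a pandas version at or after 0.25 is installed. The index, the query intervals, and the variable names are illustrative and not part of the commit. A non-overlapping index is used here on the assumption that ``get_indexer`` may reject overlapping ``IntervalIndex`` objects.

```python
import pandas as pd

# Illustrative, non-overlapping index: (0, 1], (1, 2], (2, 3]
ii = pd.IntervalIndex.from_breaks([0, 1, 2, 3])

exact = pd.Interval(1, 2)    # equal to the second entry
partial = pd.Interval(0, 2)  # overlaps two entries but equals neither

# Exact Interval query: position of the matching entry.
loc = ii.get_loc(exact)

# Non-exact Interval query: expected to raise KeyError under the new rules.
try:
    ii.get_loc(partial)
    raised = False
except KeyError:
    raised = True

# Vectorised lookup: -1 marks a query with no exact match.
indexer = ii.get_indexer([exact, partial])

# Boolean mask of entries that overlap the query (the old matching rule).
mask = ii.overlaps(partial)

print(loc, raised, list(indexer), list(mask))
```

The ``overlaps`` mask is the piece to reach for when the earlier "any overlap counts" semantics are still wanted.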

pandas/core/arrays/interval.py

-5
@@ -41,11 +41,6 @@
 
 .. versionadded:: %(versionadded)s
 
-.. warning::
-
-    The indexing behaviors are provisional and may change in
-    a future version of pandas.
-
 Parameters
 ----------
 data : array-like (1-dimensional)

pandas/core/indexes/base.py

+8 -12
@@ -3235,8 +3235,9 @@ def reindex(self, target, method=None, level=None, limit=None,
             if self.equals(target):
                 indexer = None
             else:
-
-                if self.is_unique:
+                # check is_overlapping for IntervalIndex compat
+                if (self.is_unique and
+                        not getattr(self, 'is_overlapping', False)):
                     indexer = self.get_indexer(target, method=method,
                                                limit=limit,
                                                tolerance=tolerance)
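
Reviewer note: the ``reindex`` hunk guards the unique-index fast path with ``getattr(self, 'is_overlapping', False)``, so index types that do not define ``is_overlapping`` behave as before. The following is a minimal, pandas-free sketch of that pattern. The two classes and the helper function are hypothetical stand-ins, not pandas APIs.

```python
class PlainIndex:
    """Hypothetical index type with no notion of overlap."""
    is_unique = True


class OverlappingIntervals:
    """Hypothetical interval-like index whose entries overlap."""
    is_unique = True
    is_overlapping = True


def can_use_unique_indexer(index):
    # Mirrors the condition in the hunk: a missing attribute counts as
    # "not overlapping", so only interval-like indexes can opt out.
    return index.is_unique and not getattr(index, "is_overlapping", False)


print(can_use_unique_indexer(PlainIndex()))
print(can_use_unique_indexer(OverlappingIntervals()))
```

Defaulting the lookup to ``False`` keeps the base class free of any interval-specific attribute.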
@@ -4481,8 +4482,7 @@ def argsort(self, *args, **kwargs):
             result = np.array(self)
         return result.argsort(*args, **kwargs)
 
-    def get_value(self, series, key):
-        """
+    _index_shared_docs['get_value'] = """
         Fast lookup of value from 1-dimensional ndarray. Only use this if you
         know what you're doing.
 
@@ -4492,6 +4492,9 @@ def get_value(self, series, key):
         A value in the Series with the index of the key value in self.
         """
 
+    @Appender(_index_shared_docs['get_value'] % _index_doc_kwargs)
+    def get_value(self, series, key):
+
         # if we have something that is Index-like, then
         # use this, e.g. DatetimeIndex
         # Things like `Series._get_value` (via .at) pass the EA directly here.
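
Reviewer note: the two hunks above move the ``get_value`` docstring into ``_index_shared_docs`` and re-attach it with ``@Appender``, presumably so that subclasses can reuse the same text. The following is a rough sketch of that shared-docstring pattern using a toy decorator. The ``appender`` function, the ``%(klass)s`` placeholder, and the ``Toy*`` classes are assumptions made for illustration and are not the pandas implementation.

```python
def appender(addendum):
    """Toy decorator: append ``addendum`` to the wrapped function's docstring."""
    def decorate(func):
        func.__doc__ = (func.__doc__ or "") + addendum
        return func
    return decorate


# Hypothetical shared template; the real pandas text and placeholders may differ.
shared_docs = {"get_value": "Look up a value in a %(klass)s by key.\n"}


class ToyIndex:
    @appender(shared_docs["get_value"] % {"klass": "ToyIndex"})
    def get_value(self, series, key):
        return series[key]


class ToyIntervalIndex(ToyIndex):
    # The subclass reuses the template with its own substitution.
    @appender(shared_docs["get_value"] % {"klass": "ToyIntervalIndex"})
    def get_value(self, series, key):
        return super().get_value(series, key)


print(ToyIndex.get_value.__doc__)
print(ToyIntervalIndex.get_value.__doc__)
```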
@@ -4915,13 +4918,6 @@ def _searchsorted_monotonic(self, label, side='left'):
 
         raise ValueError('index must be monotonic increasing or decreasing')
 
-    def _get_loc_only_exact_matches(self, key):
-        """
-        This is overridden on subclasses (namely, IntervalIndex) to control
-        get_slice_bound.
-        """
-        return self.get_loc(key)
-
     def get_slice_bound(self, label, side, kind):
         """
         Calculate slice bound that corresponds to given label.
@@ -4955,7 +4951,7 @@ def get_slice_bound(self, label, side, kind):
 
         # we need to look up the label
         try:
-            slc = self._get_loc_only_exact_matches(label)
+            slc = self.get_loc(label)
         except KeyError as err:
             try:
                 return self._searchsorted_monotonic(label, side)
