
API: Implement new indexing behavior for intervals #27100


Merged
merged 13 commits into from
Jul 2, 2019
4 changes: 2 additions & 2 deletions ci/code_checks.sh
@@ -245,10 +245,10 @@ if [[ -z "$CHECK" || "$CHECK" == "doctests" ]]; then
     RET=$(($RET + $?)) ; echo $MSG "DONE"

     MSG='Doctests interval classes' ; echo $MSG
-    pytest --doctest-modules -v \
+    pytest -q --doctest-modules \
         pandas/core/indexes/interval.py \
         pandas/core/arrays/interval.py \
-        -k"-from_arrays -from_breaks -from_intervals -from_tuples -get_loc -set_closed -to_tuples -interval_range"
+        -k"-from_arrays -from_breaks -from_intervals -from_tuples -set_closed -to_tuples -interval_range"
     RET=$(($RET + $?)) ; echo $MSG "DONE"

 fi
132 changes: 130 additions & 2 deletions doc/source/whatsnew/v0.25.0.rst
@@ -479,6 +479,133 @@ This change is backward compatible for direct usage of Pandas, but if you subcla
Pandas objects *and* give your subclasses specific ``__str__``/``__repr__`` methods,
you may have to adjust your ``__str__``/``__repr__`` methods (:issue:`26495`).

.. _whatsnew_0250.api_breaking.interval_indexing:


Indexing an ``IntervalIndex`` with ``Interval`` objects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Indexing methods for :class:`IntervalIndex` have been modified to return exact matches only for :class:`Interval` queries.
``IntervalIndex`` methods previously matched on any overlapping ``Interval``. Behavior with scalar points, e.g. querying
with an integer, is unchanged (:issue:`16316`).

.. ipython:: python

   ii = pd.IntervalIndex.from_tuples([(0, 4), (1, 5), (5, 8)])
   ii

The ``in`` operator (``__contains__``) now only returns ``True`` for exact matches to ``Intervals`` in the ``IntervalIndex``, whereas
this would previously return ``True`` for any ``Interval`` overlapping an ``Interval`` in the ``IntervalIndex``.

*Previous behavior*:

.. code-block:: python

   In [4]: pd.Interval(1, 2, closed='neither') in ii
   Out[4]: True

   In [5]: pd.Interval(-10, 10, closed='both') in ii
   Out[5]: True

*New behavior*:

.. ipython:: python

   pd.Interval(1, 2, closed='neither') in ii
   pd.Interval(-10, 10, closed='both') in ii

The ``get_loc`` method now only returns locations for exact matches to ``Interval`` queries, as opposed to the previous behavior of
returning locations for overlapping matches. A ``KeyError`` will be raised if an exact match is not found.

*Previous behavior*:

.. code-block:: python

   In [6]: ii.get_loc(pd.Interval(1, 5))
   Out[6]: array([0, 1])

   In [7]: ii.get_loc(pd.Interval(2, 6))
   Out[7]: array([0, 1, 2])

*New behavior*:

.. code-block:: python

   In [6]: ii.get_loc(pd.Interval(1, 5))
   Out[6]: 1

   In [7]: ii.get_loc(pd.Interval(2, 6))
   ---------------------------------------------------------------------------
   KeyError: Interval(2, 6, closed='right')

Likewise, ``get_indexer`` and ``get_indexer_non_unique`` now return locations only for exact matches to ``Interval`` queries, with
``-1`` denoting that no exact match was found.
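
As a minimal sketch of the exact-match rule for ``get_indexer``: the non-overlapping index ``idx`` and the two queries below are illustrative assumptions, not examples taken from the release notes.

```python
import pandas as pd

# Illustrative, non-overlapping index (an assumption for this sketch).
idx = pd.IntervalIndex.from_tuples([(0, 1), (1, 2), (2, 3)])

# Interval(1, 2) equals the second entry exactly. Interval(0, 2) only
# overlaps entries of the index, so it is reported as not found (-1).
exact = idx.get_indexer([pd.Interval(1, 2), pd.Interval(0, 2)])
print(exact.tolist())
```

The second query would have matched under the old overlap-based lookup; under the exact-match rule it is treated as missing.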

These indexing changes extend to querying a :class:`Series` or :class:`DataFrame` with an ``IntervalIndex`` index.

.. ipython:: python

   s = pd.Series(list('abc'), index=ii)
   s

Selecting from a ``Series`` or ``DataFrame`` using ``[]`` (``__getitem__``) or ``loc`` now only returns exact matches for ``Interval`` queries.

*Previous behavior*:

.. code-block:: python

   In [8]: s[pd.Interval(1, 5)]
   Out[8]:
   (0, 4]    a
   (1, 5]    b
   dtype: object

   In [9]: s.loc[pd.Interval(1, 5)]
   Out[9]:
   (0, 4]    a
   (1, 5]    b
   dtype: object

*New behavior*:

.. ipython:: python

   s[pd.Interval(1, 5)]
   s.loc[pd.Interval(1, 5)]

Similarly, a ``KeyError`` is now raised for non-exact matches instead of returning the overlapping matches.

*Previous behavior*:

.. code-block:: python

   In [9]: s[pd.Interval(2, 6)]
   Out[9]:
   (0, 4]    a
   (1, 5]    b
   (5, 8]    c
   dtype: object

   In [10]: s.loc[pd.Interval(2, 6)]
   Out[10]:
   (0, 4]    a
   (1, 5]    b
   (5, 8]    c
   dtype: object

*New behavior*:

.. code-block:: python

   In [6]: s[pd.Interval(2, 6)]
   ---------------------------------------------------------------------------
   KeyError: Interval(2, 6, closed='right')

   In [7]: s.loc[pd.Interval(2, 6)]
   ---------------------------------------------------------------------------
   KeyError: Interval(2, 6, closed='right')
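
Code that relied on the old overlap-based selection may be able to recover it with a boolean mask. The sketch below is one possible approach, not a recommendation from the release notes; it uses ``IntervalIndex.overlaps`` on the same ``ii`` and ``s`` as above.

```python
import pandas as pd

ii = pd.IntervalIndex.from_tuples([(0, 4), (1, 5), (5, 8)])
s = pd.Series(list('abc'), index=ii)

# Boolean mask marking every entry of ``ii`` that overlaps the query.
mask = ii.overlaps(pd.Interval(2, 6))

# Boolean indexing selects the overlapping rows, as ``s[pd.Interval(2, 6)]``
# did before this change.
overlapping = s[mask]
print(overlapping.tolist())
```

Because the mask is built explicitly, the selection no longer depends on how ``[]`` or ``loc`` interpret an ``Interval`` key.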


.. _whatsnew_0250.api_breaking.deps:

Increased minimum versions for dependencies
@@ -652,7 +779,8 @@ Categorical

- Bug in :func:`DataFrame.at` and :func:`Series.at` that would raise exception if the index was a :class:`CategoricalIndex` (:issue:`20629`)
- Fixed bug in comparison of ordered :class:`Categorical` that contained missing values with a scalar which sometimes incorrectly resulted in ``True`` (:issue:`26504`)
- Bug in :meth:`DataFrame.dropna` when the :class:`DataFrame` has a :class:`CategoricalIndex` containing :class:`Interval` objects incorrectly raised a ``TypeError`` (:issue:`25087`)
- Bug in :class:`Categorical` and :class:`CategoricalIndex` with :class:`Interval` values when using the ``in`` operator (``__contains__``) with objects that are not comparable to the values in the ``Interval`` (:issue:`23705`)

Datetimelike
^^^^^^^^^^^^
@@ -729,7 +857,7 @@ Interval

- Construction of :class:`Interval` is restricted to numeric, :class:`Timestamp` and :class:`Timedelta` endpoints (:issue:`23013`)
- Fixed bug in :class:`Series`/:class:`DataFrame` not displaying ``NaN`` in :class:`IntervalIndex` with missing values (:issue:`25984`)
- Bug in :meth:`IntervalIndex.get_loc` where a ``KeyError`` would be incorrectly raised for a decreasing :class:`IntervalIndex` (:issue:`25860`)
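
A small sketch of ``get_loc`` on a decreasing ``IntervalIndex``. The index ``dec`` is an illustrative assumption; the specific failing input from the linked issue is not reproduced here.

```python
import pandas as pd

# Decreasing index: (2, 3], (1, 2], (0, 1] (illustrative assumption).
dec = pd.IntervalIndex.from_tuples([(2, 3), (1, 2), (0, 1)])

# Exact Interval query: position of the entry equal to (1, 2].
loc_interval = dec.get_loc(pd.Interval(1, 2))

# Scalar query: position of the single entry that contains 0.5.
loc_scalar = dec.get_loc(0.5)
print(loc_interval, loc_scalar)
```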

Indexing
^^^^^^^^
14 changes: 4 additions & 10 deletions pandas/core/indexes/base.py
@@ -3239,8 +3239,9 @@ def reindex(self, target, method=None, level=None, limit=None,
             if self.equals(target):
                 indexer = None
             else:
-
-                if self.is_unique:
+                # check is_overlapping for IntervalIndex compat
+                if (self.is_unique and
+                        not getattr(self, 'is_overlapping', False)):
                     indexer = self.get_indexer(target, method=method,
                                                limit=limit,
                                                tolerance=tolerance)
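
The new condition uses ``getattr`` with a default of ``False``, so index types that do not define ``is_overlapping`` keep taking the ``get_indexer`` path. A minimal sketch of that guard, using hypothetical stand-in classes rather than real pandas index types:

```python
class PlainIndex:
    """Hypothetical stand-in for an index type without ``is_overlapping``."""
    is_unique = True


class OverlappingIntervals:
    """Hypothetical stand-in for an overlapping interval index."""
    is_unique = True
    is_overlapping = True


def can_use_get_indexer(index):
    # Mirrors the condition in the diff: a missing ``is_overlapping``
    # attribute is treated as False, so only overlapping indexes are excluded.
    return index.is_unique and not getattr(index, 'is_overlapping', False)


print(can_use_get_indexer(PlainIndex()))
print(can_use_get_indexer(OverlappingIntervals()))
```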
@@ -4902,13 +4903,6 @@ def _searchsorted_monotonic(self, label, side='left'):

         raise ValueError('index must be monotonic increasing or decreasing')

-    def _get_loc_only_exact_matches(self, key):
-        """
-        This is overridden on subclasses (namely, IntervalIndex) to control
-        get_slice_bound.
-        """
-        return self.get_loc(key)
-
     def get_slice_bound(self, label, side, kind):
         """
         Calculate slice bound that corresponds to given label.
@@ -4942,7 +4936,7 @@ def get_slice_bound(self, label, side, kind):

         # we need to look up the label
         try:
-            slc = self._get_loc_only_exact_matches(label)
+            slc = self.get_loc(label)
         except KeyError as err:
             try:
                 return self._searchsorted_monotonic(label, side)