API: Implement new indexing behavior for intervals #27100

Merged 13 commits on Jul 2, 2019
4 changes: 2 additions & 2 deletions ci/code_checks.sh
@@ -245,10 +245,10 @@ if [[ -z "$CHECK" || "$CHECK" == "doctests" ]]; then
RET=$(($RET + $?)) ; echo $MSG "DONE"

MSG='Doctests interval classes' ; echo $MSG
-pytest --doctest-modules -v \
+pytest -q --doctest-modules \
 pandas/core/indexes/interval.py \
 pandas/core/arrays/interval.py \
--k"-from_arrays -from_breaks -from_intervals -from_tuples -get_loc -set_closed -to_tuples -interval_range"
+-k"-from_arrays -from_breaks -from_intervals -from_tuples -set_closed -to_tuples -interval_range"
RET=$(($RET + $?)) ; echo $MSG "DONE"

fi

22 changes: 22 additions & 0 deletions doc/source/user_guide/advanced.rst
@@ -965,6 +965,28 @@ If you select a label *contained* within an interval, this will also select the
    df.loc[2.5]
    df.loc[[2.5, 3.5]]

Selecting using an ``Interval`` will only return exact matches.

Review comment (Contributor): can you add a versionchanged 0.25 here

.. ipython:: python

    df.loc[pd.Interval(1, 2)]

Trying to select an ``Interval`` that is not exactly contained in the ``IntervalIndex`` will raise a ``KeyError``.

.. code-block:: python

    In [7]: df.loc[pd.Interval(0.5, 2.5)]
    ---------------------------------------------------------------------------
    KeyError: Interval(0.5, 2.5, closed='right')

Selecting all ``Intervals`` that overlap a given ``Interval`` can be performed using the
:meth:`~IntervalIndex.overlaps` method to create a boolean indexer.

.. ipython:: python

    idxr = df.index.overlaps(pd.Interval(0.5, 2.5))
    df[idxr]
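
The contrast between exact-match selection and overlap-based selection can be sketched as follows. This is a minimal example rather than part of the diff: the frame ``df`` built here is an assumed stand-in for the one defined earlier in the user guide, and the variable names are illustrative.

```python
import pandas as pd

# Assumed stand-in for the guide's ``df``: rows indexed by (0, 1], (1, 2], (2, 3], (3, 4].
df = pd.DataFrame(
    {"A": [1, 2, 3, 4]},
    index=pd.IntervalIndex.from_breaks([0, 1, 2, 3, 4]),
)

# An Interval key must equal an index entry (same endpoints, same closed side).
exact_row = df.loc[pd.Interval(1, 2)]

# (0.5, 2.5] overlaps three entries but equals none of them, so .loc raises KeyError.
try:
    df.loc[pd.Interval(0.5, 2.5)]
    raised_key_error = False
except KeyError:
    raised_key_error = True

# overlaps() builds a boolean mask that selects every overlapping row instead.
idxr = df.index.overlaps(pd.Interval(0.5, 2.5))
overlapping_rows = df[idxr]
```

Here ``exact_row`` is the single row labelled ``(1, 2]``, while ``overlapping_rows`` holds the first three rows of the frame.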

``Interval`` and ``IntervalIndex`` are used by ``cut`` and ``qcut``:

.. ipython:: python

141 changes: 138 additions & 3 deletions doc/source/whatsnew/v0.25.0.rst
@@ -479,6 +479,141 @@ This change is backward compatible for direct usage of Pandas, but if you subclass
Pandas objects *and* give your subclasses specific ``__str__``/``__repr__`` methods,
you may have to adjust your ``__str__``/``__repr__`` methods (:issue:`26495`).

.. _whatsnew_0250.api_breaking.interval_indexing:


Indexing an ``IntervalIndex`` with ``Interval`` objects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Indexing methods for :class:`IntervalIndex` now require exact matches for :class:`Interval` queries.

Review comment (Contributor): can you add a link to the docs you added above in indexing

``IntervalIndex`` methods previously matched on any overlapping ``Interval``. Behavior with scalar points, e.g. querying
with an integer, is unchanged (:issue:`16316`).

.. ipython:: python

    ii = pd.IntervalIndex.from_tuples([(0, 4), (1, 5), (5, 8)])
    ii

The ``in`` operator (``__contains__``) now only returns ``True`` for exact matches to ``Intervals`` in the ``IntervalIndex``, whereas
this would previously return ``True`` for any ``Interval`` overlapping an ``Interval`` in the ``IntervalIndex``.

*Previous behavior*:

.. code-block:: python

    In [4]: pd.Interval(1, 2, closed='neither') in ii
    Out[4]: True

    In [5]: pd.Interval(-10, 10, closed='both') in ii
    Out[5]: True

*New behavior*:

.. ipython:: python

    pd.Interval(1, 2, closed='neither') in ii
    pd.Interval(-10, 10, closed='both') in ii
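
As a hedged illustration of what "exact match" means for the ``in`` operator, the sketch below compares several queries against the same ``ii``. The ``checks`` dictionary and its labels are illustrative names, not part of the diff.

```python
import pandas as pd

ii = pd.IntervalIndex.from_tuples([(0, 4), (1, 5), (5, 8)])

# Only a query with identical endpoints AND identical closed side is "in" the index.
checks = {
    "exact entry (1, 5]": pd.Interval(1, 5) in ii,
    "same endpoints, closed='both'": pd.Interval(1, 5, closed="both") in ii,
    "contained in an entry": pd.Interval(1, 2, closed="neither") in ii,
    "spanning every entry": pd.Interval(-10, 10, closed="both") in ii,
}

for label, result in checks.items():
    print(f"{label}: {result}")
```

Only the first query evaluates to ``True``; differing in either an endpoint or the ``closed`` side is enough to make the membership test fail.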

The :meth:`~IntervalIndex.get_loc` method now only returns locations for exact matches to ``Interval`` queries, as opposed to the previous behavior of
returning locations for overlapping matches. A ``KeyError`` will be raised if an exact match is not found.

*Previous behavior*:

.. code-block:: python

    In [6]: ii.get_loc(pd.Interval(1, 5))
    Out[6]: array([0, 1])

    In [7]: ii.get_loc(pd.Interval(2, 6))
    Out[7]: array([0, 1, 2])

*New behavior*:

.. code-block:: python

    In [6]: ii.get_loc(pd.Interval(1, 5))
    Out[6]: 1

    In [7]: ii.get_loc(pd.Interval(2, 6))
    ---------------------------------------------------------------------------
    KeyError: Interval(2, 6, closed='right')

Likewise, :meth:`~IntervalIndex.get_indexer` and :meth:`~IntervalIndex.get_indexer_non_unique` only return locations for exact matches
to ``Interval`` queries, with ``-1`` denoting that an exact match was not found.
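
The sketch below shows ``get_loc`` and ``get_indexer`` side by side under the new rules. To keep the ``get_indexer`` part simple it uses a separate, non-overlapping index; that index and all variable names are assumptions made for illustration.

```python
import pandas as pd

ii = pd.IntervalIndex.from_tuples([(0, 4), (1, 5), (5, 8)])

# get_loc: an exact match yields its integer position.
loc = ii.get_loc(pd.Interval(1, 5))

# get_loc: a merely overlapping query raises KeyError.
try:
    ii.get_loc(pd.Interval(2, 6))
    missing_raises = False
except KeyError:
    missing_raises = True

# get_indexer: illustrative non-overlapping index (0, 1], (1, 2], (2, 3].
non_overlapping = pd.IntervalIndex.from_breaks([0, 1, 2, 3])
targets = pd.IntervalIndex.from_tuples([(1, 2), (0, 2)])

# (1, 2] is an entry at position 1; (0, 2] only overlaps entries, so it maps to -1.
indexer = non_overlapping.get_indexer(targets)
```

The ``-1`` sentinel lets a caller separate found from missing targets without catching an exception for each element.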

These indexing changes extend to querying a :class:`Series` or :class:`DataFrame` with an ``IntervalIndex`` index.

.. ipython:: python

    s = pd.Series(list('abc'), index=ii)
    s

Selecting from a ``Series`` or ``DataFrame`` using ``[]`` (``__getitem__``) or ``loc`` now only returns exact matches for ``Interval`` queries.

*Previous behavior*:

.. code-block:: python

    In [8]: s[pd.Interval(1, 5)]
    Out[8]:
    (0, 4]    a
    (1, 5]    b
    dtype: object

    In [9]: s.loc[pd.Interval(1, 5)]
    Out[9]:
    (0, 4]    a
    (1, 5]    b
    dtype: object

*New behavior*:

.. ipython:: python

    s[pd.Interval(1, 5)]
    s.loc[pd.Interval(1, 5)]

Similarly, a ``KeyError`` will be raised for non-exact matches instead of returning overlapping matches.

*Previous behavior*:

.. code-block:: python

    In [9]: s[pd.Interval(2, 3)]
    Out[9]:
    (0, 4]    a
    (1, 5]    b
    dtype: object

    In [10]: s.loc[pd.Interval(2, 3)]
    Out[10]:
    (0, 4]    a
    (1, 5]    b
    dtype: object

*New behavior*:

.. code-block:: python

    In [6]: s[pd.Interval(2, 3)]
    ---------------------------------------------------------------------------
    KeyError: Interval(2, 3, closed='right')

    In [7]: s.loc[pd.Interval(2, 3)]
    ---------------------------------------------------------------------------
    KeyError: Interval(2, 3, closed='right')

The :meth:`~IntervalIndex.overlaps` method can be used to create a boolean indexer that replicates the
previous behavior of returning overlapping matches.

*New behavior*:

.. ipython:: python

    idxr = s.index.overlaps(pd.Interval(2, 3))
    s[idxr]
    s.loc[idxr]
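
Code that relied on the old overlapping semantics could wrap the ``overlaps`` mask in a small helper. The function ``loc_overlapping`` below is hypothetical (it is not a pandas API) and is only a sketch of one possible migration path.

```python
import pandas as pd


def loc_overlapping(obj, interval):
    """Hypothetical helper: rows of ``obj`` whose index entry overlaps ``interval``."""
    return obj[obj.index.overlaps(interval)]


ii = pd.IntervalIndex.from_tuples([(0, 4), (1, 5), (5, 8)])
s = pd.Series(list("abc"), index=ii)

# New behavior: an Interval key selects only the exactly matching entry.
exact = s.loc[pd.Interval(1, 5)]

# Old-style result, now requested explicitly through the helper.
overlapping = loc_overlapping(s, pd.Interval(2, 3))

# Scalar points are unaffected by the change: 6 lies only in (5, 8].
point = s.loc[6]
```

Because the helper relies only on boolean-mask selection through ``[]``, the same function should also work for a ``DataFrame`` with an ``IntervalIndex``.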

.. _whatsnew_0250.api_breaking.deps:

Increased minimum versions for dependencies
@@ -652,7 +787,7 @@ Categorical

- Bug in :func:`DataFrame.at` and :func:`Series.at` that would raise exception if the index was a :class:`CategoricalIndex` (:issue:`20629`)
- Fixed bug in comparison of ordered :class:`Categorical` that contained missing values with a scalar which sometimes incorrectly resulted in ``True`` (:issue:`26504`)
--
+- Bug in :meth:`DataFrame.dropna` when the :class:`DataFrame` has a :class:`CategoricalIndex` containing :class:`Interval` objects incorrectly raised a ``TypeError`` (:issue:`25087`)

Datetimelike
^^^^^^^^^^^^
@@ -729,7 +864,7 @@ Interval

- Construction of :class:`Interval` is restricted to numeric, :class:`Timestamp` and :class:`Timedelta` endpoints (:issue:`23013`)
- Fixed bug in :class:`Series`/:class:`DataFrame` not displaying ``NaN`` in :class:`IntervalIndex` with missing values (:issue:`25984`)
--
+- Bug in :meth:`IntervalIndex.get_loc` where a ``KeyError`` would be incorrectly raised for a decreasing :class:`IntervalIndex` (:issue:`25860`)

Indexing
^^^^^^^^
@@ -742,7 +877,7 @@ Indexing
- Bug in which :meth:`DataFrame.to_csv` caused a segfault for a reindexed data frame, when the indices were single-level :class:`MultiIndex` (:issue:`26303`).
- Fixed bug where assigning a :class:`arrays.PandasArray` to a :class:`pandas.core.frame.DataFrame` would raise error (:issue:`26390`)
- Allow keyword arguments for callable local reference used in the :meth:`DataFrame.query` string (:issue:`26426`)

- Bug in :class:`Categorical` and :class:`CategoricalIndex` with :class:`Interval` values when using the ``in`` operator (``__contains__``) with objects that are not comparable to the values in the ``Interval`` (:issue:`23705`)

Missing
^^^^^^^

20 changes: 8 additions & 12 deletions pandas/core/indexes/base.py
@@ -3239,8 +3239,9 @@ def reindex(self, target, method=None, level=None, limit=None,
             if self.equals(target):
                 indexer = None
             else:
-
-                if self.is_unique:
+                # check is_overlapping for IntervalIndex compat
+                if (self.is_unique and
+                        not getattr(self, 'is_overlapping', False)):
                     indexer = self.get_indexer(target, method=method,
                                                limit=limit,
                                                tolerance=tolerance)
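
The guard in the hunk above can be illustrated in isolation. The sketch below is an assumption-laden example, not code from the PR: ``can_use_get_indexer`` is a hypothetical helper that mirrors only the condition, showing why ``getattr`` with a default is used for index types that have no ``is_overlapping`` attribute.

```python
import pandas as pd


def can_use_get_indexer(index):
    """Hypothetical helper mirroring the condition shown in the hunk above."""
    # Non-interval indexes have no ``is_overlapping``; default to False for them.
    return bool(index.is_unique and not getattr(index, "is_overlapping", False))


plain = pd.Index([1, 2, 3])
disjoint = pd.IntervalIndex.from_breaks([0, 1, 2, 3])
overlapping = pd.IntervalIndex.from_tuples([(0, 4), (1, 5), (5, 8)])

results = [can_use_get_indexer(ix) for ix in (plain, disjoint, overlapping)]
print(results)
```

The third index is unique yet overlapping, which is the case the added check is meant to catch: uniqueness alone does not guarantee at most one match per label for intervals.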
@@ -4468,8 +4469,7 @@ def argsort(self, *args, **kwargs):
         result = np.array(self)
         return result.argsort(*args, **kwargs)

-    def get_value(self, series, key):
-        """
+    _index_shared_docs['get_value'] = """
         Fast lookup of value from 1-dimensional ndarray. Only use this if you
         know what you're doing.

@@ -4479,6 +4479,9 @@ def get_value(self, series, key):
             A value in the Series with the index of the key value in self.
         """

+    @Appender(_index_shared_docs['get_value'] % _index_doc_kwargs)
+    def get_value(self, series, key):
+
         # if we have something that is Index-like, then
         # use this, e.g. DatetimeIndex
         # Things like `Series._get_value` (via .at) pass the EA directly here.
@@ -4902,13 +4905,6 @@ def _searchsorted_monotonic(self, label, side='left'):

         raise ValueError('index must be monotonic increasing or decreasing')

-    def _get_loc_only_exact_matches(self, key):
-        """
-        This is overridden on subclasses (namely, IntervalIndex) to control
-        get_slice_bound.
-        """
-        return self.get_loc(key)
-
     def get_slice_bound(self, label, side, kind):
         """
         Calculate slice bound that corresponds to given label.
@@ -4942,7 +4938,7 @@ def get_slice_bound(self, label, side, kind):

         # we need to look up the label
         try:
-            slc = self._get_loc_only_exact_matches(label)
+            slc = self.get_loc(label)
         except KeyError as err:
             try:
                 return self._searchsorted_monotonic(label, side)