Skip to content

ENH: Implement overlaps method for Interval-like #22939

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Oct 24, 2018
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions doc/source/api.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1651,6 +1651,7 @@ IntervalIndex Components
IntervalIndex.get_loc
IntervalIndex.get_indexer
IntervalIndex.set_closed
IntervalIndex.overlaps


.. _api.multiindex:
Expand Down Expand Up @@ -2037,6 +2038,7 @@ Properties
Interval.mid
Interval.open_left
Interval.open_right
Interval.overlaps
Interval.right

Timedelta
Expand Down
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.24.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -194,6 +194,7 @@ Other Enhancements
- :meth:`Index.to_frame` now supports overriding column name(s) (:issue:`22580`).
- New attribute :attr:`__git_version__` will return git commit sha of current build (:issue:`21295`).
- Compatibility with Matplotlib 3.0 (:issue:`22790`).
- Added :meth:`Interval.overlaps`, :meth:`IntervalArray.overlaps`, and :meth:`IntervalIndex.overlaps` for determining overlaps between interval-like objects (:issue:`21998`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think IntervalArray is in our API docs, so that link will cause a warning.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not in the API docs at this time, but I imagine we'd add it if we decide to make IntervalArray (and other EA's) public? I have a few doc related TODOs surrounding whether or not IntervalArray gets made public, so I'll make an issue for these TODOs and include the above in the next day or so.


.. _whatsnew_0240.api_breaking:

Expand Down
30 changes: 30 additions & 0 deletions pandas/_libs/interval.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ from cython cimport Py_ssize_t
import numpy as np
from numpy cimport ndarray

from operator import le, lt

cimport util
util.import_array()
Expand Down Expand Up @@ -359,6 +360,35 @@ cdef class Interval(IntervalMixin):
self.left // y, self.right // y, closed=self.closed)
return NotImplemented

def overlaps(self, other):
"""
Check whether two interval objects overlap.

.. versionadded:: 0.24.0

Parameters
----------
other : Interval
The interval to check against for an overlap.

Returns
-------
bool
``True`` if the two intervals overlap, else ``False``.
"""
if not isinstance(other, Interval):
msg = '`other` must be an Interval, got {other}'
raise TypeError(msg.format(other=type(other).__name__))

# equality is okay if both endpoints are closed (overlap at a point)
op1 = le if (self.closed_left and other.closed_right) else lt
op2 = le if (other.closed_left and self.closed_right) else lt

# overlaps is equivalent negation of two interval being disjoint:
# disjoint = (A.left > B.right) or (B.left > A.right)
# (simplifying the negation allows this to be done in less operations)
return op1(self.left, other.right) and op2(other.left, self.right)


@cython.wraparound(False)
@cython.boundscheck(False)
Expand Down
8 changes: 8 additions & 0 deletions pandas/conftest.py
Original file line number Diff line number Diff line change
Expand Up @@ -275,6 +275,14 @@ def closed(request):
return request.param


@pytest.fixture(params=['left', 'right', 'both', 'neither'])
def other_closed(request):
"""
Secondary closed fixture to allow parametrizing over all pairs of closed
"""
return request.param


@pytest.fixture(params=[None, np.nan, pd.NaT, float('nan'), np.float('NaN')])
def nulls_fixture(request):
"""
Expand Down
36 changes: 36 additions & 0 deletions pandas/core/arrays/interval.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
import textwrap
import numpy as np

from operator import le, lt

from pandas._libs.interval import (Interval, IntervalMixin,
intervals_to_interval_bounds)
from pandas.compat import add_metaclass
Expand Down Expand Up @@ -1015,6 +1017,40 @@ def repeat(self, repeats, **kwargs):
right_repeat = self.right.repeat(repeats, **kwargs)
return self._shallow_copy(left=left_repeat, right=right_repeat)

_interval_shared_docs['overlaps'] = """\
Check elementwise whether the given input overlaps the intervals in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First sentence should be a one-liner

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

the %(klass)s.

.. versionadded:: 0.24.0

Parameters
----------
other : Interval-like
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why Interval-like and not Interval ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed. Originally wrote Interval-like to mean Interval/IntervalIndex/IntervalArray, as there were originally plans to support all three. Probably best to just say Interval for now since that's the only thing supported, and leave such a modification for when the other two are implemented.

Interval-like object to check against for an overlap.

Returns
-------
ndarray
Boolean array positionally indicating where an overlap occurs.
"""

@Appender(_interval_shared_docs['overlaps'] % _shared_docs_kwargs)
def overlaps(self, other):
if isinstance(other, (IntervalArray, ABCIntervalIndex)):
raise NotImplementedError
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we intend to eventually support IntervalArray.overlaps(IntervalArray), where the arrays are the same shape?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was discussion about this in #18975 but no consensus on the behavior. I'd be open to implementing it if there is demand. I could see adding a how parameter that accepts {'any', 'all', 'pairwise'} to determine the behavior (e.g. for 'any'/'all' the boolean indicates if the given interval overlaps any/all of the intervals in the supplied array, and 'pairwise' requiring the lengths to be the same and indicating if the nth element overlaps the nth element of the supplied array.

I don't have a use case for IntervalArray.overlaps(IntervalArray) though, so not entirely sure what portions of the above are practically useful, if any, or if some other behavior would be more commonly used. The 'pairwise' option would be straightforward to implement, with the other two maybe requiring some care to implement in a performant manner.

elif not isinstance(other, Interval):
msg = '`other` must be Interval-like, got {other}'
raise TypeError(msg.format(other=type(other).__name__))

# equality is okay if both endpoints are closed (overlap at a point)
op1 = le if (self.closed_left and other.closed_right) else lt
op2 = le if (other.closed_left and self.closed_right) else lt

# overlaps is equivalent negation of two interval being disjoint:
# disjoint = (A.left > B.right) or (B.left > A.right)
# (simplifying the negation allows this to be done in less operations)
return op1(self.left, other.right) & op2(other.left, self.right)


def maybe_convert_platform_interval(values):
"""
Expand Down
4 changes: 4 additions & 0 deletions pandas/core/indexes/interval.py
Original file line number Diff line number Diff line change
Expand Up @@ -982,6 +982,10 @@ def equals(self, other):
self.right.equals(other.right) and
self.closed == other.closed)

@Appender(_interval_shared_docs['overlaps'] % _index_doc_kwargs)
def overlaps(self, other):
return self._data.overlaps(other)

def _setop(op_name):
def func(self, other):
other = self._as_like_interval_index(other)
Expand Down
82 changes: 82 additions & 0 deletions pandas/tests/arrays/interval/test_interval_ops.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,82 @@
"""Tests for Interval-Interval operations, such as overlaps, contains, etc."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_ops is ok

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

import numpy as np
import pytest

import pandas.util.testing as tm
from pandas import Interval, IntervalIndex, Timedelta, Timestamp
from pandas.core.arrays import IntervalArray


@pytest.fixture(params=[IntervalArray, IntervalIndex])
def constructor(request):
"""
Fixture for IntervalArray and IntervalIndex class constructors
"""
return request.param
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Combined the IntervalArray and IntervalIndex tests here, as the behavior for both should be identical. Not super happy about having the IntervalIndex tests here though, since this is in the arrays directory, but seemed better than creating dupe tests for IntervalIndex. Could maybe move these tests into a common location and combine with the Interval tests, but not exactly sure what would be best.

Copy link
Contributor

@jreback jreback Oct 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

u can make a common mixin class where the tests live that doesn’t start with test_ then use it in both places. we do this with the parser tests for example.

this pattern is going to become more common as we use more EA for Indexes

so happy to have this change

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done (see my latest comment for additional details)



@pytest.fixture(params=[
(Timedelta('0 days'), Timedelta('1 day')),
(Timestamp('2018-01-01'), Timedelta('1 day')),
(0, 1)], ids=lambda x: type(x[0]).__name__)
def start_shift(request):
"""
Fixture for generating intervals of types from a start value and a shift
value that can be added to start to generate an endpoint
"""
return request.param


class TestOverlaps(object):

def test_overlaps_interval(
self, constructor, start_shift, closed, other_closed):
start, shift = start_shift
interval = Interval(start, start + 3 * shift, other_closed)

# container: identical, nested, spanning, partial, adjacent, disjoint
tuples = [(start, start + 3 * shift),
(start + shift, start + 2 * shift),
(start - shift, start + 4 * shift),
(start + 2 * shift, start + 4 * shift),
(start + 3 * shift, start + 4 * shift),
(start + 4 * shift, start + 5 * shift)]
interval_container = constructor.from_tuples(tuples, closed)

adjacent = (interval.closed_right and interval_container.closed_left)
expected = np.array([True, True, True, True, adjacent, False])
result = interval_container.overlaps(interval)
tm.assert_numpy_array_equal(result, expected)

@pytest.mark.parametrize('other_constructor', [
IntervalArray, IntervalIndex])
def test_overlaps_interval_container(self, constructor, other_constructor):
# TODO: modify this test when implemented
interval_container = constructor.from_breaks(range(5))
other_container = other_constructor.from_breaks(range(5))
with pytest.raises(NotImplementedError):
interval_container.overlaps(other_container)

def test_overlaps_na(self, constructor, start_shift):
"""NA values are marked as False"""
start, shift = start_shift
interval = Interval(start, start + shift)

tuples = [(start, start + shift),
np.nan,
(start + 2 * shift, start + 3 * shift)]
interval_container = constructor.from_tuples(tuples)

expected = np.array([True, False, False])
result = interval_container.overlaps(interval)
tm.assert_numpy_array_equal(result, expected)

@pytest.mark.parametrize('other', [
10, True, 'foo', Timedelta('1 day'), Timestamp('2018-01-01')],
ids=lambda x: type(x).__name__)
def test_overlaps_invalid_type(self, constructor, other):
interval_container = constructor.from_breaks(range(5))
msg = '`other` must be Interval-like, got {other}'.format(
other=type(other).__name__)
with tm.assert_raises_regex(TypeError, msg):
interval_container.overlaps(other)
61 changes: 61 additions & 0 deletions pandas/tests/scalar/interval/test_interval_ops.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,61 @@
"""Tests for Interval-Interval operations, such as overlaps, contains, etc."""
import pytest

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_ops

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

import pandas.util.testing as tm
from pandas import Interval, Timedelta, Timestamp


@pytest.fixture(params=[
(Timedelta('0 days'), Timedelta('1 day')),
(Timestamp('2018-01-01'), Timedelta('1 day')),
(0, 1)], ids=lambda x: type(x[0]).__name__)
def start_shift(request):
"""
Fixture for generating intervals of types from a start value and a shift
value that can be added to start to generate an endpoint
"""
return request.param


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think this needs a class

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My plan is for this for all interval-interval specific methods to be tested here (e.g. covers, adjacent, etc.). Agreed that as-is this shouldn't need a class, but once additional methods are added it seems cleanest to organize each one in a separate class. Can get rid of the class if you'd still prefer that approach though.

class TestOverlaps(object):

def test_overlaps_self(self, start_shift, closed):
start, shift = start_shift
interval = Interval(start, start + shift, closed)
assert interval.overlaps(interval)

def test_overlaps_nested(self, start_shift, closed, other_closed):
start, shift = start_shift
interval1 = Interval(start, start + 3 * shift, other_closed)
interval2 = Interval(start + shift, start + 2 * shift, closed)

# nested intervals should always overlap
assert interval1.overlaps(interval2)

def test_overlaps_disjoint(self, start_shift, closed, other_closed):
start, shift = start_shift
interval1 = Interval(start, start + shift, other_closed)
interval2 = Interval(start + 2 * shift, start + 3 * shift, closed)

# disjoint intervals should never overlap
assert not interval1.overlaps(interval2)

def test_overlaps_endpoint(self, start_shift, closed, other_closed):
start, shift = start_shift
interval1 = Interval(start, start + shift, other_closed)
interval2 = Interval(start + shift, start + 2 * shift, closed)

# overlap if shared endpoint is closed for both (overlap at a point)
result = interval1.overlaps(interval2)
expected = interval1.closed_right and interval2.closed_left
assert result == expected

@pytest.mark.parametrize('other', [
10, True, 'foo', Timedelta('1 day'), Timestamp('2018-01-01')],
ids=lambda x: type(x).__name__)
def test_overlaps_invalid_type(self, other):
interval = Interval(0, 1)
msg = '`other` must be an Interval, got {other}'.format(
other=type(other).__name__)
with tm.assert_raises_regex(TypeError, msg):
interval.overlaps(other)