Skip to content

ENH: Implement overlaps method for Interval-like #22939

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Oct 24, 2018

Conversation

jschendel
Copy link
Member

@jschendel jschendel commented Oct 2, 2018

Implements Interval.overlaps, IntervalArray.overlaps, and IntervalIndex.overlaps.

Implementing this so there's an interval method that can be tested in the interval accessor PR (#19502), as there are currently only attributes that can be used by the accessor. Planning to reboot that PR soon after this is merged.

I didn't implement IntervalArray.overlaps(IntervalArray); the PR providing the specs for overlaps was recently closed due to being stale, and I don't recall there being a resolution on this behavior.

@jschendel jschendel added Enhancement Interval Interval data type labels Oct 2, 2018
@jschendel jschendel added this to the 0.24.0 milestone Oct 2, 2018
@pep8speaks
Copy link

pep8speaks commented Oct 2, 2018

Hello @jschendel! Thanks for updating the PR.

Comment last updated on October 02, 2018 at 03:16 Hours UTC

"""
Fixture for IntervalArray and IntervalIndex class constructors
"""
return request.param
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Combined the IntervalArray and IntervalIndex tests here, as the behavior for both should be identical. Not super happy about having the IntervalIndex tests here though, since this is in the arrays directory, but seemed better than creating dupe tests for IntervalIndex. Could maybe move these tests into a common location and combine with the Interval tests, but not exactly sure what would be best.

Copy link
Contributor

@jreback jreback Oct 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

u can make a common mixin class where the tests live that doesn’t start with test_ then use it in both places. we do this with the parser tests for example.

this pattern is going to become more common as we use more EA for Indexes

so happy to have this change

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done (see my latest comment for additional details)

@codecov
Copy link

codecov bot commented Oct 2, 2018

Codecov Report

Merging #22939 into master will increase coverage by <.01%.
The diff coverage is 93.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #22939      +/-   ##
==========================================
+ Coverage   92.19%   92.19%   +<.01%     
==========================================
  Files         169      169              
  Lines       50833    50966     +133     
==========================================
+ Hits        46863    46987     +124     
- Misses       3970     3979       +9
Flag Coverage Δ
#multiple 90.61% <93.33%> (ø) ⬆️
#single 42.27% <40%> (-0.1%) ⬇️
Impacted Files Coverage Δ
pandas/core/arrays/interval.py 93.02% <100%> (+0.46%) ⬆️
pandas/core/indexes/interval.py 94.65% <80%> (+0.49%) ⬆️
pandas/plotting/_compat.py 87.5% <0%> (-3.81%) ⬇️
pandas/core/ops.py 94.63% <0%> (-2.74%) ⬇️
pandas/compat/numpy/function.py 86.66% <0%> (-1.82%) ⬇️
pandas/core/internals/concat.py 96.83% <0%> (-1.54%) ⬇️
pandas/core/dtypes/concat.py 97.69% <0%> (-1.49%) ⬇️
pandas/compat/numpy/__init__.py 92.85% <0%> (-1.09%) ⬇️
pandas/core/internals/blocks.py 92.91% <0%> (-0.95%) ⬇️
pandas/core/internals/managers.py 95.77% <0%> (-0.9%) ⬇️
... and 60 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5ce06b5...a765e93. Read the comment docs.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't look in detail yet, some quick comments


Parameters
----------
other : Interval-like
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why Interval-like and not Interval ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed. Originally wrote Interval-like to mean Interval/IntervalIndex/IntervalArray, as there were originally plans to support all three. Probably best to just say Interval for now since that's the only thing supported, and leave such a modification for when the other two are implemented.

@@ -1015,6 +1017,40 @@ def repeat(self, repeats, **kwargs):
right_repeat = self.right.repeat(repeats, **kwargs)
return self._shallow_copy(left=left_repeat, right=right_repeat)

_interval_shared_docs['overlaps'] = """\
Check elementwise whether the given input overlaps the intervals in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First sentence should be a one-liner

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@jschendel
Copy link
Member Author

Created a common mixin class for the tests:

  • Put the mixin class in tests/arrays/interval/interval_ops.py
    • Not sure that this is the best location; any better alternatives?
    • Kind of gross import in the indexes tests: from ...arrays.interval.interval_ops
  • Created test_interval_ops.py files in both the indexes and arrays subfolders that use the mixin.

@@ -0,0 +1,15 @@
"""Tests for Interval-Interval operations, such as overlaps, contains, etc."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_ops is ok

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -0,0 +1,15 @@
"""Tests for Interval-Interval operations, such as overlaps, contains, etc."""
import pytest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_ops

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

import pytest

from pandas import IntervalIndex
from ...arrays.interval.interval_ops import BaseOverlaps
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use absolute imports. do we really need a mixin? (as opposed to fixtures)?

Copy link
Member Author

@jschendel jschendel Oct 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted back to using a fixture; the downside of using a fixture to test both is that you end up with IntervalIndex tests in the array subfolder which is unintuitive. The mixin admittedly adds some code bloat, so not really an ideal solution either.

@@ -0,0 +1,61 @@
"""Tests for Interval-Interval operations, such as overlaps, contains, etc."""
import pytest

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_ops

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

"""
return request.param


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think this needs a class

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My plan is for this for all interval-interval specific methods to be tested here (e.g. covers, adjacent, etc.). Agreed that as-is this shouldn't need a class, but once additional methods are added it seems cleanest to organize each one in a separate class. Can get rid of the class if you'd still prefer that approach though.

@jreback
Copy link
Contributor

jreback commented Oct 18, 2018

lgtm. if you have any other comments @jorisvandenbossche or @TomAugspurger otherwise merge away.

@jschendel
Copy link
Member Author

ping @jorisvandenbossche @TomAugspurger : I'm planning to merge this within the next couple days so I can have a method to test out in the interval accessor PR. Just wanted to give both of you a quick heads up in case either of you wanted to review prior to this getting merged. The interval accessor PR has some merge conflicts that need to be resolved, so no immediate rush, but I'm planning to address the conflicts soon.

@@ -194,6 +194,7 @@ Other Enhancements
- :meth:`Index.to_frame` now supports overriding column name(s) (:issue:`22580`).
- New attribute :attr:`__git_version__` will return git commit sha of current build (:issue:`21295`).
- Compatibility with Matplotlib 3.0 (:issue:`22790`).
- Added :meth:`Interval.overlaps`, :meth:`IntervalArray.overlaps`, and :meth:`IntervalIndex.overlaps` for determining overlaps between interval-like objects (:issue:`21998`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think IntervalArray is in our API docs, so that link will cause a warning.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not in the API docs at this time, but I imagine we'd add it if we decide to make IntervalArray (and other EA's) public? I have a few doc related TODOs surrounding whether or not IntervalArray gets made public, so I'll make an issue for these TODOs and include the above in the next day or so.

@Appender(_interval_shared_docs['overlaps'] % _shared_docs_kwargs)
def overlaps(self, other):
if isinstance(other, (IntervalArray, ABCIntervalIndex)):
raise NotImplementedError
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we intend to eventually support IntervalArray.overlaps(IntervalArray), where the arrays are the same shape?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was discussion about this in #18975 but no consensus on the behavior. I'd be open to implementing it if there is demand. I could see adding a how parameter that accepts {'any', 'all', 'pairwise'} to determine the behavior (e.g. for 'any'/'all' the boolean indicates if the given interval overlaps any/all of the intervals in the supplied array, and 'pairwise' requiring the lengths to be the same and indicating if the nth element overlaps the nth element of the supplied array.

I don't have a use case for IntervalArray.overlaps(IntervalArray) though, so not entirely sure what portions of the above are practically useful, if any, or if some other behavior would be more commonly used. The 'pairwise' option would be straightforward to implement, with the other two maybe requiring some care to implement in a performant manner.

@jreback jreback merged commit 95aab41 into pandas-dev:master Oct 24, 2018
@jreback
Copy link
Contributor

jreback commented Oct 24, 2018

thanks @jschendel pls follow up as needed.

@jschendel jschendel deleted the enh-interval-overlaps branch October 25, 2018 02:07
tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Interval Interval data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants