-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
interval.overlaps() mishandles empty intervals #26893
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@pilkibun Thanks for the report
can you give examples? |
cc @jschendel if you have thoughts on the expected behavior here. Should we disallow things like |
In [4]:
...: import pandas as pd
...: from pandas import Interval
...:
...: a=Interval(0,1,'both')
...: b=Interval(0,0,'right')
...: print(a.overlaps(b))
...: b=Interval(0,0,'left')
...: print(a.overlaps(b))
True
False Since the left side and right side of a point interval coincide, it cannot have one side open and side closed at the same time. This is a pathological corner case but programmatically you may end up creating such intervals, With the PR change pandas will complain loudly to warn the user, and force them to handle the case explicitly as either an empty interval |
I'm -1 on this, as it would be inconsistent with other interval implementations I've seen, which allow empty intervals. From what I can tell, other interval implementations tend to fall into two categories:
Disallowing Some examples of other interval implementations:
The above does raise a question as to whether or not we want to specifically add a custom Adding
Yes, IMO the downstream logic should handle empty intervals.
Not really, from a mathematical perspective these definitions are valid and equivalent to |
Thanks for the writeup @jschendel. FWIW (not much), my intutiton matches yours here. |
Also noting #26894 (review) here, which belongs as part of the discussion here. |
Interval
handling of left=right intervals
I also just noticed that for postgres
That makes me even more skeptical about the logic of having empty intervals included in an index, |
Yes, of course, but this is a bug in the In [1]: import pandas as pd; pd.__version__
Out[1]: '0.25.0.dev0+750.gbaa77c33f'
In [2]: a = pd.Interval(-1, 1, closed='both')
In [3]: b = pd.Interval(0, 0, closed='neither')
In [4]: a.overlaps(b)
Out[4]: True
I never said that our implementation was consistent with postgres, or any other implementation. Ideally we'd conform to the common conventions, but intervals are still relatively new in pandas - there are still many features that need to be added, and corner cases that need to be addressed (as you've discovered here). My point was disallowing To be clear, I'm NOT against changing the behavior of how pandas handles I don't think silently converting these to
Indexes in pandas generically support dupes, so this is something that will be accounted for regardless.
Likewise indexes in pandas generically support missing values, which similarly botch ordering, and will be accounted for regardless. I'm guessing that empty intervals can handled in a similar manner. I suppose this could be an argument for maintaining the existing behavior, as the current implementation does not botch ordering.
This can come about via the contraction of intervals, i.e. restricting intervals so that they lie within a given observation interval, and then using the contracted intervals to scale things where you'd want empty intervals to count as zero. Admittedly, the machinery to do this idiomatically does not exist in pandas yet, but the methods to do so are things that have been discussed. This doesn't necessitate using an |
Interval
handling of left=right intervals
The special case where
left=right
is not checked properly by the constructor, onlyclosed='both'
(a point) orclosed='neither'
(an empty interval) makes sense but '[0,0)' or(0,0]
are nonsensical. Theoverlaps
method returns inconsistent results when encountering such Intervals.Update:
Examples (from below):
The text was updated successfully, but these errors were encountered: