ENH: Interval type should support intersection, union & overlaps & difference #21998
Comments
xref #19480

This is reasonable, though some care is needed if these operations are to work between intervals with mixed `closed`. For example, the following seems reasonable:

In [2]: i0 = pd.Interval(0, 2, closed='both')
In [3]: i1 = pd.Interval(1, 3, closed='neither')
In [4]: i0.intersection(i1)
Out[4]: Interval(1, 2, closed='right')

The only thing that immediately comes to mind as problematic is `union`. Overlapping intervals give a single interval:

In [5]: i0.union(i1)
Out[5]: Interval(0, 3, closed='left')

Non-overlapping intervals with the same `closed` could return an `IntervalArray`:

In [6]: i2 = pd.Interval(8, 10, closed='both')
In [7]: i0.union(i2)
Out[7]:
IntervalArray([[0, 2], [8, 10]],
              closed='both',
              dtype='interval[int64]')

The problematic case for `union` is non-overlapping intervals with mixed `closed`:

In [8]: i3 = pd.Interval(8, 10, closed='neither')
In [9]: i0.union(i3)
Out[9]: array([Interval(0, 2, closed='both'), Interval(8, 10, closed='neither')], dtype=object)

I'm not sure an object-dtype array actually provides any utility here, and it seems a bit unnatural, so I'd lean towards raising. I could be convinced otherwise if anyone has a practical use for it, though.
Actually, I think `difference` has a similar issue, even when both intervals have the same `closed`:

In [2]: i0 = pd.Interval(0, 3, closed='both')
In [3]: i1 = pd.Interval(1, 2, closed='both')
In [4]: i0.difference(i1)
Out[4]: array([Interval(0, 1, closed='left'), Interval(2, 3, closed='right')], dtype=object)

But with mixed `closed`:

In [5]: i2 = pd.Interval(1, 2, closed='neither')
In [6]: i0.difference(i2)
Out[6]:
IntervalArray([[0, 1], [2, 3]],
              closed='both',
              dtype='interval[int64]')

So I'm not really sure if
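Both `difference` outputs can be reproduced by intersecting `a` with the part of the number line before `b` and the part after it; each boundary point survives exactly when `b` excludes it, which is why same-`closed` inputs yield mixed-`closed` pieces and vice versa. A minimal sketch under the same assumptions (hypothetical `Iv` stand-in, numeric endpoints only because it uses `float("inf")`), returning a plain list of pieces so the choice between an `IntervalArray`, an object array, or raising is left to the caller:

```python
from dataclasses import dataclass

INF = float("inf")

# (left endpoint closed?, right endpoint closed?) -> pandas-style `closed` string.
_CLOSED = {
    (True, True): "both",
    (True, False): "left",
    (False, True): "right",
    (False, False): "neither",
}


@dataclass(frozen=True)
class Iv:
    """Hypothetical stand-in for pandas.Interval (numeric endpoints)."""

    left: float
    right: float
    closed: str = "right"

    @property
    def closed_left(self) -> bool:
        return self.closed in ("left", "both")

    @property
    def closed_right(self) -> bool:
        return self.closed in ("right", "both")


def intersection(a, b):
    """Largest interval contained in both inputs; None if they share no point."""
    if a.left == b.left:
        left, lc = a.left, a.closed_left and b.closed_left
    else:
        left, lc = max((a.left, a.closed_left), (b.left, b.closed_left))
    if a.right == b.right:
        right, rc = a.right, a.closed_right and b.closed_right
    else:
        right, rc = min((a.right, a.closed_right), (b.right, b.closed_right))
    if left > right or (left == right and not (lc and rc)):
        return None
    return Iv(left, right, _CLOSED[(lc, rc)])


def difference(a, b):
    """Points of `a` not in `b`, as a list of 0, 1 or 2 disjoint pieces."""
    # Everything before `b` starts: b.left is included only when `b` excludes it.
    before = Iv(-INF, b.left, _CLOSED[(False, not b.closed_left)])
    # Everything after `b` ends, with the mirror-image rule for b.right.
    after = Iv(b.right, INF, _CLOSED[(not b.closed_right, False)])
    pieces = [intersection(a, before), intersection(a, after)]
    return [p for p in pieces if p is not None]
```

Here `difference(Iv(0, 3, "both"), Iv(1, 2, "both"))` gives `[Iv(0, 1, "left"), Iv(2, 3, "right")]`, while subtracting `Iv(1, 2, "neither")` gives two `"both"` pieces, matching `Out[4]` and `Out[6]` above.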
@jschendel you are correct in your observations above. The complexity in my proposal derives from trying to return multiple intervals in either

I looked into the behaviour of PostgreSQL's range types, and perhaps we can model these operations similarly. PostgreSQL avoids the mixed-boundary problem by returning only a single value (a boolean or a single contiguous range) as the result. I think the following PostgreSQL range operators, which take two ranges as arguments, would be of interest to pandas users:
and this function:
Examples of how PostgreSQL handles non-trivial range operations:
intersection:
difference:
union:
I think we could implement behaviour similar to casting functions, where the
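The PostgreSQL-style semantics described in this comment can be sketched the same way: every operation returns exactly one interval (or `None`, standing in for PostgreSQL's empty range), and union and difference raise when the result would consist of two disjoint pieces, as PostgreSQL's `+` and `-` range operators do. This is a hypothetical pure-Python sketch using an `Iv` stand-in for `pandas.Interval`, not a pandas or PostgreSQL API; unlike PostgreSQL it does not canonicalize discrete (e.g. integer) ranges.

```python
from dataclasses import dataclass

INF = float("inf")

# (left endpoint closed?, right endpoint closed?) -> pandas-style `closed` string.
_CLOSED = {
    (True, True): "both",
    (True, False): "left",
    (False, True): "right",
    (False, False): "neither",
}


@dataclass(frozen=True)
class Iv:
    """Hypothetical stand-in for pandas.Interval (numeric endpoints)."""

    left: float
    right: float
    closed: str = "right"

    @property
    def closed_left(self) -> bool:
        return self.closed in ("left", "both")

    @property
    def closed_right(self) -> bool:
        return self.closed in ("right", "both")


def intersection(a, b):
    """Like PostgreSQL's `*`: one Iv, or None standing in for an empty range."""
    if a.left == b.left:
        left, lc = a.left, a.closed_left and b.closed_left
    else:
        left, lc = max((a.left, a.closed_left), (b.left, b.closed_left))
    if a.right == b.right:
        right, rc = a.right, a.closed_right and b.closed_right
    else:
        right, rc = min((a.right, a.closed_right), (b.right, b.closed_right))
    if left > right or (left == right and not (lc and rc)):
        return None
    return Iv(left, right, _CLOSED[(lc, rc)])


def union(a, b):
    """Like PostgreSQL's `+`: one Iv, raising if the result would have a gap."""
    # Sort so `a` starts first; a closed start counts as earlier than an open one.
    if (b.left, not b.closed_left) < (a.left, not a.closed_left):
        a, b = b, a
    if not (b.left < a.right or (b.left == a.right and (a.closed_right or b.closed_left))):
        raise ValueError("result of union would not be contiguous")
    if a.right == b.right:
        right, rc = a.right, a.closed_right or b.closed_right
    else:
        right, rc = max((a.right, a.closed_right), (b.right, b.closed_right))
    return Iv(a.left, right, _CLOSED[(a.closed_left, rc)])


def difference(a, b):
    """Like PostgreSQL's `-`: one Iv (None if empty), raising if `b` splits `a`."""
    # The parts of the line before and after `b`; a boundary point is kept
    # exactly when `b` excludes it.
    before = Iv(-INF, b.left, _CLOSED[(False, not b.closed_left)])
    after = Iv(b.right, INF, _CLOSED[(not b.closed_right, False)])
    pieces = [p for p in (intersection(a, before), intersection(a, after)) if p is not None]
    if len(pieces) == 2:
        raise ValueError("result of difference would not be contiguous")
    return pieces[0] if pieces else None
```

With `closed="left"` (the `[)` bounds that PostgreSQL's range constructors use by default), `[5, 15)` and `[10, 20)` give `Iv(10, 15, "left")`, `Iv(5, 20, "left")` and `Iv(5, 10, "left")` for intersection, union and difference, while `union(Iv(1, 2, "left"), Iv(3, 4, "left"))` raises. The design choice is the trade-off discussed above: a single, uniformly typed result at the cost of raising in cases that the multi-interval proposal could represent.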
Why do
…ndas-dev#21998) Co-authored-by: Pedro Frigolet
Problem description
We have the Interval type in pandas, which is extremely useful; however, the standard interval operations (intersection, union, overlaps and difference) are missing from the pandas implementation. I would be happy to work on this enhancement.
One should be able to do the following with `pandas.Interval`:
The example uses numeric intervals, but the same operations are also valid for time series intervals.
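Because these operations only compare endpoints, the same logic carries over unchanged to time-series intervals. As an illustration, here is a minimal sketch of the overlap test using `datetime` endpoints and a hypothetical `Iv` stand-in for `pandas.Interval`. It assumes the rule that two intervals overlap if they share at least one point, so intervals that meet only at an open endpoint do not overlap; as far as I know this matches how `pandas.Interval.overlaps` behaves in recent pandas versions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any


@dataclass(frozen=True)
class Iv:
    """Hypothetical stand-in for pandas.Interval; endpoints only need ordering."""

    left: Any
    right: Any
    closed: str = "right"

    @property
    def closed_left(self) -> bool:
        return self.closed in ("left", "both")

    @property
    def closed_right(self) -> bool:
        return self.closed in ("right", "both")


def overlaps(a: Iv, b: Iv) -> bool:
    """True if `a` and `b` share at least one point (assumes left < right)."""
    # Order the pair so that `a` starts no later than `b`.
    if b.left < a.left:
        a, b = b, a
    # `b` starts before `a` ends, so both contain the points just after b.left.
    if b.left < a.right:
        return True
    # Otherwise they can only meet at one point, which both must include.
    return b.left == a.right and a.closed_right and b.closed_left


morning = Iv(datetime(2018, 7, 20, 9), datetime(2018, 7, 20, 12), "left")
afternoon = Iv(datetime(2018, 7, 20, 12), datetime(2018, 7, 20, 17), "left")
lunch = Iv(datetime(2018, 7, 20, 11, 30), datetime(2018, 7, 20, 13), "both")

print(overlaps(morning, afternoon))  # False: they meet only at noon, which `morning` excludes
print(overlaps(morning, lunch))      # True
```

The dates are arbitrary illustration values; swapping in `pandas.Timestamp` endpoints should work the same way, since only `<` and `==` are used.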