ENH: Arithmetic operations on intervals #43629
Comments
@Hoeze I developed a package called staircase, which supports many manipulations of intervals and is designed to align closely with pandas. You may find it handy: www.staircase.dev |
Is there a standard or convention for arithmetic with interval types? For example, do Postgres range types operate similarly? |
@venaturum Thanks for your suggestion. @mroeschke The details of PostgreSQL's range operators can be found here:
Further operators include the various overlap and containment operators. I do not know whether PostgreSQL supports range aggregations, but I think that |
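For reference, PostgreSQL's range types define `+` as union, `*` as intersection and `-` as difference, and the union and difference operators raise an error when the result would not be a single contiguous range. Below is a minimal Python sketch of that convention, assuming half-open intervals represented as (left, right) tuples. The helper names are hypothetical and not part of pandas or PostgreSQL, and returning None for an empty result is a simplification of PostgreSQL's `empty` range.

```python
def union(a, b):
    """Union of two half-open intervals [left, right); fails if there is a gap."""
    if a[1] < b[0] or b[1] < a[0]:
        raise ValueError("result would not be a single contiguous interval")
    return (min(a[0], b[0]), max(a[1], b[1]))


def intersection(a, b):
    """Intersection of two intervals, or None if they do not overlap."""
    left, right = max(a[0], b[0]), min(a[1], b[1])
    return (left, right) if left < right else None


def difference(a, b):
    """Part of a that is not in b; fails if b would split a into two pieces."""
    if b[0] > a[0] and b[1] < a[1]:
        raise ValueError("result would not be a single contiguous interval")
    if b[1] <= a[0] or b[0] >= a[1]:
        return a  # no overlap, nothing is removed
    if b[0] <= a[0] and b[1] >= a[1]:
        return None  # a is completely covered by b
    if b[0] <= a[0]:
        return (b[1], a[1])  # b covers the left part of a
    return (a[0], b[0])  # b covers the right part of a


print(union((3, 6), (5, 7)))       # (3, 7)
print(difference((3, 6), (5, 7)))  # (3, 5)
```

With the values from the feature request, this gives the same results as the proposed `a + b` and `a - b`.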
@Hoeze no problem. In the code snippets below I am assuming a notebook format where the last variable is printed to console. A Jupyter notebook with these code examples is downloadable here: IntervalsWithStaircase.zip
The solution involves thinking about the problem in terms of step functions. We'll use a one-to-one mapping between sets of disjoint intervals and step functions, where for a set of intervals, the corresponding step function is
The package represents step functions via a class called Stairs. This class has almost every binary operator you can think of defined, all results of which are Stairs objects themselves.
import pandas as pd
import staircase as sc
a = sc.Stairs(start=3, end=6)
b = sc.Stairs(start=5, end=7)
There are more parameters to the Stairs constructor which dictate "step-size" (default 1) and baseline-value (default 0) but
You can use Stairs.plot, or Stairs.to_frame to understand the step function values
b.to_frame()
The union of intervals (a+b in your example) can be done by using a "logical or" operation:
result = a | b  # result will be a Stairs instance
result.to_frame()
For set difference (a-b in your example) the phrase "in a, and not in b" can be literally translated with the step functions:
result = a & ~b
result.to_frame()
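The logic behind these two operations can be checked without staircase. Here is a minimal plain-Python sketch, assuming each interval maps to a step function that is 1 inside the interval and 0 outside (left-closed, right-open here). `indicator` and `to_intervals` are hypothetical helpers for illustration, not the staircase API.

```python
def indicator(left, right):
    """Step function that is 1 on [left, right) and 0 elsewhere."""
    return lambda x: 1 if left <= x < right else 0


def to_intervals(f, breakpoints):
    """Recover the intervals where f is non-zero, given all possible step points."""
    points = sorted(set(breakpoints))
    result = []
    for left, right in zip(points, points[1:]):
        if f(left):  # f is constant on [left, right), so sampling at left is enough
            if result and result[-1][1] == left:
                result[-1] = (result[-1][0], right)  # extend the previous piece
            else:
                result.append((left, right))
    return result


a = indicator(3, 6)
b = indicator(5, 7)
points = [3, 5, 6, 7]  # every endpoint of a and b


def union(x):
    return a(x) or b(x)  # "in a, or in b"


def difference(x):
    return a(x) and not b(x)  # "in a, and not in b"


print(to_intervals(union, points))       # [(3, 7)]
print(to_intervals(difference, points))  # [(3, 5)]
```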
The same set difference can also be obtained with result = a.mask(b).fillna(0) or result = a * ~b
For the union of many intervals, there are two approaches. The first is to create a step function for every interval and repeatedly apply the logical_or function.
from functools import reduce
intervals = pd.arrays.IntervalArray.from_tuples([(0, 1), (1, 3), (2, 4), (5, 7)])
result = reduce(sc.Stairs.logical_or, [sc.Stairs(start=i.left, end=i.right) for i in intervals])
result.to_frame()
The second approach is to add the step functions together (the result will have values of 2 or more where the intervals overlap), and then set non-zero values of the step function to 1. This approach may be favourable as it uses the ability to pass vectors of start and end values for the intervals into the constructor of a single Stairs object.
result1 = sc.Stairs(start=intervals.left, end=intervals.right)
result1.to_frame()
This is where
From here we can just use a relational operator to get a boolean valued step function:
result2 = result1 > 0
result2.to_frame()
We could have also used
If you want to convert this step function back to an interval array then it can be done like this:
intervals2 = (
    result2.to_frame()
    .query('value == 1')
    .pipe(lambda df: pd.arrays.IntervalArray.from_arrays(df.start, df.end))
)
print(intervals2)
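The second approach (add the step functions, keep the non-zero part, convert back to intervals) can also be sketched in plain Python. This is a rough illustration, assuming intervals are (left, right) tuples and that touching intervals such as (0, 1) and (1, 3) should merge. `union_of_intervals` is a hypothetical helper, not part of pandas or staircase.

```python
def union_of_intervals(intervals):
    """Merge intervals by tracking how many of them cover each point."""
    # Each interval contributes +1 at its left end and -1 at its right end.
    events = []
    for left, right in intervals:
        events.append((left, 1))
        events.append((right, -1))
    # Sort by position; at equal positions handle starts (+1) before ends (-1)
    # so that touching intervals are merged.
    events.sort(key=lambda e: (e[0], -e[1]))

    merged = []
    coverage = 0  # number of intervals currently open
    start = None
    for position, change in events:
        if coverage == 0 and change == 1:
            start = position  # coverage rises above zero: a merged interval opens
        coverage += change
        if coverage == 0:
            merged.append((start, position))  # coverage drops back to zero
    return merged


print(union_of_intervals([(0, 1), (1, 3), (2, 4), (5, 7)]))  # [(0, 4), (5, 7)]
```

The running `coverage` count plays the role of the summed step function, and testing it against zero corresponds to `result1 > 0`.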
Is interval manipulation the primary use case for staircase? Probably not. Given vectors of "start" and "stop" times it makes quick work of things like queues, and asset utilisations. But maybe it's useful here depending on the context of your problem. |
@Hoeze I've taken the above concepts with staircase and created a new package "piso" for set operations over It's currently available through PyPI. I will continue to expand the functionality over the coming weeks. |
I made a commit that fixes this issue: https://github.com/clarkwiththew/pandas/commit/6471a095bb77a0a558a6bf7dcd80f7948354f9ab |
Is your feature request related to a problem?
I would like to be able to do arithmetic operations on intervals:
a + b: pd.Interval(3, 6) + pd.Interval(5, 7) == pd.Interval(3, 7)
a - b: pd.Interval(3, 6) - pd.Interval(5, 7) == pd.Interval(3, 5)
[a, b].union() / np.sum([a, b]):
API breaking implications
None AFAIK, since those methods are not implemented yet