vectorized operations with pd.Series of pd.Interval data #25177
Comments
Yes; already noted in #16401. No explicit timeline on this feature, but PRs are always welcome!
Didn't mean to open a duplicate issue... thanks!
To expand on @mroeschke's comment a bit: a lot of this functionality was just added in 0.24.0 but is not documented well beyond the API reference, which should certainly be improved (xref #16400).
This can be done via pd.arrays.IntervalArray.from_arrays:
In [1]: import pandas as pd; pd.__version__
Out[1]: '0.24.0'
In [2]: df = pd.DataFrame({'start': [0, 1, 4], 'end': [2, 3, 8]}); df
Out[2]:
   start  end
0      0    2
1      1    3
2      4    8
In [3]: df['range'] = pd.arrays.IntervalArray.from_arrays(df['start'], df['end'])
In [4]: df
Out[4]:
   start  end   range
0      0    2  (0, 2]
1      1    3  (1, 3]
2      4    8  (4, 8]
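As a small variation on the session above, the sketch below passes the closed argument of from_arrays to control which endpoints are included. The column name range_closed is made up for illustration; the rest mirrors the example data.

```python
import pandas as pd

df = pd.DataFrame({"start": [0, 1, 4], "end": [2, 3, 8]})

# closed= accepts "right" (the default), "left", "both" or "neither".
# "range_closed" is a hypothetical column name used only for this sketch.
df["range_closed"] = pd.arrays.IntervalArray.from_arrays(
    df["start"], df["end"], closed="both"
)

first = df["range_closed"].iloc[0]   # the interval built from start=0, end=2
default_first = pd.Interval(0, 2)    # same endpoints, default closed="right"

print(first.closed)
print(0 in first, 0 in default_first)
```

With closed="both" the left endpoint belongs to the interval, whereas the default right-closed interval excludes it.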
There is an open issue to add this (xref #16401). This can be more or less done currently via the .array attribute of the Series:
In [5]: df['range'].array.length
Out[5]: Int64Index([2, 2, 4], dtype='int64', name='end')
In [6]: df['range'].array.mid
Out[6]: Float64Index([1.0, 2.0, 6.0], dtype='float64', name='start')
In [7]: df['range'].array.overlaps(pd.Interval(2.5, 5))
Out[7]: array([False,  True,  True])
The main difference between a full interval accessor and the [...]
There are still quite a few things that need work, and many open issues. For example, arithmetic operations don't work for IntervalArray but do work for IntervalIndex:
In [8]: df['range'] + 1
---------------------------------------------------------------------------
TypeError: unsupported operand type(s) for +: 'IntervalArray' and 'int'
In [9]: pd.Index(df['range']) + 1
Out[9]:
IntervalIndex([(1, 3], (2, 4], (5, 9]],
              closed='right',
              dtype='interval[int64]')
There are also some suggested features (xref #19480, #21998), along with new specs for indexing behavior with intervals (xref #16316). PRs are welcome to address any of the shortcomings mentioned here!
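For the arithmetic case specifically, one possible workaround that keeps the result as a Series is to shift the endpoints and rebuild the intervals. This is only a sketch: it relies on the left, right and closed attributes of IntervalArray, and the variable names are illustrative.

```python
import pandas as pd

ranges = pd.Series(pd.arrays.IntervalArray.from_arrays([0, 1, 4], [2, 3, 8]))

arr = ranges.array  # the underlying IntervalArray

# Shift both endpoints by 1 and rebuild, preserving the closed side
# and the index of the original Series.
shifted = pd.Series(
    pd.arrays.IntervalArray.from_arrays(
        arr.left + 1, arr.right + 1, closed=arr.closed
    ),
    index=ranges.index,
)

print(shifted)
```

This gives the same intervals as the pd.Index(...) + 1 call above, but as a Series aligned with the original index rather than an IntervalIndex.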
Is there a plan to allow vectorized operations on pd.Series of pd.Interval data in the future -- perhaps just for the syntactic sugar?
I could imagine an interface similar to pd.Series.str. For example, operations might look like:
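Until something like this exists in pandas, a .str-style interface can be approximated with a custom accessor. The sketch below uses pd.api.extensions.register_series_accessor; the accessor name iv and the class IntervalAccessor are made up for illustration and are not part of pandas.

```python
import pandas as pd


# "iv" is a hypothetical accessor name chosen for this sketch, not pandas API.
@pd.api.extensions.register_series_accessor("iv")
class IntervalAccessor:
    def __init__(self, series):
        if not isinstance(series.dtype, pd.IntervalDtype):
            raise AttributeError("Can only use .iv with interval data")
        self._series = series

    def _wrap(self, values):
        # Re-attach the original index so results align with the Series.
        return pd.Series(values, index=self._series.index, name=self._series.name)

    @property
    def length(self):
        return self._wrap(self._series.array.length)

    @property
    def mid(self):
        return self._wrap(self._series.array.mid)

    def overlaps(self, other):
        return self._wrap(self._series.array.overlaps(other))


s = pd.Series(
    pd.arrays.IntervalArray.from_arrays([0, 1, 4], [2, 3, 8]),
    index=["a", "b", "c"],
)

print(s.iv.length)
print(s.iv.mid)
print(s.iv.overlaps(pd.Interval(2.5, 5)))
```

Each call delegates to the underlying IntervalArray and wraps the result in a Series carrying the original index, which is what the bare .array calls do not do.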