ENH: Create Better IntervalDtype using PyArrow structs. #53033
Comments
@jorisvandenbossche could the existing ArrowIntervalDtype be the solution here?
@jbrockmendel is this something pandas wants to pursue?
@jbrockmendel This issue is the only match when googling "ArrowIntervalDtype".
@randolf-scholz I was having trouble finding that as well. Arrow does have an interval type, but only for time-related data, which is not what this issue is about.
I was referring to pandas.core.arrays.arrow.ArrowIntervalType
@jbrockmendel I took a look at the code, and one immediate limitation seems to be that this again restricts all intervals in the array to be of equal closedness, while the proposal here would allow storing intervals of different closedness in the same array.
Feature Type
Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas
Problem Description
Currently, pandas.IntervalArray suffers from 3 major limitations:
1. They are limited to data with the same closedness on both sides.
2. ... int32 (no longer the case apparently)
3. ... string.
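To make limitation (1) concrete, here is a minimal sketch using the existing pandas API. It only shows that closedness is stored once for the whole array rather than per interval; the example values are made up for illustration.

```python
import pandas as pd

# Build an interval array; `closed` is a single setting shared by every element.
arr = pd.arrays.IntervalArray.from_tuples([(0, 1), (1, 2)], closed="left")

# Closedness lives on the array (and its dtype), not on individual rows,
# so [0, 1) and (1, 2] cannot be stored side by side in one IntervalArray.
print(arr.closed)
print(arr.dtype.closed)
```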
A practical application for (1) that I am very interested in is storing information about the range of valid values for the columns of another DataFrame.
Feature Description
Given the better integration with pyarrow since 2.0, we can recreate IntervalDtype using pyarrow.struct.
Contrary to the current IntervalDtype, this would solve all 3 major problems at once: StructArray can have separate closedness.
Alternative Solutions
None.
Additional Context
Additionally, a common request is adding extra operations for interval dtypes.
Additionally, one could imagine having an IntervalUnion type that can represent finite unions of intervals, combining the interval type discussed here with the pyarrow list type. This type would naturally arise when performing unions of intervals, such as [0, 2]∪[3, 5]. The nice thing here is that the resulting space is mathematically closed under the standard set operations (union, intersection, complement, difference).