Skip to content

API: Implement interval-point joins #21901

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
jschendel opened this issue Jul 13, 2018 · 2 comments
Open

API: Implement interval-point joins #21901

jschendel opened this issue Jul 13, 2018 · 2 comments
Labels
Enhancement Interval Interval data type Reshaping Concat, Merge/Join, Stack/Unstack, Explode

Comments

@jschendel
Copy link
Member

For clarification, by "interval-point" joins I mean joining an IntervalIndex/IntervalArray against the point values contained in the intervals, e.g. joining a numeric IntervalIndex against a Float64Index. I want to keep this discussion separate from interval-interval merges for the time being.

For example, the following join does not currently work (and likewise merge with column data):

In [2]: df1 = pd.DataFrame({'A': [10, 20, 30]}, index=pd.interval_range(0, 3))

In [3]: df2 = pd.DataFrame({'B': ['foo', 'bar', 'baz', 'qux']},
   ...:                    index=[0.5, 1, 2.71828, 3.14159])

In [4]: df1
Out[4]: 
         A
(0, 1]  10
(1, 2]  20
(2, 3]  30

In [5]: df2
Out[5]: 
           B
0.50000  foo
1.00000  bar
2.71828  baz
3.14159  qux

In [6]: df1.join(df2)
Out[6]: 
         A    B
(0, 1]  10  NaN
(1, 2]  20  NaN
(2, 3]  30  NaN

I think the behavior of such a join/merge is straight forward for left/right joins, but is a little bit less clear for inner/outer joins. For inner (outer) joins one takes the intersection (union) of both indexes as the resulting index values. This makes sense when both indexes contain the same type of objects, but this is not the case for interval-point joins. I can't think of a consistent way to handle inner/outer joins, and not entirely if they even make sense. A few options:

  • Do not support inner/outer interval-point joins
  • For inner joins:
    • always keep the intervals and filter any non-matches?
    • default to the left index and filter any non-matches?
  • For outer joins:
    • union any non-matching point values for an object dtype?
      • obviously non-performant and a bit weird
    • coerce non-matching points to degenerate intervals (left == right) and union for an interval dtype?
  • Use a new API for non-exact interval joins?

I'm leaning towards just using the existing API not supporting inner/outer for the time being, but would appreciate any thoughts.

@jschendel jschendel added Reshaping Concat, Merge/Join, Stack/Unstack, Explode API Design Interval Interval data type Needs Discussion Requires discussion from core team before further action labels Jul 13, 2018
@jschendel jschendel added this to the Contributions Welcome milestone Jul 13, 2018
@gpolov-personal
Copy link

Is this going to be included any time soon?
I think it is quite useful to be able to merge 2 dataframes using this concept of "intersection", i.e, if an point that belongs to one of the Dfs intersects with the interval of the other this is considered as a match. It is quite similar to the concept of sapatial joins that is present in GeoPandas. Looking at it this way I do not see why inner/outer joins can be messy.

@jreback
Copy link
Contributor

jreback commented Apr 13, 2021

@GPoloVera you are welcome to submit a PR for things

@mroeschke mroeschke removed API Design Needs Discussion Requires discussion from core team before further action labels Jun 20, 2021
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Interval Interval data type Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

No branches or pull requests

4 participants