For clarification, by "interval-point" joins I mean joining an IntervalIndex/IntervalArray against the point values contained in the intervals, e.g. joining a numeric IntervalIndex against a Float64Index. I want to keep this discussion separate from interval-interval merges for the time being.
For example, the following join does not currently work (and likewise merge with column data):
In [2]: df1 = pd.DataFrame({'A': [10, 20, 30]}, index=pd.interval_range(0, 3))

In [3]: df2 = pd.DataFrame({'B': ['foo', 'bar', 'baz', 'qux']},
   ...:                    index=[0.5, 1, 2.71828, 3.14159])

In [4]: df1
Out[4]:
         A
(0, 1]  10
(1, 2]  20
(2, 3]  30

In [5]: df2
Out[5]:
           B
0.50000  foo
1.00000  bar
2.71828  baz
3.14159  qux

In [6]: df1.join(df2)
Out[6]:
         A    B
(0, 1]  10  NaN
(1, 2]  20  NaN
(2, 3]  30  NaN
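For reference, the point-to-interval lookup that such a join would need can already be done by hand. The following is only a sketch of a possible workaround, not a proposed API: it assumes the intervals do not overlap (so that `IntervalIndex.get_indexer` can be used) and attaches the matching `A` value to each point in `df2`.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [10, 20, 30]}, index=pd.interval_range(0, 3))
df2 = pd.DataFrame({'B': ['foo', 'bar', 'baz', 'qux']},
                   index=[0.5, 1, 2.71828, 3.14159])

# Position of the interval containing each point; -1 where no interval matches.
# Assumption: the intervals in df1.index do not overlap.
pos = df1.index.get_indexer(df2.index)

# Look up A by position, masking the points that fall outside every interval.
a_values = df1['A'].to_numpy()
points_with_a = df2.assign(A=np.where(pos >= 0, a_values[pos], np.nan))
```

Here `0.5` and `1` both land in `(0, 1]`, `2.71828` lands in `(2, 3]`, and `3.14159` has no match, so its `A` is NaN.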
I think the behavior of such a join/merge is straightforward for left/right joins, but is less clear for inner/outer joins. For inner (outer) joins one takes the intersection (union) of both indexes as the resulting index values. This makes sense when both indexes contain the same type of objects, but that is not the case for interval-point joins. I can't think of a consistent way to handle inner/outer joins, and I'm not entirely sure they even make sense. A few options:
- Do not support inner/outer interval-point joins
- For inner joins:
  - always keep the intervals and filter any non-matches?
  - default to the left index and filter any non-matches?
- For outer joins:
  - union any non-matching point values for an object dtype?
    - obviously non-performant and a bit weird
  - coerce non-matching points to degenerate intervals (left == right) and union for an interval dtype?
- Use a new API for non-exact interval joins?
I'm leaning towards using the existing API and not supporting inner/outer joins for the time being, but would appreciate any thoughts.
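To make the left-join behavior and the "always keep the intervals and filter any non-matches" inner-join option concrete, here is a rough sketch. The helper `interval_point_join` and the temporary `_pos` column are hypothetical names used only for illustration; nothing like this exists in pandas. It assumes non-overlapping intervals and deliberately leaves outer joins out, since their result index is the open question above.

```python
import numpy as np
import pandas as pd

def interval_point_join(intervals_df, points_df, how="left"):
    """Hypothetical helper: join point-indexed rows onto interval-indexed rows.

    The result always keeps the intervals as its index.
    """
    if how not in ("left", "inner"):
        raise ValueError("this sketch only covers how='left' and how='inner'")
    # Position of the containing interval for each point (-1 if none).
    pos = intervals_df.index.get_indexer(points_df.index)
    # Merge on the integer position instead of on the mixed index types.
    left = intervals_df.assign(_pos=np.arange(len(intervals_df)))
    right = points_df.assign(_pos=pos)
    merged = left.merge(right, on="_pos", how=how)
    # Restore the interval labels; points with _pos == -1 never match a left row.
    merged.index = intervals_df.index.take(merged["_pos"].to_numpy())
    return merged.drop(columns="_pos")

df1 = pd.DataFrame({'A': [10, 20, 30]}, index=pd.interval_range(0, 3))
df2 = pd.DataFrame({'B': ['foo', 'bar', 'baz', 'qux']},
                   index=[0.5, 1, 2.71828, 3.14159])

left_result = interval_point_join(df1, df2, how="left")
inner_result = interval_point_join(df1, df2, how="inner")
```

With the example frames, the left join repeats `(0, 1]` for both `0.5` and `1` and keeps `(1, 2]` with a missing `B`, while the inner join drops `(1, 2]`. The unmatched point `3.14159` disappears in both cases, which is exactly the information an outer join would have to represent somehow.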
Is this going to be included any time soon?
I think it is quite useful to be able to merge two DataFrames using this concept of "intersection", i.e., if a point in one DataFrame falls within an interval of the other, it is considered a match. This is quite similar to the concept of spatial joins in GeoPandas. Looking at it this way, I do not see why inner/outer joins would be messy.