Skip to content

Add "forward" and "nearest" direction to merge_asof() #14887

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
chrisaycock opened this issue Dec 15, 2016 · 4 comments
Closed

Add "forward" and "nearest" direction to merge_asof() #14887

chrisaycock opened this issue Dec 15, 2016 · 4 comments
Labels
Datetime Datetime data dtype Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Milestone

Comments

@chrisaycock
Copy link
Contributor

chrisaycock commented Dec 15, 2016

Currently pd.merge_asof() looks backwards, meaning that it gets the last row from the right table whose timestamp is less the left table's timestamp. It might be nice to look forwards, meaning get the first row in the right table whose timestamps is greater than the left table's timestamp. And we could have a nearest, meaning get the row from the right table whose timestamp is closest to the left table's timestamp regardless of direction.

I propose a new direction parameter whose default value is "backward", which looks for prior timestamps as pd.merge_asof() currently does. A value of "forward" will cause cause the function to look for subsequent timestamps. A value of "nearest" will look on both directions.

Here's a modified example from the docstring:

In [16]: left
Out[16]:
    a left_val
0   1        a
1   5        b
2  10        c

In [17]: right
Out[17]:
   a  right_val
0  1          1
1  2          2
2  3          3
3  6          6
4  7          7

In [18]: pd.merge_asof(left, right, on='a')
Out[18]:
    a left_val  right_val
0   1        a          1
1   5        b          3
2  10        c          7

In [19]: pd.merge_asof(left, right, on='a', direction='forward')
Out[19]:
    a left_val  right_val
0   1        a        1.0
1   5        b        6.0
2  10        c        NaN

In [20]: pd.merge_asof(left, right, on='a', direction='nearest')
Out[20]:
    a left_val  right_val
0   1        a          1
1   5        b          6
2  10        c          7
@jreback
Copy link
Contributor

jreback commented Dec 15, 2016

this is fine.

also prob should add an example / expl in docs about how .merge_asof differs from .reindex(...., method='nearest', tolerance=...), which is effectively asof reindexing in both directions (and min of those).

so should have:

direction='backward'|'forward'|'both' (this is how we spell it for example on Series.interpolate for limit_direction), or maybe both -> nearest

@jreback jreback added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Datetime Datetime data dtype labels Dec 15, 2016
@jreback jreback added this to the Next Major Release milestone Dec 15, 2016
@chrisaycock
Copy link
Contributor Author

I like the idea of a nearest direction that minimizes the absolute distance between left and right timestamps. Might need a tie-break parameter for first or last entry though, like the side parameter in np.searchsorted().

As for the difference between pd.merge_asof() and .reindex(), the former is a join between two DataFrames. I suppose someone could reindex the right DataFrame using the left's index, and then concatenate the resulting table. But that's a lot of work, doesn't handle the by parameter for exact matches (pd.merge_asof() is a single scan), and requires the timestamp values to be indexes rather than ordinary columns.

@chrisaycock chrisaycock changed the title Add a "forward" direction to merge_asof() Add "forward" and "nearest" direction to merge_asof() Jan 12, 2017
@jreback jreback modified the milestones: 0.20.0, Next Major Release Jan 13, 2017
AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this issue Mar 21, 2017
closes pandas-dev#14887

Author: Christopher C. Aycock <[email protected]>

Closes pandas-dev#15129 from chrisaycock/GH14887 and squashes the following commits:

da38483 [Christopher C. Aycock] Description of direction parameters is now a bullet-point list
879c9f0 [Christopher C. Aycock] Tweak prose to include statement on version and parameters
ce5caaa [Christopher C. Aycock] Split Cython functions according to direction
50431ad [Christopher C. Aycock] ENH: Added 'direction' parameter to merge_asof() (pandas-dev#14887)
@lauhayden
Copy link

May I ask what happened to this PR? How come it was never merged, even though it was added to 0.20.0 milestone at one point?

@jreback
Copy link
Contributor

jreback commented Oct 6, 2017

this was merged quite a while ago
it’s in 0.20.1 IIIRC

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants