-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Added 'direction' parameter to merge_asof() (#14887) #15129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -33,13 +33,15 @@ def asof_join_{{on_dtype}}_by_{{by_dtype}}(ndarray[{{on_dtype}}] left_values, | |||
ndarray[{{by_dtype}}] left_by_values, | |||
ndarray[{{by_dtype}}] right_by_values, | |||
bint allow_exact_matches=1, | |||
tolerance=None): | |||
tolerance=None, | |||
int64_t direction_enum=0): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason not to pass a string here instead?
diff = left_values[left_pos] - right_values[found_right_pos] | ||
if diff > tolerance_: | ||
right_indexer[left_pos] = -1 | ||
if direction_enum == 0: #backward |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw I think all of this can be nogil
right_indexer[left_pos] = -1 | ||
if direction_enum == 0: #backward | ||
right_pos = 0 | ||
for left_pos in range(left_size): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we pull each of these direction out as separate functions? (to make this top level a bit more clear)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I'll take a shot at that.
diff = left_values[left_pos] - right_values[right_pos] | ||
if diff > tolerance_: | ||
right_indexer[left_pos] = -1 | ||
if direction_enum == 0: # backward |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this repeated here (the actual searching part)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This search is in the non-by logic. The search above tracks the last-seen position for each element in the by
column. (I assume you are asking why there are two asof_join_*
functions.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh right, now I remember, you directly did the groupby one and the non-groupby.
ok sure. that's more of a reason then to have some helpers to avoid repeating things (which in an of itself is not a perf issue), but a readability one.
|
||
Optionally match on equivalent keys with 'by' before searching for nearest | ||
match with 'on'. | ||
For each row in the left DataFrame, a "backward" search selects the last |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is changed in 0.20.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I placed a "versionadded" string below the paragraph. Is that the right way to indicate a change in the middle of the prose?
@@ -1186,6 +1203,11 @@ def flip(xs): | |||
if tolerance is not None: | |||
tolerance = tolerance.value | |||
|
|||
# enumeration for direction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so remove this and just pass directly to the cython function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try your suggestion above of having three different functions for the search instead of one function with giant conditional branches. That means I can just use _asof_function()
to get the attribute with the direction string.
@@ -280,6 +280,8 @@ def merge_asof(left, right, on=None, | |||
search selects the row in the right DataFrame whose 'on' key is closest | |||
in absolute distance to the left's key. | |||
|
|||
.. versionadded:: "forward" and "nearest" added in 0.20.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would probably write it in the text that default 'direction' for < 0.20.0 is 'backward', while new options now exist in 0.20.0
match with 'on'. | ||
For each row in the left DataFrame, a "backward" search selects the last | ||
row in the right DataFrame whose 'on' key is less than or equal to the | ||
left's key. A "forward" search selects the first row in the right DataFrame |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make this a bulleted list (for forward, backward, nearest), easier to read
lgtm (minor doc-comment). already green (appeveyor seems stuck, working on that indepdendtly). so ping when pushed. also just verify that perf is the same. |
Current coverage is 86.26% (diff: 88.88%)@@ master #15129 diff @@
==========================================
Files 145 145
Lines 51274 54303 +3029
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
+ Hits 43857 46844 +2987
- Misses 7417 7459 +42
Partials 0 0
|
@jreback I've pushed the bullet-pointed description. ASV came-out the same, which I would expect since the "backward" direction is the same code. |
thanks! |
closes pandas-dev#14887 Author: Christopher C. Aycock <[email protected]> Closes pandas-dev#15129 from chrisaycock/GH14887 and squashes the following commits: da38483 [Christopher C. Aycock] Description of direction parameters is now a bullet-point list 879c9f0 [Christopher C. Aycock] Tweak prose to include statement on version and parameters ce5caaa [Christopher C. Aycock] Split Cython functions according to direction 50431ad [Christopher C. Aycock] ENH: Added 'direction' parameter to merge_asof() (pandas-dev#14887)
git diff upstream/master | flake8 --diff