Skip to content

ENH: Added 'direction' parameter to merge_asof() (#14887) #15129

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from
Closed

ENH: Added 'direction' parameter to merge_asof() (#14887) #15129

wants to merge 4 commits into from

Conversation

chrisaycock
Copy link
Contributor

@jreback jreback added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Datetime Datetime data dtype labels Jan 13, 2017
@@ -33,13 +33,15 @@ def asof_join_{{on_dtype}}_by_{{by_dtype}}(ndarray[{{on_dtype}}] left_values,
ndarray[{{by_dtype}}] left_by_values,
ndarray[{{by_dtype}}] right_by_values,
bint allow_exact_matches=1,
tolerance=None):
tolerance=None,
int64_t direction_enum=0):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason not to pass a string here instead?

diff = left_values[left_pos] - right_values[found_right_pos]
if diff > tolerance_:
right_indexer[left_pos] = -1
if direction_enum == 0: #backward
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw I think all of this can be nogil

right_indexer[left_pos] = -1
if direction_enum == 0: #backward
right_pos = 0
for left_pos in range(left_size):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we pull each of these direction out as separate functions? (to make this top level a bit more clear)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I'll take a shot at that.

diff = left_values[left_pos] - right_values[right_pos]
if diff > tolerance_:
right_indexer[left_pos] = -1
if direction_enum == 0: # backward
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this repeated here (the actual searching part)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This search is in the non-by logic. The search above tracks the last-seen position for each element in the by column. (I assume you are asking why there are two asof_join_* functions.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh right, now I remember, you directly did the groupby one and the non-groupby.

ok sure. that's more of a reason then to have some helpers to avoid repeating things (which in an of itself is not a perf issue), but a readability one.


Optionally match on equivalent keys with 'by' before searching for nearest
match with 'on'.
For each row in the left DataFrame, a "backward" search selects the last
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is changed in 0.20.0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I placed a "versionadded" string below the paragraph. Is that the right way to indicate a change in the middle of the prose?

@@ -1186,6 +1203,11 @@ def flip(xs):
if tolerance is not None:
tolerance = tolerance.value

# enumeration for direction
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so remove this and just pass directly to the cython function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try your suggestion above of having three different functions for the search instead of one function with giant conditional branches. That means I can just use _asof_function() to get the attribute with the direction string.

@@ -280,6 +280,8 @@ def merge_asof(left, right, on=None,
search selects the row in the right DataFrame whose 'on' key is closest
in absolute distance to the left's key.

.. versionadded:: "forward" and "nearest" added in 0.20.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably write it in the text that default 'direction' for < 0.20.0 is 'backward', while new options now exist in 0.20.0

@chrisaycock chrisaycock reopened this Jan 17, 2017
match with 'on'.
For each row in the left DataFrame, a "backward" search selects the last
row in the right DataFrame whose 'on' key is less than or equal to the
left's key. A "forward" search selects the first row in the right DataFrame
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make this a bulleted list (for forward, backward, nearest), easier to read

@jreback jreback added this to the 0.20.0 milestone Jan 17, 2017
@jreback
Copy link
Contributor

jreback commented Jan 17, 2017

lgtm (minor doc-comment). already green (appeveyor seems stuck, working on that indepdendtly). so ping when pushed.

also just verify that perf is the same.

@codecov-io
Copy link

codecov-io commented Jan 17, 2017

Current coverage is 86.26% (diff: 88.88%)

Merging #15129 into master will increase coverage by 0.72%

@@             master     #15129   diff @@
==========================================
  Files           145        145           
  Lines         51274      54303   +3029   
  Methods           0          0           
  Messages          0          0           
  Branches          0          0           
==========================================
+ Hits          43857      46844   +2987   
- Misses         7417       7459     +42   
  Partials          0          0           

Powered by Codecov. Last update 5a0eb5a...879c9f0

@chrisaycock
Copy link
Contributor Author

@jreback I've pushed the bullet-pointed description. ASV came-out the same, which I would expect since the "backward" direction is the same code.

@jreback
Copy link
Contributor

jreback commented Jan 18, 2017

thanks!

@jreback jreback closed this in cc36009 Jan 18, 2017
@chrisaycock chrisaycock deleted the GH14887 branch January 18, 2017 17:01
AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this pull request Mar 21, 2017
closes pandas-dev#14887

Author: Christopher C. Aycock <[email protected]>

Closes pandas-dev#15129 from chrisaycock/GH14887 and squashes the following commits:

da38483 [Christopher C. Aycock] Description of direction parameters is now a bullet-point list
879c9f0 [Christopher C. Aycock] Tweak prose to include statement on version and parameters
ce5caaa [Christopher C. Aycock] Split Cython functions according to direction
50431ad [Christopher C. Aycock] ENH: Added 'direction' parameter to merge_asof() (pandas-dev#14887)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add "forward" and "nearest" direction to merge_asof()
3 participants