BUG: merge_asof should be treated as a left join #34484

Merged
merged 16 commits into from
Apr 12, 2021

Conversation

phofl
Member

@phofl phofl commented May 30, 2020

The fix broke one test (test_merge_index_column_tz in test_merge_asof.py) which expected the right index instead of the left index.

The default index selection does not work in case of asof and left_index=True. I had to catch this case here.
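A minimal sketch of the behavior this PR targets, assuming pandas 1.3 or later (the milestone this PR was merged for). The frames and column names below are made up for illustration and are not taken from the test suite:

```python
import pandas as pd

# Illustrative data: the left frame is keyed by its index,
# the right frame by a regular column.
left = pd.DataFrame(
    {"left_val": [10, 20, 30]},
    index=pd.Index([1, 5, 10], name="key"),
)
right = pd.DataFrame({"right_time": [1, 4, 9], "right_val": ["a", "b", "c"]})

# merge_asof matches each left key to the last right key that is <= it.
result = pd.merge_asof(left, right, left_index=True, right_on="right_time")

# With the fix, the result keeps the left index rather than
# positions taken from the right frame's index.
print(result.index.tolist())
```

Before the fix, the same call returned the right frame's index, which is the behavior reported in the linked issue.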

@phofl phofl changed the title from "33463 left index" to "BUG: merge_asof should be treated as a left join" May 31, 2020
@WillAyd
Member

WillAyd commented Sep 3, 2020

@phofl is this still active? If so, can you address @mroeschke's comments / questions?

Conflicts:
	doc/source/whatsnew/v1.1.0.rst
	pandas/tests/reshape/merge/test_merge_asof.py
@pep8speaks

pep8speaks commented Sep 4, 2020

Hello @phofl! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-04-11 11:47:41 UTC

@phofl
Member Author

phofl commented Sep 4, 2020

@WillAyd Yeah, for sure. Missed that completely, sorry.

@jreback jreback added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Sep 5, 2020
@phofl
Member Author

phofl commented Sep 5, 2020

@jreback
Did you have something like this in mind?

I have a more general question about the _get_join_info part. The else branch starting at line 884 looks a bit odd to me. The condition if self.right_index is only reached for inner, right and outer joins. For a right join, we deliberately use the left index here. So the following code

import pandas as pd

index = pd.Index([1, 5, 10], name="test")
left = pd.DataFrame({"left": [1, 2, 3, 6, 7]}, index=[1, 2, 3, 6, 7])
right = pd.DataFrame({"right": ["a", "b", "c"], "right_time": [1, 4, 10]}, index=index)
result = pd.merge(left, right, right_index=True, left_on='left', how="right")

produces the left index.

     left right  right_time
1.0     1     a           1
NaN     5     b           4
NaN    10     c          10

although I would expect the right index here. Is this assumption correct? If so, I would modify the calls to _create_join_index to produce the desired result while keeping the current behavior for outer and inner joins. There are a few open issues about this topic.
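For comparison, here is a sketch of one way to end up with the right index after a right join without relying on the default index selection. This is only a workaround built from public pandas calls, not what the PR changes: the right index is moved into a column before merging and restored afterwards.

```python
import pandas as pd

index = pd.Index([1, 5, 10], name="test")
left = pd.DataFrame({"left": [1, 2, 3, 6, 7]}, index=[1, 2, 3, 6, 7])
right = pd.DataFrame(
    {"right": ["a", "b", "c"], "right_time": [1, 4, 10]}, index=index
)

# Turn the right index into an ordinary column so it survives the merge,
# then set it as the index of the result again.
merged = pd.merge(
    left, right.reset_index(), left_on="left", right_on="test", how="right"
)
result = merged.set_index("test")

print(result.index.tolist())
```

Note that in this variant the left column holds NaN for right rows without a match, whereas the original call fills that column with the right index values.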

@phofl
Member Author

phofl commented Sep 16, 2020

@jreback Could you maybe have a look?

@github-actions
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Oct 17, 2020
@itamarst

@jreback seems like all comments were addressed, could you take a look? Thanks!

@jreback
Contributor

jreback commented Dec 29, 2020

can you merge master and move the note to 1.3 and will look.

@phofl
Member Author

phofl commented Dec 29, 2020

Done

@jreback jreback removed the Stale label Dec 29, 2020
@jreback
Contributor

jreback commented Feb 11, 2021

@phofl oldie, but if you can merge will take a look

Conflicts:
	doc/source/whatsnew/v1.3.0.rst
	pandas/tests/reshape/merge/test_merge_asof.py
@phofl
Member Author

phofl commented Feb 13, 2021

@jreback merged, failure unrelated

@jreback jreback added this to the 1.3 milestone Feb 15, 2021
@jreback
Contributor

jreback commented Feb 15, 2021

ok let me look closer at this.

phofl added 2 commits April 11, 2021 03:08
Conflicts:
	pandas/core/reshape/merge.py
	pandas/tests/reshape/merge/test_merge_asof.py
            how="right",
        )
    else:
        join_index = self.right.index.take(right_indexer)
        left_indexer = np.array([-1] * len(join_index))
elif self.left_index:
    if len(self.right) > 0:
        if self.how == "asof":
Member


Could you write a comment here explaining why we're doing how="left" here?

Member Author


Done

@mroeschke
Member

Generally this looks okay to me. Mind merging in master?

@phofl
Member Author

phofl commented Apr 11, 2021

merged master

@jreback jreback merged commit 5b48936 into pandas-dev:master Apr 12, 2021
@jreback
Contributor

jreback commented Apr 12, 2021

thanks @phofl very nice!

@phofl phofl deleted the 33463_left_index branch April 12, 2021 17:33
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Development

Successfully merging this pull request may close these issues.

merge_asof(left_index=True, right_on=...) overwrites left index with right index
6 participants