Skip to content

QST:Different results of pd.merge if using on='key' vs left_on/right_on! #34273

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
sambafall opened this issue May 20, 2020 · 4 comments
Closed
Labels
Reshaping Concat, Merge/Join, Stack/Unstack, Explode Usage Question

Comments

@sambafall
Copy link

Hello,
I see a mysterious behaviour of merge function in pandas and i don't know if its a bug or because i missed something.
If i use the parameters left_on and right on an how='outer' then a key is missing and relpaced with a NA. If i use the same name for both key columns and use on='key' all the keys will be available. Is this a normal behaviour? Is there any difference between on and left_on/right_on?
Please see the examples below:

# Example 1: NA for key=2
a = pd.DataFrame({'aa': [1,3,4,5], 'bb':[1,2,3,4]})
b = pd.DataFrame({'ab': [1,2,3,4,5], 'cc':[1,2,3,4,5]})
pd.merge(a, b, left_on='aa', right_on='ab', how='outer')

# Example 2: no NA
a = pd.DataFrame({'aa': [1,3,4,5], 'bb':[1,2,3,4]})
b = pd.DataFrame({'aa': [1,2,3,4,5], 'cc':[1,2,3,4,5]})
pd.merge(a, b, on='aa', how='outer')
@sambafall sambafall added Needs Triage Issue that has not been reviewed by a pandas team member Usage Question labels May 20, 2020
@phofl
Copy link
Member

phofl commented May 23, 2020

If you join with left_on and right_on both columns are in the output, if they have different names. So you will see NA, where a value is missing. If you join with on, the left and right column must have the same name, so the union is in the output, if you use an outer join.

Your examples are not equivalent. If you change your first example to

a = pd.DataFrame({'aa': [1,3,4,5], 'bb':[1,2,3,4]})
b = pd.DataFrame({'aa': [1,2,3,4,5], 'cc':[1,2,3,4,5]})
pd.merge(a, b, left_on='aa', right_on='aa', how='outer')

you will get the same result.

@sambafall
Copy link
Author

Hello, thanks for your answer. Why does the column name matter? The merge is done by matching the columns content right? Not the name, so calling it 'aa' or 'ab' should therefore give the same output?

@phofl
Copy link
Member

phofl commented May 23, 2020

Yes of course, but left_on and right_on keeps both columns, if they have different names. So you get your result. If both columns have the same name, the union of both columns is in the output.

Just try my example, the result is the same as your second examples result.

@dsaxton dsaxton removed the Needs Triage Issue that has not been reviewed by a pandas team member label Jun 5, 2020
@mroeschke
Copy link
Member

Appears that this usage question has been sufficiently answered. Closing.

@mroeschke mroeschke added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Aug 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Reshaping Concat, Merge/Join, Stack/Unstack, Explode Usage Question
Projects
None yet
Development

No branches or pull requests

4 participants