Confusing interpretation of what DataFrame.join does #12188
xref to #6336
So the join operation above does this,
which is this merge,
whereas I think you expected it to do this one.
I agree that this is a bit confusing, but I think the rationale is that you are joining from the left, and the … So this is what …
So I think a doc update, maybe with some more examples, would help?
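A minimal sketch of the behaviour described above, assuming pandas is installed; the frames and the column names key, lval and rval are hypothetical. With on, join looks up the left frame's column in the right frame's index, which is the same as a left merge with left_on and right_index=True:

```python
import pandas as pd

# Hypothetical frames: the keys for the right frame live in its index.
left = pd.DataFrame({"key": ["a", "b", "c"], "lval": [1, 2, 3]})
right = pd.DataFrame({"rval": [10, 30]}, index=["a", "c"])

# join(on=...) matches left["key"] against right.index (how="left" by default).
joined = left.join(right, on="key")

# The equivalent merge: a column on the left side, the index on the right side.
merged = pd.merge(left, right, left_on="key", right_index=True, how="left")

print(joined)
print(joined.equals(merged))
```

Row "b" has no matching label in right.index, so its rval is NaN, and the left frame's index and row order are kept.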
Thanks for the clarification! I agree that a documentation update would be really helpful. Besides an update to the join method docstring, a brief mention of this behavior at the bottom of the "Database-style DataFrame joining/merging" section of the docs would be useful.
Sure, pull requests would be welcome!
From the docs, I see that the difference between .merge and .join is that .join operates on indexes by default but also lets you use columns, so I tried using it that way, since it feels natural coming from the SQL world.
From the docs:
From my understanding, if on is absent, the join is performed on the index; if on is present, it would be reasonable to think that the same kind of join would be performed, just on that column.
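For comparison, a small sketch of the default case, where on is omitted and join aligns the two frames index to index (assuming pandas; the frames and the x/y column names are hypothetical):

```python
import pandas as pd

# Hypothetical frames indexed by the same kind of id.
left = pd.DataFrame({"x": ["a", "b", "c"]}, index=[1, 2, 3])
right = pd.DataFrame({"y": ["p", "q"]}, index=[3, 1])

# No on= argument: left.index is matched against right.index, keeping every left row.
out = left.join(right)
print(out)
```

Passing on does not turn this into a column-to-column join: only the left side switches from its index to the named column, while the right side is still matched on its index.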
Having said that:
Output:
And if I set id as the index:
Output:
Is that the correct behavior? If so, I think the documentation is misleading. It took me a long time to find the bug in my code, and I ended up using merge because .join behaves in an unexpected way.
I don't think I'm the only one with this issue, so maybe a change in the documentation would help to clarify how .join works.
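If the goal is the column-to-column match expected above, here is a sketch of two options that should give the same rows, assuming pandas and hypothetical frames that both carry an id column: move the right frame's id into its index before calling join, or use merge with on directly.

```python
import pandas as pd

# Hypothetical frames that both carry an "id" column.
left = pd.DataFrame({"id": [1, 2, 3], "x": ["a", "b", "c"]})
right = pd.DataFrame({"id": [3, 1, 2], "y": ["p", "q", "r"]})

# Option 1: make right's id its index, so join(on="id") compares id with id.
via_join = left.join(right.set_index("id"), on="id")

# Option 2: merge matches the two id columns directly.
via_merge = left.merge(right, on="id", how="left")

print(via_join)
print(via_merge)
```

Calling left.join(right, on="id") on these frames without lsuffix/rsuffix raises a ValueError instead, because right's own id column is not used as the key and so overlaps with left's id column.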