DOC:/USAGE: Resulting Index when merging over indices #34412

Open
phofl opened this issue May 27, 2020 · 2 comments
Labels
Docs, MultiIndex, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)

Comments

@phofl
Member

phofl commented May 27, 2020

Location of the documentation

https://dev.pandas.io/docs/reference/api/pandas.DataFrame.merge.html

Documentation problem

There is no documentation for this case.

Suggested fix for documentation

I'm a bit confused about the behavior here, because I don't know what to expect. When I perform the merge as shown below, I would expect that both operations return the same output.

import pandas as pd
left = pd.DataFrame({'a': [1, 2], 'b': [1, 1], "l": [22, 23]}).set_index(['a', 'b'])
right = pd.DataFrame({'b': [1], "r": [12]}).set_index(['b'])
# First merge: level 'b' of left against the index of right
print(pd.merge(left, right, left_on=['b'], right_index=True, how="left"))
# Second merge: 'b' referenced by name on both sides
print(pd.merge(left, right, left_on=['b'], right_on=["b"], how="left"))

But the indexes of the two results differ. The first merge keeps the index of the left DataFrame (a, b), while the second merge has only b as its index. As a result, the index of the second result is no longer unique.

First Merge:

      l   r
a b        
1 1  22  12
2 1  23  12

Second Merge:

    l   r
b        
1  22  12
1  23  12
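
The difference can also be checked programmatically instead of by reading the printed frames. The following is a minimal sketch that repeats the two merges from above; `describe_index` is a hypothetical helper introduced here for illustration only. No expected output is given, because the resulting index may depend on the pandas version.

```python
import pandas as pd

left = pd.DataFrame({"a": [1, 2], "b": [1, 1], "l": [22, 23]}).set_index(["a", "b"])
right = pd.DataFrame({"b": [1], "r": [12]}).set_index(["b"])

# The two merges from the report above.
first = pd.merge(left, right, left_on=["b"], right_index=True, how="left")
second = pd.merge(left, right, left_on=["b"], right_on=["b"], how="left")


def describe_index(df):
    """Hypothetical helper: report the index level names and whether the index is unique."""
    return {"names": list(df.index.names), "unique": bool(df.index.is_unique)}


print(describe_index(first))
print(describe_index(second))
```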

If this is the desired behavior, we should adjust the documentation to say so. If it is not, I would file a BUG issue. Either way, the documentation should be adjusted. I would add an example with an index join to show the correct behavior.
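
Until the expected behavior is settled, one way to avoid depending on either outcome is to move the index levels into ordinary columns, merge on the column, and then set the index explicitly. This is only a sketch of a possible workaround, not the documented or intended pandas behavior; the choice of `["a", "b"]` as the final index is an assumption made for this example.

```python
import pandas as pd

left = pd.DataFrame({"a": [1, 2], "b": [1, 1], "l": [22, 23]}).set_index(["a", "b"])
right = pd.DataFrame({"b": [1], "r": [12]}).set_index(["b"])

# Turn index levels into plain columns so the merge has no index to keep or drop.
merged = pd.merge(left.reset_index(), right.reset_index(), on="b", how="left")

# Choose the resulting index explicitly (assumed here: the left frame's levels).
result = merged.set_index(["a", "b"])
print(result)
```

Setting the index after the merge makes the outcome independent of how `merge` treats index levels used as keys.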

@phofl added the Docs and Needs Triage labels May 27, 2020
@TomAugspurger
Contributor

TomAugspurger commented Sep 4, 2020

This is probably buggy, but I'm not entirely sure either. I think they should have the same index (I think the second one).

@TomAugspurger added the MultiIndex and Reshaping labels and removed the Needs Triage label Sep 4, 2020
@bjornasm

Is there any progress here? It would be nice to have clarification on whether merging requires the DataFrames to share the same index. I don't see why it isn't straightforward to merge on any chosen columns of the DataFrames, since you would expect that it's the content of the columns that has to match.

3 participants