
[documentation] merge(suffixes=(False, False)) should cause an error if a suffix would be required to complete the merge but it is undocumented #22045


Closed
jrounds opened this issue Jul 24, 2018 · 5 comments · Fixed by #22141
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Docs good first issue

Comments

@jrounds

jrounds commented Jul 24, 2018

Problem description

This was going to be a request for an enhancement, but after investigation it just became a reminder that pandas has an undocumented feature.

About once every few months I end up in a tricky debugging situation where a column has ceased to exist because it was suffixed with _x or _y. I understand the use cases where suffixing is helpful, but in some cases it is unintended and harmful. It can also be hard to track down, because the _x and _y columns may not matter until much later in the pipeline. I wanted to be able to tell merge to fail if it was going to add a suffix.
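To illustrate the pitfall, here is a minimal sketch using small hypothetical frames (not taken from the original report) showing how the default suffixes silently rename an overlapping non-key column:

```python
import pandas as pd

# Hypothetical toy frames: both sides carry a non-key column named 'B'.
left = pd.DataFrame({'key': ['K0', 'K1'], 'B': ['B0', 'B1']})
right = pd.DataFrame({'key': ['K0', 'K1'], 'B': ['D0', 'D1']})

# With the default suffixes=('_x', '_y'), the overlapping 'B' is renamed
# on both sides, so no column called 'B' survives the merge.
merged = pd.merge(left, right, on='key')
print(list(merged.columns))  # ['key', 'B_x', 'B_y']
```

Any later code that looks up merged['B'] will then fail with a KeyError, far away from the merge that caused it.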

Turns out you can, but it is undocumented. Tracing through the code, I eventually arrived at https://github.com/pandas-dev/pandas/blob/v0.23.3/pandas/core/internals.py#L5242

A quick check seems to indicate that merge(..., suffixes=(False, False)) causes that line to execute and raise an error when a suffix would be required, and does not raise when no suffix is needed. That is nice, but I would encourage you to document it at:
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html
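Conceptually, the check behind the linked internals line can be approximated as follows. This is a simplified, hypothetical pure-Python sketch, not the pandas source: the function name check_suffix_needed is made up, and it assumes that join keys with the same name on both sides are excluded from the overlap.

```python
def check_suffix_needed(left_cols, right_cols, keys, suffixes=('_x', '_y')):
    """Hypothetical sketch: return the non-key columns that would need a
    suffix, raising ValueError if there are any and both suffixes are falsy."""
    overlap = sorted((set(left_cols) & set(right_cols)) - set(keys))
    lsuffix, rsuffix = suffixes
    if overlap and not lsuffix and not rsuffix:
        raise ValueError(f"columns overlap but no suffix specified: {overlap}")
    return overlap


# 'B' is in both frames, so suffixes=(False, False) raises.
try:
    check_suffix_needed(['key', 'A', 'B'], ['key', 'C', 'B'], ['key'], (False, False))
except ValueError as exc:
    print(exc)

# No shared non-key column, so nothing needs a suffix and nothing is raised.
print(check_suffix_needed(['key', 'A', 'B'], ['key', 'C', 'D'], ['key'], (False, False)))
```

Because this sketch only tests the suffixes for truthiness, (None, None) or ('', '') would behave the same as (False, False) here; whether pandas treats every falsy value identically is an assumption of the sketch.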

```python
import pandas as pd

left = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})

# 'B' also appears here, so merging with `left` needs a suffix
right = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'B': ['D0', 'D1', 'D2', 'D3']})

# No column overlaps with `left` except the join key
right_2 = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                        'C': ['C0', 'C1', 'C2', 'C3'],
                        'D': ['D0', 'D1', 'D2', 'D3']})

# Default suffixes: 'B' becomes 'B_x' and 'B_y'
result = pd.merge(left, right, on='key')
print(result)

# Should raise an error, and does (ValueError)
try:
    pd.merge(left, right, on='key', suffixes=(False, False))
except ValueError as exc:
    print('ValueError:', exc)

# Should not raise an error, and does not
print(pd.merge(left, right_2, on='key', suffixes=(False, False)))
```
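If relying on undocumented behavior is a concern, the same guard can be made explicit before calling merge. A minimal sketch, assuming a hypothetical wrapper named merge_no_overlap and joins that use on= (not left_on/right_on or index joins):

```python
import pandas as pd


def merge_no_overlap(left, right, on, **kwargs):
    """Hypothetical helper: refuse to merge if any non-key column name
    exists in both frames, i.e. whenever pandas would need a suffix."""
    keys = [on] if isinstance(on, str) else list(on)
    overlap = left.columns.intersection(right.columns).difference(keys)
    if len(overlap) > 0:
        raise ValueError(f"non-key columns would be suffixed: {list(overlap)}")
    return pd.merge(left, right, on=on, **kwargs)


left = pd.DataFrame({'key': ['K0', 'K1'], 'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
right = pd.DataFrame({'key': ['K0', 'K1'], 'C': ['C0', 'C1'], 'B': ['D0', 'D1']})
right_2 = pd.DataFrame({'key': ['K0', 'K1'], 'C': ['C0', 'C1'], 'D': ['D0', 'D1']})

ok = merge_no_overlap(left, right_2, on='key')  # columns: key, A, B, C, D
try:
    merge_no_overlap(left, right, on='key')
except ValueError as exc:
    print(exc)  # non-key columns would be suffixed: ['B']
```

Extra keyword arguments such as how= are passed straight through to pd.merge; the wrapper only adds the up-front overlap check, so the error names the offending columns before any merge work is done.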
@gfyoung gfyoung added Docs Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff labels Jul 27, 2018
@gfyoung
Member

gfyoung commented Jul 27, 2018

Encourage us to do it? Why aren't you encouraged to submit a PR and do it yourself? Nobody is (at least I am not) stopping you here on this one. 😉

@elmq0022
Contributor

Claiming this one. I will take care of it just to get some experience with the documentation system.

@elmq0022
Contributor

elmq0022 commented Aug 2, 2018

Any feedback on pull request #22141? Thanks!

@gfyoung
Member

gfyoung commented Aug 2, 2018

I just pinged @jreback in your PR. Sorry about that!

@elmq0022
Contributor

elmq0022 commented Aug 2, 2018

@gfyoung, no worries. I guess it has only been open a few days. I thought it was longer than that.

@jreback jreback added this to the 0.24.0 milestone Aug 22, 2018

5 participants