
BUG: merge raises for how='outer'/'right' when duplicate suffixes are specified #29697


Closed
jschendel opened this issue Nov 18, 2019 · 4 comments · Fixed by #33508
Labels: good first issue; Needs Tests (unit tests needed to prevent regressions); Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)


@jschendel
Member

Code Sample, a copy-pastable example if possible

On master the following raises for how='outer' and how='right' with duplicate suffixes:

In [1]: import pandas as pd; pd.__version__
Out[1]: '0.26.0.dev0+958.g545d17529'

In [2]: df1 = pd.DataFrame({'A': list('ab'), 'B': [0, 1]})

In [3]: df2 = pd.DataFrame({'A':list('ac'), 'B': [100, 200]})

In [4]: pd.merge(df1, df2, on="A", how="outer", suffixes=("_x", "_x"))
---------------------------------------------------------------------------
ValueError: Buffer has wrong number of dimensions (expected 1, got 0)

In [5]: pd.merge(df1, df2, on="A", how="right", suffixes=("_x", "_x"))
---------------------------------------------------------------------------
ValueError: Buffer has wrong number of dimensions (expected 1, got 0)

Note that the above works with how='inner' and how='left':

In [6]: pd.merge(df1, df2, on="A", how="inner", suffixes=("_x", "_x"))
Out[6]: 
   A  B_x  B_x
0  a    0  100

In [7]: pd.merge(df1, df2, on="A", how="left", suffixes=("_x", "_x"))
Out[7]: 
   A  B_x    B_x
0  a    0  100.0
1  b    1    NaN

Likewise, if unique suffixes are specified then how='outer' and how='right' work fine:

In [8]: pd.merge(df1, df2, on="A", how="outer", suffixes=("_x", "_y"))
Out[8]: 
   A  B_x    B_y
0  a  0.0  100.0
1  b  1.0    NaN
2  c  NaN  200.0

In [9]: pd.merge(df1, df2, on="A", how="right", suffixes=("_x", "_y"))
Out[9]: 
   A  B_x  B_y
0  a  0.0  100
1  c  NaN  200

Problem description

pandas.merge raises for how='outer' and how='right' with duplicate suffixes.

Expected Output

I'd expect In [4] and In [5] not to raise, and instead to produce output similar to Out[8] and Out[9], but with the duplicate suffix names.
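
One way to spell out that expectation in code is to merge with distinct suffixes, which the report shows working, and then relabel the columns. This is only a sketch: the helper name `expected_with_duplicate_suffix` and the temporary `_tmp` suffix are hypothetical and not part of pandas.

```python
import pandas as pd

df1 = pd.DataFrame({"A": list("ab"), "B": [0, 1]})
df2 = pd.DataFrame({"A": list("ac"), "B": [100, 200]})


def expected_with_duplicate_suffix(how, suffix="_x"):
    """Build the frame a duplicate-suffix merge would be expected to return."""
    # Merge with two distinct suffixes first; "_tmp" is an arbitrary placeholder.
    expected = pd.merge(df1, df2, on="A", how=how, suffixes=(suffix, "_tmp"))
    # Relabel so both overlapping columns carry the same suffix.
    expected.columns = ["A", "B" + suffix, "B" + suffix]
    return expected


expected_outer = expected_with_duplicate_suffix("outer")
expected_right = expected_with_duplicate_suffix("right")
```

Because the resulting labels repeat, the two `B_x` columns have to be told apart by position, for example with `expected_outer.iloc[:, 1]` and `expected_outer.iloc[:, 2]`.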

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 545d175
python : 3.7.3.final.0
python-bits : 64
OS : Darwin
OS-release : 18.6.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.26.0.dev0+958.g545d17529
numpy : 1.16.4
pytz : 2019.1
dateutil : 2.8.0
pip : 19.1.1
setuptools : 41.6.0.post20191030
Cython : 0.29.13
pytest : 4.6.2
hypothesis : 4.23.6
sphinx : 1.8.5
blosc : None
feather : None
xlsxwriter : 1.1.8
lxml.etree : 4.3.3
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.5.0
pandas_datareader: None
bs4 : 4.7.1
bottleneck : 1.2.1
fastparquet : 0.3.0
gcsfs : None
lxml.etree : 4.3.3
matplotlib : 3.1.0
numexpr : 2.6.9
odfpy : None
openpyxl : 2.6.2
pandas_gbq : None
pyarrow : 0.11.1
pytables : None
s3fs : 0.2.1
scipy : 1.2.1
sqlalchemy : 1.3.4
tables : 3.5.2
xarray : 0.12.1
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.1.8

@jschendel jschendel added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Nov 18, 2019
@jschendel jschendel added this to the Contributions Welcome milestone Nov 18, 2019
@iamrajhans
Contributor

@jschendel I would like to work on this.

@AnnaDaglis
Contributor

Seems to be fixed now? Cannot reproduce, all works fine!

@jschendel
Member Author

Yeah, looks to be working on master:

In [1]: import pandas as pd; pd.__version__
Out[1]: '1.1.0.dev0+1044.gb8385083b'

In [2]: df1 = pd.DataFrame({'A': list('ab'), 'B': [0, 1]})

In [3]: df2 = pd.DataFrame({'A':list('ac'), 'B': [100, 200]})

In [4]: pd.merge(df1, df2, on="A", how="outer", suffixes=("_x", "_x"))
Out[4]: 
   A  B_x    B_x
0  a  0.0  100.0
1  b  1.0    NaN
2  c  NaN  200.0

In [5]: pd.merge(df1, df2, on="A", how="right", suffixes=("_x", "_x"))
Out[5]: 
   A  B_x  B_x
0  a  0.0  100
1  c  NaN  200

I'm not sure which commit fixed this, and I don't immediately see any relevant tests, so I would welcome tests for this.
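
A regression test along these lines could pin the behavior down. This is a minimal sketch, not the test that was eventually merged: the helper names are hypothetical, it uses plain `assert` statements rather than a test framework, and it assumes a pandas version in which the merge no longer raises.

```python
import pandas as pd


def merge_with_duplicate_suffixes(how):
    """Merge the two frames from the report using identical suffixes."""
    df1 = pd.DataFrame({"A": list("ab"), "B": [0, 1]})
    df2 = pd.DataFrame({"A": list("ac"), "B": [100, 200]})
    return pd.merge(df1, df2, on="A", how=how, suffixes=("_x", "_x"))


def check_duplicate_suffix_merge(how, expected_keys):
    """Fail if the merge raises or returns unexpected labels or keys."""
    result = merge_with_duplicate_suffixes(how)
    # Both overlapping columns receive the same suffix, so the label repeats.
    assert list(result.columns) == ["A", "B_x", "B_x"]
    # Compare the keys as a sorted list so the check ignores row order.
    assert sorted(result["A"]) == expected_keys


check_duplicate_suffix_merge("outer", ["a", "b", "c"])
check_duplicate_suffix_merge("right", ["a", "c"])
```

The checks cover only the column labels and the join keys. A fuller test would also compare the values, selecting the two `B_x` columns by position because their labels are identical.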

@jschendel jschendel added Needs Tests Unit test(s) needed to prevent regressions good first issue and removed Bug labels Mar 31, 2020
@devjeetr
Contributor

take

devjeetr added a commit to devjeetr/pandas that referenced this issue Apr 13, 2020
This was referenced Apr 13, 2020
@jreback jreback modified the milestones: Contributions Welcome, 1.1 Apr 26, 2020
jreback pushed a commit that referenced this issue Apr 26, 2020
rhshadrach pushed a commit to rhshadrach/pandas that referenced this issue May 10, 2020