BUG: merge raises for how='outer'/'right' when duplicate suffixes are specified #29697

jschendel · 2019-11-18T21:55:51Z

Code Sample, a copy-pastable example if possible

On master the following raises for how='outer' and how='right' with duplicate suffixes:

In [1]: import pandas as pd; pd.__version__
Out[1]: '0.26.0.dev0+958.g545d17529'

In [2]: df1 = pd.DataFrame({'A': list('ab'), 'B': [0, 1]})

In [3]: df2 = pd.DataFrame({'A':list('ac'), 'B': [100, 200]})

In [4]: pd.merge(df1, df2, on="A", how="outer", suffixes=("_x", "_x"))
---------------------------------------------------------------------------
ValueError: Buffer has wrong number of dimensions (expected 1, got 0)

In [5]: pd.merge(df1, df2, on="A", how="right", suffixes=("_x", "_x"))
---------------------------------------------------------------------------
ValueError: Buffer has wrong number of dimensions (expected 1, got 0)

Note that the above works with how='inner' and how='left':

In [6]: pd.merge(df1, df2, on="A", how="inner", suffixes=("_x", "_x"))
Out[6]: 
   A  B_x  B_x
0  a    0  100

In [7]: pd.merge(df1, df2, on="A", how="left", suffixes=("_x", "_x"))
Out[7]: 
   A  B_x    B_x
0  a    0  100.0
1  b    1    NaN

Likewise, if unique suffixes are specified then how='outer' and how='right' work fine:

In [8]: pd.merge(df1, df2, on="A", how="outer", suffixes=("_x", "_y"))
Out[8]: 
   A  B_x    B_y
0  a  0.0  100.0
1  b  1.0    NaN
2  c  NaN  200.0

In [9]: pd.merge(df1, df2, on="A", how="right", suffixes=("_x", "_y"))
Out[9]: 
   A  B_x  B_y
0  a  0.0  100
1  c  NaN  200

Problem description

pandas.merge raises for how='outer' and how='right' with duplicate suffixes.

Expected Output

I'd expect In [4] and In [5] not to raise and produce output similar to Out[8] and Out[9] but with the duplicate suffix names.
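The expected result can be sketched by merging with two distinct suffixes, which the report shows works for every `how` (Out[8], Out[9]), and then relabelling the overlapping columns with the same suffix. The helper `merge_with_duplicate_suffix` and the temporary suffixes are hypothetical and not part of pandas. On the affected dev build this may also serve as a workaround, assuming no existing column already ends in one of the temporary suffixes.

```python
import pandas as pd

# Hypothetical temporary suffixes, assumed not to clash with existing columns.
TMP_SUFFIXES = ("__left_tmp", "__right_tmp")


def merge_with_duplicate_suffix(left, right, on, how, suffix):
    """Hypothetical helper (not a pandas API): merge with distinct temporary
    suffixes, then relabel both overlapping columns with the same suffix."""
    merged = pd.merge(left, right, on=on, how=how, suffixes=TMP_SUFFIXES)
    new_columns = []
    for col in merged.columns:
        for tmp in TMP_SUFFIXES:
            if isinstance(col, str) and col.endswith(tmp):
                # "B__left_tmp" -> "B_x" and "B__right_tmp" -> "B_x"
                col = col[: -len(tmp)] + suffix
                break
        new_columns.append(col)
    # Assigning a list with repeated labels yields duplicate column names.
    merged.columns = new_columns
    return merged


df1 = pd.DataFrame({"A": list("ab"), "B": [0, 1]})
df2 = pd.DataFrame({"A": list("ac"), "B": [100, 200]})

out = merge_with_duplicate_suffix(df1, df2, on="A", how="outer", suffix="_x")
print(out.columns.tolist())  # ['A', 'B_x', 'B_x']
```

The data in `out` matches Out[8]; only the right-hand column label changes from B_y to B_x.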

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : 545d175
python : 3.7.3.final.0
python-bits : 64
OS : Darwin
OS-release : 18.6.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.26.0.dev0+958.g545d17529
numpy : 1.16.4
pytz : 2019.1
dateutil : 2.8.0
pip : 19.1.1
setuptools : 41.6.0.post20191030
Cython : 0.29.13
pytest : 4.6.2
hypothesis : 4.23.6
sphinx : 1.8.5
blosc : None
feather : None
xlsxwriter : 1.1.8
lxml.etree : 4.3.3
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.5.0
pandas_datareader: None
bs4 : 4.7.1
bottleneck : 1.2.1
fastparquet : 0.3.0
gcsfs : None
lxml.etree : 4.3.3
matplotlib : 3.1.0
numexpr : 2.6.9
odfpy : None
openpyxl : 2.6.2
pandas_gbq : None
pyarrow : 0.11.1
pytables : None
s3fs : 0.2.1
scipy : 1.2.1
sqlalchemy : 1.3.4
tables : 3.5.2
xarray : 0.12.1
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.1.8


iamrajhans · 2019-11-19T06:55:15Z

@jschendel I would like to work on this.

AnnaDaglis · 2020-03-26T15:07:19Z

Seems to be fixed now? I cannot reproduce it; everything works fine!

jschendel · 2020-03-31T01:35:02Z

Yeah, looks to be working on master:

In [1]: import pandas as pd; pd.__version__
Out[1]: '1.1.0.dev0+1044.gb8385083b'

In [2]: df1 = pd.DataFrame({'A': list('ab'), 'B': [0, 1]})

In [3]: df2 = pd.DataFrame({'A':list('ac'), 'B': [100, 200]})

In [4]: pd.merge(df1, df2, on="A", how="outer", suffixes=("_x", "_x"))
Out[4]: 
   A  B_x    B_x
0  a  0.0  100.0
1  b  1.0    NaN
2  c  NaN  200.0

In [5]: pd.merge(df1, df2, on="A", how="right", suffixes=("_x", "_x"))
Out[5]: 
   A  B_x  B_x
0  a  0.0  100
1  c  NaN  200

I'm not sure which commit fixed this and I don't immediately see any relevant tests, so would welcome tests for this.

devjeetr · 2020-04-12T23:57:34Z

take

jschendel added the Bug and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels Nov 18, 2019

jschendel added this to the Contributions Welcome milestone Nov 18, 2019

jschendel added the Needs Tests (unit test(s) needed to prevent regressions) and good first issue labels and removed the Bug label Mar 31, 2020

github-actions bot assigned devjeetr Apr 12, 2020

devjeetr added a commit to devjeetr/pandas that referenced this issue Apr 13, 2020

Added two tests for issue pandas-dev#29697

a0c0867

This was referenced Apr 13, 2020:

Added two tests for issue #29697 #33508 (Merged)

Issue devjeetr/pandas#2 (Open)

jreback modified the milestones: Contributions Welcome, 1.1 Apr 26, 2020

jreback closed this as completed in #33508 Apr 26, 2020

jreback pushed a commit that referenced this issue Apr 26, 2020

Added two tests for issue #29697 (#33508)

f3fdab3

rhshadrach pushed a commit to rhshadrach/pandas that referenced this issue May 10, 2020

Added two tests for issue pandas-dev#29697 (pandas-dev#33508)

89a7bd4
