BUG: Merge with EA and MultiIndex #43734

benoit9126 · 2021-09-24T15:59:09Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

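import pandas as pd

# Left frame: MultiIndex columns, plain object (str) dtype
df1 = pd.DataFrame(data={("lvl0","lvl1-a"): ["1","2","3"], ("lvl0","lvl1-b"): ["4","5","6"]}, dtype=str)

# Right frame: same key column, but with an extension array dtype (StringDtype)
df2 = pd.DataFrame(data={("lvl0","lvl1-a"): ["1","2","3"], ("lvl0","lvl1-c"): ["7","8","9"]}, dtype=pd.StringDtype())

# Merging on the tuple-named key column is the call that fails
result = pd.merge(left=df1, right=df2, on=[("lvl0","lvl1-a")])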

Issue Description

I am trying to merge two data frames whose columns are a MultiIndex. The first data frame has an object dtype (str in my example) and the second an extension array dtype (pd.StringDtype() in my example). In this case, an internal method calls assign, but unfortunately the name of the column is not a string but a tuple, which assign cannot accept as a keyword.
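To illustrate why a tuple column name is a problem for assign, here is a minimal sketch. It assumes the internal code forwards the column label as a keyword argument (the report does not show the exact internal call); the frame and the `key` and `error` names are made up for the illustration.

```python
import pandas as pd

# Hypothetical one-column frame with a tuple (MultiIndex) column label.
key = ("lvl0", "lvl1-a")
df = pd.DataFrame({key: ["1", "2", "3"]})

error = None
try:
    # Python requires keyword names to be strings, so a tuple label
    # cannot be passed to assign through **kwargs.
    df.assign(**{key: df[key].astype(object)})
except TypeError as exc:
    error = exc

# Item assignment has no such restriction and accepts the tuple label.
df[key] = df[key].astype(object)
```

The TypeError comes from Python's keyword-argument rules rather than from pandas itself, which is why it only shows up when the column labels are not strings.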

Expected Behavior

I would have expected the merge to work as it does with two str dtype data frames or two pd.StringDtype() dtype data frames.

    lvl0              
  lvl1-a lvl1-b lvl1-c
0      1      4      7
1      2      5      8
2      3      6      9
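A possible workaround on affected versions, sketched under the assumption that casting the object-dtype frame to pd.StringDtype() is acceptable for your data: align the dtypes of both frames before merging, since the merge is reported above to work when both sides share a dtype. The name `df1_aligned` is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame(
    data={("lvl0", "lvl1-a"): ["1", "2", "3"], ("lvl0", "lvl1-b"): ["4", "5", "6"]},
    dtype=str,
)
df2 = pd.DataFrame(
    data={("lvl0", "lvl1-a"): ["1", "2", "3"], ("lvl0", "lvl1-c"): ["7", "8", "9"]},
    dtype=pd.StringDtype(),
)

# Cast the object-dtype frame so both sides use the same extension dtype.
df1_aligned = df1.astype(pd.StringDtype())

# With matching dtypes, the key column needs no coercion during the merge.
result = pd.merge(left=df1_aligned, right=df2, on=[("lvl0", "lvl1-a")])
```

Casting in the other direction (df2 to object) should work as well if you prefer to keep plain object columns.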

Installed Versions

INSTALLED VERSIONS

commit : 73c6825
python : 3.9.6.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.0-8-amd64
Version : #1 SMP Debian 5.10.46-4 (2021-08-03)
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : fr_FR.UTF-8
LOCALE : fr_FR.UTF-8
pandas : 1.3.3
numpy : 1.21.2
pytz : 2021.1
dateutil : 2.8.2
pip : 21.2.4
setuptools : 57.4.0
Cython : 0.29.24
pytest : 6.2.5
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 3.0.1
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.3
numexpr : None
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.7.1
sqlalchemy : 1.4.25
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

debnathshoham · 2021-09-24T16:50:23Z

Thanks @benoit9126 for the report!
What would be the expected dtype of ('lvl0', 'lvl1-a')?

benoit9126 · 2021-09-24T17:00:16Z

If, for instance, you take data frames with integers (the first with int and the other with pd.Int64Dtype()), the previous example works without trouble and the resulting data frame has columns of type ~~pd.Int64Dtype~~ numpy.dtype[int64]
(all columns)

import pandas as pd
df1 = pd.DataFrame(data={("lvl0","lvl1-a"): [1,2,3], ("lvl0","lvl1-b"): [4,5,6]}, dtype=int)
df2 = pd.DataFrame(data={("lvl0","lvl1-a"): [1,2,3], ("lvl0","lvl1-c"): [7,8,9]}, 
                   dtype=pd.Int64Dtype())
result = pd.merge(left=df1, right=df2, on=[("lvl0","lvl1-a")])
result.dtypes
# lvl0  lvl1-a    int64
#       lvl1-b    int64
#       lvl1-c    Int64
# dtype: object

~~I think this is the safest option as there can be a None, pd.NA, np.nan in the resulting columns that are not in the input data frame with object dtype.~~ Here, an object dtype for the resulting key column may be sufficient.

benoit9126 added the Bug and Needs Triage labels Sep 24, 2021

debnathshoham added the MultiIndex and Reshaping labels Sep 24, 2021

benoit9126 mentioned this issue Sep 28, 2021

BUG: Merge with str/StringDtype keys and multiindex #43785

Merged

4 tasks

mroeschke added the ExtensionArray label and removed the Needs Triage label Oct 1, 2021

jreback added this to the 1.4 milestone Oct 11, 2021

jreback closed this as completed in #43785 Oct 16, 2021

jreback pushed a commit that referenced this issue Oct 16, 2021

BUG: Merge with str/StringDtype keys and multiindex (#43734) (#43785)

6c35a62
