BUG: Merge fails on nullable integer columns when dtypes are unequal #46799
Labels: Bug; Duplicate Report (duplicate issue or pull request); Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
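A minimal sketch of the kind of merge this report describes, not the reporter's original snippet. The frame names, column names, and values are hypothetical; the only properties taken from the report are that one key is a nullable integer containing a null and the other key is a plain np.int64. The exception type is deliberately not assumed, so the attempt is wrapped in a broad try/except and simply reports what happened on the installed pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical data: names and values are illustrative only.
left = pd.DataFrame(
    {"key": pd.array([1, 2, None], dtype="Int64"), "a": ["x", "y", "z"]}
)
right = pd.DataFrame(
    {"key": np.array([1, 2, 3], dtype=np.int64), "b": [10, 20, 30]}
)

# Both keys hold integers, but the dtypes are not equal.
dtypes_differ = left["key"].dtype != right["key"].dtype

try:
    merged = left.merge(right, on="key", how="left")
    outcome = "merge succeeded"
except Exception as exc:  # exact exception type is not assumed here
    merged = None
    outcome = f"merge failed: {type(exc).__name__}: {exc}"

print(pd.__version__, dtypes_differ, outcome)
```

Whether this succeeds or fails depends on the pandas version; the report was filed against 1.5.0.dev0.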
Issue Description
If you try to merge two (or more, probably) objects, and all of the following are true:
- one key column is a nullable integer (e.g. pd.Int64Dtype())
- the other key column is either nullable (e.g. pd.Int64Dtype()) or non-nullable (e.g. np.int64)
- col1.dtype != col2.dtype
then the merge fails.
However, if I change the np.int64 dtype to pd.Int64Dtype() as well (i.e. both types are equal), then the merge works fine. (Note that changing both types to non-nullable types isn't applicable, because some of the keys are indeed null, so that would be impossible.) It also fails if one is Int64 and the other is UInt64. Tested on main.
The offending code is probably in pandas/core/reshape/merge.py, line 2227, in _factorize_keys, but I didn't dig any deeper than that.
Expected Behavior
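For comparison, the equal-dtype case that this report treats as the baseline can be sketched as follows. The data and names are hypothetical, and casting the np.int64 key with astype is just one way to make the dtypes match; it is shown as an illustration of the working case, not as the fix adopted by pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical data: names and values are illustrative only.
left = pd.DataFrame(
    {"key": pd.array([1, 2, None], dtype="Int64"), "a": ["x", "y", "z"]}
)
right = pd.DataFrame(
    {"key": np.array([1, 2, 3], dtype=np.int64), "b": [10, 20, 30]}
)

# Cast the non-nullable key to the nullable dtype so both sides are Int64.
right_nullable = right.astype({"key": "Int64"})

# With equal dtypes the merge goes through; the null key on the left
# has no partner on the right, so an inner merge drops it.
merged = left.merge(right_nullable, on="key", how="inner")
print(merged)
```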
The merge should act the same as when both keys are nullable and dtypes are equal (e.g. both Int64Dtype).
Installed Versions
INSTALLED VERSIONS
commit : ec75436
python : 3.8.12.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.47-linuxkit
Version : #1 SMP Sat Jul 3 21:51:47 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : None.None
pandas : 1.5.0.dev0+688.gec7543622a
numpy : 1.22.2
pytz : 2021.3
dateutil : 2.8.2
pip : 21.3.1
setuptools : 60.9.3
Cython : 0.29.28
pytest : 7.0.1
hypothesis : 6.39.1
sphinx : 4.3.2
blosc : None
feather : None
xlsxwriter : 3.0.3
lxml.etree : 4.8.0
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.0.3
IPython : 8.1.1
pandas_datareader: None
bs4 : 4.10.0
bottleneck : 1.3.4
brotli :
fastparquet : 0.8.0
fsspec : 2021.11.0
gcsfs : 2021.11.0
markupsafe : 2.1.0
matplotlib : 3.5.1
numba : 0.53.1
numexpr : 2.8.0
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : 7.0.0
pyreadstat : 1.1.4
pyxlsb : None
s3fs : 2021.11.0
scipy : 1.8.0
snappy :
sqlalchemy : 1.4.31
tables : 3.7.0
tabulate : 0.8.9
xarray : 0.18.2
xlrd : 2.0.1
xlwt : 1.3.0
zstandard : None