
BUG: wrong join with nan in multiindex #34035


Closed
1 of 3 tasks
gertjac opened this issue May 6, 2020 · 2 comments
Labels
Bug, Needs Triage (issue that has not been reviewed by a pandas team member)

Comments

@gertjac

gertjac commented May 6, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.

Code Sample, a copy-pastable example

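import pandas as pd

# Both frames are indexed on (i1, i2); the second row of df1 and the only
# row of df2 have a missing value in the i2 level.
df1 = pd.DataFrame([['1.', '3.', '4.'],
                    ['1.', pd.NA, '5.']],
                   columns=['i1', 'i2', 'v1']).set_index(['i1', 'i2'])
df2 = pd.DataFrame([['1.', pd.NA, '6.']],
                   columns=['i1', 'i2', 'v2']).set_index(['i1', 'i2'])

# ('1.', '3.') does not occur in df2, so v2 should be missing for this row.
df1.join(df2, how='outer').loc[('1.', '3.')]['v2']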

Problem description

The result is '6.'.
That is, the v2 value from df2 shows up as the v2 value for index ('1.', '3.') after the join.
This is unexpected to me, because that index differs from the only index in df2, ('1.', pd.NA).
The problem also occurs with np.nan instead of pd.NA,
and also in version 0.25.3 (when the values are strings, but not when they are floats!).
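For reference, the semantics the report expects can be sketched without pandas. This is a minimal illustration, not pandas' implementation: an outer join over plain dicts keyed by (i1, i2) tuples, with None standing in for pd.NA and assumed to match only another missing key. The helper `outer_join` is hypothetical.

```python
from typing import Dict, Hashable, Optional, Tuple

Key = Tuple[Hashable, ...]


def outer_join(left: Dict[Key, str],
               right: Dict[Key, str]) -> Dict[Key, Tuple[Optional[str], Optional[str]]]:
    """Hypothetical helper: outer join of two key->value maps on exact key equality."""
    # Keep every key from either side: left keys first, then keys found only in right.
    keys = list(left) + [k for k in right if k not in left]
    # A side that lacks the key contributes None (a missing value).
    return {k: (left.get(k), right.get(k)) for k in keys}


v1 = {("1.", "3."): "4.", ("1.", None): "5."}
v2 = {("1.", None): "6."}

expected = outer_join(v1, v2)
print(expected[("1.", "3.")])  # ('4.', None): no v2 value for this key
print(expected[("1.", None)])  # ('5.', '6.')
```

Because ('1.', '3.') never occurs on the right-hand side, its v2 slot stays empty, which corresponds to the NaN the report expects.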

Expected Output

NaN
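Until the underlying issue (#29252) is resolved in the pandas version you use, one possible workaround is to keep missing values out of the index levels altogether. The sketch below assumes a placeholder string (`SENTINEL`, a hypothetical name) that never occurs as a real key; it fills the missing i2 values with it before setting the index, so the join only has to compare ordinary strings. It requires pandas 1.0 or later for pd.NA.

```python
import pandas as pd

# Hypothetical placeholder for missing keys; it must not collide with any real key value.
SENTINEL = "<missing>"
keys = ["i1", "i2"]

df1 = pd.DataFrame([["1.", "3.", "4."], ["1.", pd.NA, "5."]], columns=["i1", "i2", "v1"])
df2 = pd.DataFrame([["1.", pd.NA, "6."]], columns=["i1", "i2", "v2"])

# Replace missing key values with the sentinel *before* building the MultiIndex,
# so the join never has to align NaN/NA index entries.
left = df1.fillna({"i2": SENTINEL}).set_index(keys)
right = df2.fillna({"i2": SENTINEL}).set_index(keys)

joined = left.join(right, how="outer")
print(joined.loc[("1.", "3."), "v2"])  # missing: ('1.', '3.') has no match in df2
```

The trade-off is that the sentinel has to be translated back into a missing value afterwards (for example after `reset_index()`) if downstream code relies on NaN in that column.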

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 1.0.3
numpy : 1.16.4
pytz : 2019.1
dateutil : 2.8.0
pip : 19.1.1
setuptools : 41.0.1
Cython : 0.29.12
pytest : 5.0.1
hypothesis : None
sphinx : 2.1.2
blosc : None
feather : None
xlsxwriter : 1.1.8
lxml.etree : 4.3.4
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.6.1
pandas_datareader: None
bs4 : 4.7.1
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.3.4
matplotlib : 3.1.0
numexpr : 2.6.9
odfpy : None
openpyxl : 2.6.2
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.0.1
pyxlsb : None
s3fs : None
scipy : 1.2.1
sqlalchemy : 1.3.5
tables : 3.5.2
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.1.8
numba : 0.44.1

@gertjac added the Bug and Needs Triage labels May 6, 2020
@gertjac
Author

gertjac commented May 22, 2020

Just discovered that this bug has already been reported: #29252

@TomAugspurger
Contributor

Thanks. Closing as a duplicate of #29252.
