combine_first returns unexpected results for timestamp dataframes #28481


Closed
leo4183 opened this issue Sep 17, 2019 · 2 comments · Fixed by #38145
Labels: Bug; Datetime (Datetime data dtype); Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)
Milestone: 1.2

Comments

@leo4183

leo4183 commented Sep 17, 2019

Problem Description

combine_first returns unexpected results when two datetime DataFrames have non-matching columns. This issue is probably related to #24357 (dtypes not being retained).

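import pandas as pd
# x: row 'a' with all-NaT columns 0 and 1
x = pd.DataFrame([pd.NaT, pd.NaT], columns=['a']).T
# y: rows 'a' and 'b' with columns 1 and 2 (column 2 does not exist in x)
y = pd.DataFrame([[pd.NaT, pd.NaT], pd.to_datetime(['20190101', '20190102'])], index=['a', 'b'], columns=[1, 2])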
x
     0    1
a  NaT  NaT

y
            1           2
a         NaT         NaT
b  2019-01-01  2019-01-02

Output of x.combine_first(y) (current):
     0           1              2
a  NaT         NaT  -9.223372e+18
b  NaT  2019-01-01   1.546387e+18

Output of x.reindex(columns=x.columns.union(y.columns)).combine_first(y) (expected):
     0           1           2
a  NaT         NaT         NaT
b  NaT  2019-01-01  2019-01-02
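On affected versions, one way to sidestep the problem is to make sure every column of y already exists in x with a datetime64 dtype before calling combine_first, so no column has to be created from float NaNs. The sketch below is a variation on the reindex workaround, not an official pandas recipe. The helper name `add_missing_datetime_columns` is hypothetical, and the frames are built with explicit datetime64[ns] dtypes (an assumption) to mirror the shapes in the report.

```python
import pandas as pd


def add_missing_datetime_columns(left, right):
    """Hypothetical helper: add right's missing columns to left as all-NaT datetime64[ns]."""
    out = left.copy()
    for col in right.columns.difference(left.columns):
        out[col] = pd.Series(pd.NaT, index=left.index, dtype="datetime64[ns]")
    return out


# Same shape as the report: x has columns 0 and 1, y has columns 1 and 2.
x = pd.DataFrame({0: [pd.NaT], 1: [pd.NaT]}, index=["a"], dtype="datetime64[ns]")
y = pd.DataFrame(
    {
        1: [pd.NaT, pd.Timestamp("2019-01-01")],
        2: [pd.NaT, pd.Timestamp("2019-01-02")],
    },
    index=["a", "b"],
    dtype="datetime64[ns]",
)

# Column 2 now exists in x_full as datetime64[ns], so combine_first only fills
# missing values instead of creating a new column.
x_full = add_missing_datetime_columns(x, y)
result = x_full.combine_first(y)
print(result)
print(result.dtypes)
```

Because every column in the result comes from a datetime64[ns] column on at least one side, the output keeps its datetime dtype instead of exposing raw integers.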

INSTALLED VERSIONS

commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Linux
OS-release : 4.18.0-80.7.2.el8_0.x86_64
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.25.1
numpy : 1.17.2

@TomAugspurger
Contributor

Thanks. In DataFrame.combine_first, we use needs_i8_conversion on the incoming values to get the integer (i8) values out of the ndarray. We may need to re-box them as datetimes after calling expressions.where.
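The stray floats in the "current" output are the raw int64 (i8) nanosecond values behind the datetimes: NaT is stored as the int64 minimum (about -9.223372e+18) and 2019-01-02 as nanoseconds since the Unix epoch (about 1.546387e+18). The sketch below uses plain NumPy, not pandas internals, to illustrate the view-to-i8, where-style fill, and re-boxing steps described in the comment above. The helper name `fill_then_rebox` is hypothetical.

```python
import numpy as np

# Values as they sit in a datetime64[ns] column; NaT marks a missing timestamp.
this = np.array(["NaT", "NaT"], dtype="datetime64[ns]")
other = np.array(["NaT", "2019-01-02"], dtype="datetime64[ns]")

# Viewing the buffers as int64 ("i8") exposes the raw nanosecond counts:
# NaT becomes the int64 minimum, 2019-01-02 becomes ns since the Unix epoch.
this_i8 = this.view("i8")
other_i8 = other.view("i8")

# If these integers are never converted back, a float column shows them as
# roughly -9.223372e+18 and 1.546387e+18, matching the "current" output.
as_float = other_i8.astype("float64")


def fill_then_rebox(left_i8, right_i8, mask):
    """Hypothetical helper: take right where mask is True, then re-box as datetime64[ns]."""
    filled = np.where(mask, right_i8, left_i8)  # still plain int64 at this point
    return filled.view("datetime64[ns]")  # re-box: reinterpret int64 as datetime64[ns]


result = fill_then_rebox(this_i8, other_i8, np.isnat(this))
print(other_i8)
print(as_float)
print(result)
```

Skipping the final `.view("datetime64[ns]")` step leaves the filled values as bare integers, which is the kind of output the report shows for column 2.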

@TomAugspurger TomAugspurger added Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Datetime Datetime data dtype labels Sep 19, 2019
@TomAugspurger TomAugspurger added this to the Contributions Welcome milestone Sep 19, 2019
@tomyun

tomyun commented Oct 7, 2019

I have the same issue. Code that used to work, producing NaT as expected, is broken in the latest version of pandas.

And thanks, @leo4183, for providing the reindex workaround.

@mroeschke mroeschke added the Bug label Apr 2, 2020
@jreback jreback modified the milestones: Contributions Welcome, 1.2 Nov 8, 2020
@jreback jreback modified the milestones: 1.2, Contributions Welcome Nov 25, 2020
@jreback jreback modified the milestones: Contributions Welcome, 1.2 Nov 29, 2020
5 participants