dtypes convert to object on merge on 1.0.0rc0 #31208

Closed
mdoom23 opened this issue Jan 22, 2020 · 1 comment · Fixed by #31211
Labels: Datetime (Datetime data dtype), Regression (Functionality that used to work in a prior pandas version), Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
Milestone: 1.0.0

mdoom23 commented Jan 22, 2020

dtypes convert to object on merge

On 1.0.0rc0, a left merge whose right DataFrame has a datetime64[ns] column returns that column as object dtype whenever at least one row of the left DataFrame has no match in the right DataFrame. If every row matches, the column stays datetime64[ns]. In 0.25.3 and 0.24.2 the dtype was preserved in both cases.

The merge no longer seems to keep the dtype and fill the unmatched rows with NaT.

On 1.0.0rc0, I can still convert the merged column back to datetime afterwards, and the missing entries are then correctly recognized as NaT.

Example with extra value in left dataframe

import pandas as pd

df1 = pd.DataFrame({'x': {0: 'a', 1: 'b', 2: 'c'}, 'y': {0: '1', 1: '2', 2: '4'}})

df2 = pd.DataFrame({'y': {0: '1', 1: '2', 2: '3'}, 'z': {0: '2018-05-01', 1: '2018-05-02', 2: '2018-05-03'}})
df2['z'] = df2['z'].astype('datetime64[ns]')

# Left merge: the row of df1 with y='4' has no match in df2
result = pd.merge(df1, df2, how='left', on='y')

Output

  # 0.24.2
result.dtypes
x            object
y            object
z    datetime64[ns]
dtype: object

  # 0.25.3
result.dtypes
x            object
y            object
z    datetime64[ns]
dtype: object

  # 1.0.0rc0
result.dtypes
x            object
y            object
z            object
dtype: object
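
The workaround mentioned above (converting the column back to datetime after the merge) might look roughly like this. It is a minimal sketch that assumes `pd.to_datetime` is used for the conversion; the report does not say which call was actually made. On versions where the merge already preserves the dtype, the extra conversion should simply leave the column unchanged.

```python
import pandas as pd

# Same frames as in the report, written with lists for brevity.
df1 = pd.DataFrame({"x": ["a", "b", "c"], "y": ["1", "2", "4"]})
df2 = pd.DataFrame(
    {"y": ["1", "2", "3"], "z": ["2018-05-01", "2018-05-02", "2018-05-03"]}
)
df2["z"] = df2["z"].astype("datetime64[ns]")

result = pd.merge(df1, df2, how="left", on="y")

# Assumed workaround: coerce the merged column back to a datetime dtype.
# The unmatched row (y == "4") should end up as NaT.
result["z"] = pd.to_datetime(result["z"])
```

Inspecting `result.dtypes` afterwards shows whether `z` is a datetime column again.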

mdoom23 changed the title from "dtypes convert to object on merge" to "dtypes convert to object on merge on 1.0.0rc0" Jan 22, 2020

TomAugspurger (Contributor) commented

Thanks for the excellent report. This is a hacky fix:

diff --git a/pandas/core/internals/concat.py b/pandas/core/internals/concat.py
index c6f30ef65e..bc307d6b74 100644
--- a/pandas/core/internals/concat.py
+++ b/pandas/core/internals/concat.py
@@ -242,6 +242,8 @@ def concatenate_join_units(join_units, concat_axis, copy):
         raise AssertionError("Concatenating join units along axis0")
 
     empty_dtype, upcasted_na = _get_empty_dtype_and_na(join_units)
+    if is_datetime64_dtype(empty_dtype) and upcasted_na == tslibs.iNaT:
+        upcasted_na = tslibs.NaT
 
     to_concat = [
         ju.get_reindexed_values(empty_dtype=empty_dtype, upcasted_na=upcasted_na)

With that patch applied:

In [3]: result
Out[3]:
   x  y          z
0  a  1 2018-05-01
1  b  2 2018-05-02
2  c  4        NaT

In [4]: result.dtypes
Out[4]:
x            object
y            object
z    datetime64[ns]
dtype: object
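
The check done by hand in the session above could be scripted along these lines. This is only a sketch: the helper name `left_merge_keeps_datetime` is hypothetical and is not part of pandas or of the actual fix.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype


def left_merge_keeps_datetime(left, right, key, column):
    """Return True if `column` is still datetime-typed after a left merge.

    Hypothetical helper for illustration only.
    """
    merged = pd.merge(left, right, how="left", on=key)
    return is_datetime64_any_dtype(merged[column])


left = pd.DataFrame({"x": ["a", "b", "c"], "y": ["1", "2", "4"]})
right = pd.DataFrame(
    {"y": ["1", "2", "3"], "z": ["2018-05-01", "2018-05-02", "2018-05-03"]}
)
right["z"] = right["z"].astype("datetime64[ns]")

# The left row with y == "4" has no match, which is the case from the report.
keeps_dtype = left_merge_keeps_datetime(left, right, key="y", column="z")
```

Going by the report, `keeps_dtype` would be expected to be false on 1.0.0rc0 and true on 0.24.2, 0.25.3, and a patched build.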

I don't recall why we're using iNaT for upcasted NA there. The git blame isn't too informative. I'll see what all fails.
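
As background on the two sentinels in the patch: `iNaT` is the int64 value used to encode a missing datetime, while `NaT` is the missing-value scalar itself. The sketch below uses only NumPy and public pandas calls to show that relationship and that a datetime column filled with `NaT` keeps its dtype. It does not reproduce the internal logic of `concatenate_join_units`.

```python
import numpy as np
import pandas as pd

# A datetime64 NaT is stored as the smallest int64 value.
nat_array = np.array(["NaT"], dtype="datetime64[ns]")
nat_as_int = nat_array.view("i8")[0]
int64_min = np.iinfo(np.int64).min

# Reindexing to a missing label fills with NaT and keeps the datetime dtype.
values = np.array(["2018-05-01", "2018-05-02"], dtype="datetime64[ns]")
with_nat = pd.Series(values).reindex([0, 1, 2])
```

Comparing `nat_as_int` with `int64_min` and checking `with_nat.dtype` shows both points.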

TomAugspurger added this to the 1.0.0 milestone Jan 22, 2020
TomAugspurger added the Regression, Reshaping, and Datetime labels Jan 22, 2020
TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Jan 22, 2020