datetime64[ns] dtype coerced to object after pd.concat(axis=1) #28786


Closed

atlasstrategic opened this issue Oct 4, 2019 · 8 comments
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions Reshaping Concat, Merge/Join, Stack/Unstack, Explode

Comments

@atlasstrategic

Code example using pandas 0.25.1

import pandas as pd

df1 = pd.DataFrame([], index=[], columns=["foo"])
df2 = pd.DataFrame(
    data=list(range(20)),
    index=pd.date_range(start='2000', end='2020', freq='A-DEC'),
    columns=["bar"],
)
print(df2.index.dtype.name)
# Output: `datetime64[ns]`

df = pd.concat([df1, df2], axis=1)
print(df.index.dtype.name)
# Output: `object`

Problem description

The index dtype of the concatenated DataFrame df is expected to be datetime64[ns]. With pandas 0.24.2 this was the case; after upgrading to pandas 0.25.1 it changed to object.
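The coercion appears to come from the empty frame's index: a DataFrame built from an empty list gets an object-dtype Index by default, and outer-joining that with a DatetimeIndex falls back to object. A minimal sketch (my own illustration, assuming current pandas defaults):

```python
import pandas as pd

# A DataFrame built from an empty list gets an object-dtype Index by default
df1 = pd.DataFrame([], index=[], columns=["foo"])
print(df1.index.dtype)  # object

# Outer-joining that object Index with a DatetimeIndex falls back to object,
# which is why the concatenated frame loses datetime64[ns]
df2 = pd.DataFrame({"bar": range(3)}, index=pd.date_range('2000-01-01', periods=3))
df = pd.concat([df1, df2], axis=1)
print(df.index.dtype)  # object
```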

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.6.6.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-64-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_ZA.UTF-8
LOCALE : en_ZA.UTF-8

pandas : 0.25.1
numpy : 1.17.2
pytz : 2019.2
dateutil : 2.8.0
pip : 19.2.3
setuptools : 40.4.3
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.2.1
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8.3 (dt dec pq3 ext lo64)
jinja2 : 2.10
IPython : None
pandas_datareader: None
bs4 : 4.8.0
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : 3.0.0
pandas_gbq : None
pyarrow : 0.14.1
pytables : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : 1.2.1

@mroeschke mroeschke added Bug Dtype Conversions Unexpected or buggy dtype conversions Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Oct 6, 2019
@techytushar
Contributor

techytushar commented Oct 9, 2019

@mroeschke I would like to work on this. Could you tell me which files to look into?

@mrocklin
Contributor

mrocklin commented Oct 9, 2019

I think that you probably meant to ping someone else. I am not involved with this issue and don't know the Pandas codebase well.

@mroeschke
Member

@techytushar probably pandas/core/reshape/concat.py

@codeape2

I wonder what is the best workaround?

We have existing code that stopped working because of this; the constructs typically look like:

import pandas as pd

# create an empty dataframe
df = pd.DataFrame()
# add series to the dataframe using pd.concat
s0 = pd.Series([1, 2, 3], index=pd.date_range('2019-01-01', periods=3), name='s1')
s1 = pd.Series([1, 2, 3, 4, 5], index=pd.date_range('2019-01-01', periods=5), name='s2')
df = pd.concat([df, s0], axis=1)
df = pd.concat([df, s1], axis=1)

# len(df) is now 5 but alas, the index now has dtype object

I have tried the following workarounds (in all code samples, df is an empty dataframe):

a) Assign columns directly:

df[s0.name] = s0
df[s1.name] = s1

# len(df) is 3, so this is not equivalent (but the index has the correct dtype)

b) Use df.join(how='outer'):

df = df.join(s0, how='outer')
df = df.join(s1, how='outer')

# len(df) is 5, so df.join(how='outer') seems to be equivalent to pd.concat([df, series], axis=1)

c) Use df.assign()

df = df.assign(**{s0.name: s0})
df = df.assign(**{s1.name: s1})

# len(df) is 3, so this is not equivalent, but the index has the correct dtype

Based on the code above, it seems to me that pd.concat([df, series], axis=1) is equivalent to df.join(series, how='outer').

Thoughts?

@codeape2

Is this confirmed to be a bug? If I understand issue #23525 and PR #23538 correctly, this is the expected behavior. @ArtinSarraf ?

@codeape2

pd.concat() behaves as expected if the initial empty dataframe is created with an (empty) DatetimeIndex:

import pandas as pd

df = pd.DataFrame(index=pd.DatetimeIndex(data=[], freq=None))
s = pd.Series([1, 2, 3], index=pd.date_range('2019-01-01', periods=3))
df = pd.concat([df, s], axis=1)
# df.index is now a DatetimeIndex as expected

@jbrockmendel
Member

This is the correct behavior. To keep the index dtype on concat, you need to cast the empty frame's index to dt64:

df1.index = df1.index.astype(df2.index.dtype)
df = pd.concat([df1, df2], axis=1)

>>> df.index.dtype
dtype('<M8[ns]')

Closing.

@worthy7

worthy7 commented Feb 24, 2024

So what is the correct way to performantly concat many Series, each with a different dtype, and end up with a DataFrame whose columns keep their respective dtypes?
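One approach (my own sketch, not an answer given in this thread): skip the empty accumulator frame entirely and pass all the Series to a single pd.concat call. With non-empty, index-compatible inputs, the shared index and each column's dtype survive:

```python
import pandas as pd

idx = pd.date_range('2019-01-01', periods=3)
series = [
    pd.Series([1, 2, 3], index=idx, name='ints'),
    pd.Series([1.5, 2.5, 3.5], index=idx, name='floats'),
    pd.Series(['a', 'b', 'c'], index=idx, name='strs'),
]

# One concat over the whole list: no empty accumulator frame is involved,
# so the DatetimeIndex and the per-column dtypes are preserved.
df = pd.concat(series, axis=1)
print(df.dtypes)
print(df.index.dtype)  # datetime64[ns]
```

This is also faster than concatenating in a loop, since each incremental concat copies the accumulated frame.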
