datetime64[ns] dtype coerced to object after pd.concat(axis=1) #28786


Closed

atlasstrategic opened this issue Oct 4, 2019 · 8 comments
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions Reshaping Concat, Merge/Join, Stack/Unstack, Explode

Comments

@atlasstrategic

Code example using pandas 0.25.1

import pandas as pd

df1 = pd.DataFrame([], index=[], columns=["foo"])
df2 = pd.DataFrame(
    data=list(range(20)),
    index=pd.date_range(start='2000', end='2020', freq='A-DEC'),
    columns=["bar"],
)
print(df2.index.dtype.name)
# Output: `datetime64[ns]`

df = pd.concat([df1, df2], axis=1)
print(df.index.dtype.name)
# Output: `object`

Problem description

The index dtype of the concatenated DataFrame df is expected to be datetime64[ns]. With pandas 0.24.2 this was the case; after upgrading to pandas 0.25.1 it changed to object.
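The coercion appears to come from the empty frame's index: a DataFrame built from an empty list gets an object-dtype Index by default, and outer-joining that with a DatetimeIndex falls back to object. A minimal sketch (my own illustration, assuming current pandas defaults):

```python
import pandas as pd

# A DataFrame built from an empty list gets an object-dtype Index by default
df1 = pd.DataFrame([], index=[], columns=["foo"])
print(df1.index.dtype)  # object

# Outer-joining that object Index with a DatetimeIndex falls back to object,
# which is why the concatenated frame loses datetime64[ns]
df2 = pd.DataFrame({"bar": range(3)}, index=pd.date_range('2000-01-01', periods=3))
df = pd.concat([df1, df2], axis=1)
print(df.index.dtype)  # object
```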

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.6.6.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-64-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_ZA.UTF-8
LOCALE : en_ZA.UTF-8

pandas : 0.25.1
numpy : 1.17.2
pytz : 2019.2
dateutil : 2.8.0
pip : 19.2.3
setuptools : 40.4.3
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.2.1
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8.3 (dt dec pq3 ext lo64)
jinja2 : 2.10
IPython : None
pandas_datareader: None
bs4 : 4.8.0
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : 3.0.0
pandas_gbq : None
pyarrow : 0.14.1
pytables : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : 1.2.1

@mroeschke mroeschke added Bug Dtype Conversions Unexpected or buggy dtype conversions Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Oct 6, 2019
@techytushar
Contributor

techytushar commented Oct 9, 2019

@mroeschke I would like to work on this. Could you tell me which files to look into?

@mrocklin
Contributor

mrocklin commented Oct 9, 2019

I think that you probably meant to ping someone else. I am not involved with this issue and don't know the Pandas codebase well.

@mroeschke
Member

@techytushar probably pandas/core/reshape/concat.py

@codeape2

I wonder what is the best workaround?

We have existing code that stopped working because of this; the constructs typically look like:

import pandas as pd

# create an empty dataframe
df = pd.DataFrame()
# add series to the dataframe using pd.concat
s0 = pd.Series([1, 2, 3], index=pd.date_range('2019-01-01', periods=3), name='s1')
s1 = pd.Series([1, 2, 3, 4, 5], index=pd.date_range('2019-01-01', periods=5), name='s2')
df = pd.concat([df, s0], axis=1)
df = pd.concat([df, s1], axis=1)

# len(df) is now 5 but alas, the index now has dtype object

I have tried the following workarounds (in all code samples, df is an empty dataframe):

a) Assign columns directly:

df[s0.name] = s0
df[s1.name] = s1

# len(df) is 3, so this is not equivalent (but the index has the correct dtype)

b) Use df.join(how='outer'):

df = df.join(s0, how='outer')
df = df.join(s1, how='outer')

# len(df) is 5, so df.join(how='outer') seems to be equivalent to pd.concat([df, series], axis=1)

c) Use df.assign()

df = df.assign(**{s0.name: s0})
df = df.assign(**{s1.name: s1})

# len(df) is 3, so this is not equivalent, but the index has the correct dtype

Based on the code above, it seems to me that pd.concat([df, series], axis=1) is equivalent to df.join(series, how='outer').

Thoughts?

@codeape2

Is this confirmed to be a bug? If I understand issue #23525 and PR #23538 correctly, this is the expected behavior. @ArtinSarraf ?

@codeape2

pd.concat() behaves as expected if the initial empty dataframe is created with an (empty) DatetimeIndex:

import pandas as pd

df = pd.DataFrame(index=pd.DatetimeIndex(data=[], freq=None))
s = pd.Series([1, 2, 3], index=pd.date_range('2019-01-01', periods=3))
df = pd.concat([df, s], axis=1)
# df.index is now a DatetimeIndex as expected

@jbrockmendel
Member

This is the correct behavior. To keep the index dtype on concat, you need to cast the empty frame's index to dt64:

df1.index = df1.index.astype(df2.index.dtype)
df = pd.concat([df1, df2], axis=1)

>>> df.index.dtype
dtype('<M8[ns]')

Closing.

@worthy7

worthy7 commented Feb 24, 2024

So what is the correct way to performantly concat many Series, each with a different dtype, and end up with a DataFrame whose columns keep their respective dtypes?
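One approach (my own sketch, not an answer given in this thread): skip the empty accumulator frame entirely and pass all the Series to a single pd.concat call. With non-empty, index-compatible inputs, the shared index and each column's dtype survive:

```python
import pandas as pd

idx = pd.date_range('2019-01-01', periods=3)
series = [
    pd.Series([1, 2, 3], index=idx, name='ints'),
    pd.Series([1.5, 2.5, 3.5], index=idx, name='floats'),
    pd.Series(['a', 'b', 'c'], index=idx, name='strs'),
]

# One concat over the whole list: no empty accumulator frame is involved,
# so the DatetimeIndex and the per-column dtypes are preserved.
df = pd.concat(series, axis=1)
print(df.dtypes)
print(df.index.dtype)  # datetime64[ns]
```

This is also faster than concatenating in a loop, since each incremental concat copies the accumulated frame.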
