MultiIndex.from_product converts datetime.date to pd.Timestamp #28152

Open
MaximeWeyl opened this issue Aug 26, 2019 · 5 comments
Labels
Bug; Constructors (Series/DataFrame/Index/pd.array Constructors); datetime.date (stdlib datetime.date support); Datetime (Datetime data dtype); MultiIndex

Comments

@MaximeWeyl

Code Sample, a copy-pastable example if possible

import pandas as pd

a = pd.Timestamp("2019-2-2").date()
i = pd.MultiIndex.from_product([
    [a, a],
    [2, 3]
])

print("a={} ({}".format(a, type(a)))
print(i[0])

Output:

a=2019-02-02 (<class 'datetime.date'>
(Timestamp('2019-02-02 00:00:00'), 2)

Problem description

When using from_product with Python datetime.date objects, the resulting MultiIndex level is converted to pandas datetimes (Timestamps). I found no way to keep the original datetime.date objects, which is what I want. I got the same behavior with from_tuples, but not with from_arrays.
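To make the comparison between the three constructors easy to repeat, here is a minimal sketch that builds the same index through each of them and records the type of the first element. The names `candidates` and `first_types` are only illustrative, and no expected output is given because the result depends on the installed pandas version.

```python
import datetime

import pandas as pd

d = datetime.date(2019, 2, 2)

# Build the same two-level index through each constructor.
candidates = {
    "from_product": pd.MultiIndex.from_product([[d, d], [2, 3]]),
    "from_tuples": pd.MultiIndex.from_tuples([(d, 2), (d, 3)]),
    "from_arrays": pd.MultiIndex.from_arrays([[d, d], [2, 3]]),
}

# Record the type of the first element of the first tuple.
# The printed types vary with the pandas version, so none are asserted here.
first_types = {name: type(mi[0][0]) for name, mi in candidates.items()}
for name, t in first_types.items():
    print(f"{name}: {t.__name__}")
```

A constructor that prints `Timestamp` has converted the value; one that prints `date` has preserved it.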

Expected Output

I expect the output to respect the type I gave to from_product :

a=2019-02-02 (<class 'datetime.date'>
(datetime.date(2019, 2, 2), 2)

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 9, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None
pandas : 0.25.1
numpy : 1.17.0
pytz : 2019.2
dateutil : 2.8.0
pip : 19.0.3
setuptools : 40.8.0
Cython : None
pytest : 4.6.5
hypothesis : None
sphinx : 2.2.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.7.0
pandas_datareader: None
bs4 : 4.8.0
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : None
odfpy : None
openpyxl : 2.6.3
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : None
tables : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None

@MaximeWeyl
Author

One workaround I found is to pass an Index to from_product instead of a list:

import pandas as pd

a = pd.Timestamp("2019-2-2").date()
i = pd.MultiIndex.from_product([
    pd.Index([a, a]),
    [2, 3]
])

print("a={} ({}".format(a, type(a)))
print(i[0])

Output:

a=2019-02-02 (<class 'datetime.date'>
(datetime.date(2019, 2, 2), 2)
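The workaround can be wrapped in a small helper. This is only a sketch: `from_product_keep_dates` and `is_plain_date` are hypothetical names, not part of pandas, and the helper assumes that an object-dtype Index passed to from_product is left untouched, as the output above suggests.

```python
import datetime

import pandas as pd


def is_plain_date(value):
    # pd.Timestamp and datetime.datetime both subclass datetime.date,
    # so compare the exact type instead of using isinstance.
    return type(value) is datetime.date


def from_product_keep_dates(iterables, names=None):
    """Hypothetical wrapper: pre-wrap date levels in an object-dtype Index."""
    wrapped = []
    for values in iterables:
        values = list(values)
        if any(is_plain_date(v) for v in values):
            # An explicit object dtype keeps the datetime.date objects as they are.
            values = pd.Index(values, dtype=object)
        wrapped.append(values)
    return pd.MultiIndex.from_product(wrapped, names=names)


a = datetime.date(2019, 2, 2)
mi = from_product_keep_dates([[a, a], [2, 3]])
first = mi[0]
print(first, type(first[0]))
```

Only levels that actually contain datetime.date objects are wrapped, so the integer level keeps its usual inferred dtype.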

@TomAugspurger
Contributor

> I expect the output to respect the type I gave to from_product:

You don't specify a dtype, right? So this is a bug in inference.

I would expect the MultiIndex constructors to follow the behavior of Index (and Series), which preserve the datetime.date objects.

In [24]: pd.Index([a, a])
Out[24]: Index([2019-02-02, 2019-02-02], dtype='object')

In [25]: pd.Index([a, a])[0]
Out[25]: datetime.date(2019, 2, 2)

@TomAugspurger added the Constructors, MultiIndex, and Datetime labels Aug 26, 2019
@TomAugspurger added this to the Contributions Welcome milestone Aug 26, 2019
@TomAugspurger
Contributor

Hmm, the bug seems to be in Categorical (used by MultiIndex internally).

In [31]: pd.Categorical([a]).categories
Out[31]: DatetimeIndex(['2019-02-02'], dtype='datetime64[ns]', freq=None)
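A quick way to check whether a given installation still shows this discrepancy is to compare the two inference paths side by side. The sketch below only reports what it finds; the variable names are illustrative, and the Categorical result is not asserted because it may differ between pandas versions.

```python
import datetime

import pandas as pd

a = datetime.date(2019, 2, 2)

idx = pd.Index([a, a])                  # inference path used by Index
cats = pd.Categorical([a]).categories   # inference path used by Categorical

print("pandas", pd.__version__)
print("Index:      ", idx.dtype, type(idx[0]).__name__)
print("Categorical:", cats.dtype, type(cats[0]).__name__)

# True when Categorical infers the same element type as Index.
same_inference = type(cats[0]) is type(idx[0])
print("Categorical agrees with Index:", same_inference)
```

If the last line prints `False`, Categorical is converting datetime.date values where Index does not.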

@mroeschke changed the title from "MultiIndex.from_product converts datetime.datetime to pd.Timestamp" to "MultiIndex.from_product converts datetime.date to pd.Timestamp" Aug 26, 2019
@mroeschke
Member

Just noting the distinction that this is an issue with datetime.date objects, which are not first class in pandas.

@mroeschke added the Bug label Apr 2, 2020
@kurtosis

kurtosis commented Jun 29, 2020

I got tripped up on this as well when using groupby on a datetime.date column and moving it in and out of the index. The different behavior between Index and MultiIndex is especially tricky.

One suggestion: maybe add documentation/warning on the preferred way to round timestamps to date? This is a common operation and dt.date was all I came across in the docs.
https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.date.html

(See also #15906)

In [4]: df = pd.DataFrame({'date': pd.date_range(start='2020-01-01', periods=10), 'label': 1})
In [5]: df['date'] = df['date'].dt.date
In [6]: display(df.date[0])
datetime.date(2020, 1, 1)
In [7]: df_1 = df.set_index('date').reset_index()
In [8]: display(df_1.date[0])
datetime.date(2020, 1, 1)
In [9]: df_1 = df.set_index(['date', 'label']).reset_index()
In [10]: display(df_1.date[0])
Timestamp('2020-01-01 00:00:00')
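One option that sidesteps the problem, offered here as a suggestion and not something proposed in this thread, is to truncate timestamps with `Series.dt.normalize()` instead of `dt.date`. The values stay in a datetime64 dtype, so there are no datetime.date objects to convert. A minimal sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.date_range(start="2020-01-01 09:30", periods=3, freq="D"),
    "label": 1,
})

# Truncate each timestamp to midnight while keeping the datetime64 dtype.
df["day"] = df["date"].dt.normalize()

# Move the truncated column into a MultiIndex and back out again.
round_trip = df.set_index(["day", "label"]).reset_index()

print(df["day"].dtype, "->", round_trip["day"].dtype)
print(type(round_trip["day"][0]).__name__)
```

The trade-off is that the elements are Timestamps at midnight, not datetime.date objects, which may not suit code that needs the standard-library type.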

@jbrockmendel added the datetime.date label Jun 19, 2021
@mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022