Skip to content

Slicing Datetime MultiIndex with string or datetime.date slices #15906

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
kayibal opened this issue Apr 5, 2017 · 6 comments
Open

Slicing Datetime MultiIndex with string or datetime.date slices #15906

kayibal opened this issue Apr 5, 2017 · 6 comments
Labels
Bug datetime.date stdlib datetime.date support Datetime Datetime data dtype Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex

Comments

@kayibal
Copy link

kayibal commented Apr 5, 2017

Basically Indexing a MultiIndex with a DatetimeIndex seems only to be working if you use slices with datetime.datetime or pandas.Timestamp. One would expect it to work also with strings as well as with 'datetime.date' slices as it does for usual indizes.

This seems to be related to #3843.

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd
import time
import datetime as dt

min_ts = time.mktime(dt.date(2016,10,1).timetuple())
timestamps = [min_ts + offset*86400 + noise*3600 
                  for offset, noise in enumerate(np.random.rand(100))]
time_idx = pd.to_datetime(sorted(timestamps), unit='s')
id_idx = np.random.choice(np.arange(10), 100)
df = pd.DataFrame(np.identity(100),
                 index=pd.MultiIndex.from_arrays(
                     [time_idx, id_idx]
                 ))

df.loc[dt.datetime(2016, 10, 1):]  # works
df.loc[dt.date(2016, 10, 1):]  # fails :/

Problem description

The above code raises the following exception on the last line, which is quite unexpected when one is used to indexing on single index dataframes.

Traceback (most recent call last):
  File "<input>", line 11, in <module>
  File "/Users/kayibal/virtualenvs/traildb-sparse/lib/python3.5/site-packages/pandas/core/indexing.py", line 1312, in __getitem__
    return self._getitem_axis(key, axis=0)
  File "/Users/kayibal/virtualenvs/traildb-sparse/lib/python3.5/site-packages/pandas/core/indexing.py", line 1453, in _getitem_axis
    return self._get_slice_axis(key, axis=axis)
  File "/Users/kayibal/virtualenvs/traildb-sparse/lib/python3.5/site-packages/pandas/core/indexing.py", line 1334, in _get_slice_axis
    slice_obj.step, kind=self.name)
  File "/Users/kayibal/virtualenvs/traildb-sparse/lib/python3.5/site-packages/pandas/indexes/base.py", line 2997, in slice_indexer
    kind=kind)
  File "/Users/kayibal/virtualenvs/traildb-sparse/lib/python3.5/site-packages/pandas/indexes/multi.py", line 1578, in slice_locs
    return super(MultiIndex, self).slice_locs(start, end, step, kind=kind)
  File "/Users/kayibal/virtualenvs/traildb-sparse/lib/python3.5/site-packages/pandas/indexes/base.py", line 3176, in slice_locs
    start_slice = self.get_slice_bound(start, 'left', kind)
  File "/Users/kayibal/virtualenvs/traildb-sparse/lib/python3.5/site-packages/pandas/indexes/multi.py", line 1549, in get_slice_bound
    return self._partial_tup_index(label, side=side)
  File "/Users/kayibal/virtualenvs/traildb-sparse/lib/python3.5/site-packages/pandas/indexes/multi.py", line 1594, in _partial_tup_index
    raise TypeError('Level type mismatch: %s' % lab)
TypeError: Level type mismatch: 2016-10-01

Output of pd.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.5.2.final.0 python-bits: 64 OS: Darwin OS-release: 16.4.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: en_US.UTF-8 LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 pandas: 0.19.2 nose: None pip: 9.0.1 setuptools: 34.2.0 Cython: 0.25.2 numpy: 1.12.0 scipy: 0.18.1 statsmodels: None xarray: None IPython: None sphinx: None patsy: None dateutil: 2.6.0 pytz: 2016.10 blosc: None bottleneck: None tables: None numexpr: None matplotlib: None openpyxl: None xlrd: None xlwt: None xlsxwriter: None lxml: None bs4: None html5lib: None httplib2: None apiclient: None sqlalchemy: None pymysql: None psycopg2: None jinja2: None boto: None pandas_datareader: None
@jreback
Copy link
Contributor

jreback commented Apr 5, 2017

datetime.date is not a first class type, so we barely support it, almost always it is coerced (and we certainly don't support indexing via it).

and strings work fine.

In [10]: df.loc['2016-10-01']
Out[10]: 
                               0    1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   16   17   18   19   20   21   22   23   24 ...    75   76   77   78   79   80   81  \
2016-10-01 04:18:59.654717 4  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  0.0  0.0  0.0  0.0  0.0  0.0   

                               82   83   84   85   86   87   88   89   90   91   92   93   94   95   96   97   98   99  
2016-10-01 04:18:59.654717 4  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  

[1 rows x 100 columns]

@jreback jreback closed this as completed Apr 5, 2017
@jreback
Copy link
Contributor

jreback commented Apr 5, 2017

all that said, if you want to do a PR to support indexing via datetime.date prob would take it.

@jreback jreback added this to the won't fix milestone Apr 5, 2017
@jreback jreback added Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex Datetime Datetime data dtype labels Apr 5, 2017
@jorisvandenbossche
Copy link
Member

we certainly don't support indexing via it

We actually do support it:

In [50]: df = pd.DataFrame({'A': np.random.randn(10)}, index=pd.date_range("2012-01-01", periods=10))

In [51]: df
Out[51]: 
                   A
2012-01-01  0.457670
2012-01-02 -0.871395
2012-01-03  1.878444
2012-01-04  1.106600
2012-01-05 -0.282401
2012-01-06  0.057577
2012-01-07  0.567901
2012-01-08 -0.213132
2012-01-09  0.264840
2012-01-10 -0.994729

In [54]: df.loc['2012-01-03']
Out[54]: 
A    1.878444
Name: 2012-01-03 00:00:00, dtype: float64

In [55]: df.loc[datetime.date(2012, 1, 3)]
Out[55]: 
A    1.878444
Name: 2012-01-03 00:00:00, dtype: float64

Above is on normal DatetimeIndex (the key is just coerced to a Timestamp). MultiIndex has always lagged somewhat behind on that.
So I think having it consistent would be good.

Not that I recommend using a datetime.date!

@jreback jreback modified the milestones: Next Major Release, won't fix Apr 28, 2017
@jreback jreback reopened this Apr 28, 2017
@jreback
Copy link
Contributor

jreback commented Apr 28, 2017

reopening as doesn't work on MI as indicated above.

@pabloarosado
Copy link

I see this issue is still open. I have a similar problem on pandas 1.1.1:

df = pd.DataFrame({'a': ['2020-01-01', '2020-02-01', '2020-03-01'], 'b': [1, 2, 3], 'c': [10, 20, 30]})
df['a'] = pd.to_datetime(df['a'])
df = df.set_index(['a', 'b'])
df.loc[('2020-01-01', 1)]  # This works.
df.loc[[('2020-01-01', 1), ('2020-02-01', 2)]]  # This fails.
df.loc[[(pd.Timestamp(2020, 1, 1), 1), (pd.Timestamp(2020, 2, 1), 2)]]  # This works.

So, if you have a multiindex dataframe with a datetime index, you can access the datetime index as a string only if you extract one row. If you try to extract more than one, dates have to be given as timestamps.

I know this is not a very common scenario, but is there any plan to solve it, or is there any workaround? Thank you!

@rmongeca
Copy link

rmongeca commented Oct 14, 2020

I see this issue is still open. I have a similar problem on pandas 1.1.1:

df = pd.DataFrame({'a': ['2020-01-01', '2020-02-01', '2020-03-01'], 'b': [1, 2, 3], 'c': [10, 20, 30]})
df['a'] = pd.to_datetime(df['a'])
df = df.set_index(['a', 'b'])
df.loc[('2020-01-01', 1)]  # This works.
df.loc[[('2020-01-01', 1), ('2020-02-01', 2)]]  # This fails.
df.loc[[(pd.Timestamp(2020, 1, 1), 1), (pd.Timestamp(2020, 2, 1), 2)]]  # This works.

So, if you have a multiindex dataframe with a datetime index, you can access the datetime index as a string only if you extract one row. If you try to extract more than one, dates have to be given as timestamps.

I know this is not a very common scenario, but is there any plan to solve it, or is there any workaround? Thank you!

Just to build on what @pabloarosado has commented, by using datetime.datetime or datetime.date the multindex slicing also seems to work:

df.loc[[(datetime.datetime(2020, 1, 1), 1), (datetime.datetime(2020, 2, 1), 2)]]  # This works
df.loc[[(datetime.date(2020, 1, 1), 1), (datetime.date(2020, 2, 1), 2)]]  # This also works

@jbrockmendel jbrockmendel added the datetime.date stdlib datetime.date support label Jun 10, 2021
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug datetime.date stdlib datetime.date support Datetime Datetime data dtype Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex
Projects
None yet
Development

No branches or pull requests

7 participants