Slicing on DatetimeIndex throws KeyError: [int] not found #13090

twiecki · 2016-05-05T11:17:10Z

Code Sample, a copy-pastable example if possible

Unfortunately can't reproduce it with other examples. Only this which I load from csv.

txn.loc[pd.Timestamp('2014-06-04 00:00:00'):]

KeyError                                  Traceback (most recent call last)
<ipython-input-35-ce3d2db63a82> in <module>()
----> 1 txn.loc[pd.Timestamp('2014-06-04 00:00:00'):]

/opt/miniconda/lib/python2.7/site-packages/pandas/core/indexing.py in __getitem__(self, key)
   1284             return self._getitem_tuple(key)
   1285         else:
-> 1286             return self._getitem_axis(key, axis=0)
   1287 
   1288     def _getitem_axis(self, key, axis=0):

/opt/miniconda/lib/python2.7/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1398         if isinstance(key, slice):
   1399             self._has_valid_type(key, axis)
-> 1400             return self._get_slice_axis(key, axis=axis)
   1401         elif is_bool_indexer(key):
   1402             return self._getbool_axis(key, axis=axis)

/opt/miniconda/lib/python2.7/site-packages/pandas/core/indexing.py in _get_slice_axis(self, slice_obj, axis)
   1306         labels = obj._get_axis(axis)
   1307         indexer = labels.slice_indexer(slice_obj.start, slice_obj.stop,
-> 1308                                        slice_obj.step, kind=self.name)
   1309 
   1310         if isinstance(indexer, slice):

/opt/miniconda/lib/python2.7/site-packages/pandas/tseries/index.py in slice_indexer(self, start, end, step, kind)
   1503 
   1504         try:
-> 1505             return Index.slice_indexer(self, start, end, step, kind=kind)
   1506         except KeyError:
   1507             # For historical reasons DatetimeIndex by default supports

/opt/miniconda/lib/python2.7/site-packages/pandas/indexes/base.py in slice_indexer(self, start, end, step, kind)
   2698         """
   2699         start_slice, end_slice = self.slice_locs(start, end, step=step,
-> 2700                                                  kind=kind)
   2701 
   2702         # return a slice

/opt/miniconda/lib/python2.7/site-packages/pandas/indexes/base.py in slice_locs(self, start, end, step, kind)
   2877         start_slice = None
   2878         if start is not None:
-> 2879             start_slice = self.get_slice_bound(start, 'left', kind)
   2880         if start_slice is None:
   2881             start_slice = 0

/opt/miniconda/lib/python2.7/site-packages/pandas/indexes/base.py in get_slice_bound(self, label, side, kind)
   2826             except ValueError:
   2827                 # raise the original KeyError
-> 2828                 raise err
   2829 
   2830         if isinstance(slc, np.ndarray):

KeyError: 1401840000000000000

Expected Output

Slice of everything after '2014-06-04 00:00:00'.

output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-77-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8

pandas: 0.18.0
nose: 1.3.7
pip: 8.1.1
setuptools: 20.7.0
Cython: 0.23.4
numpy: 1.11.0
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 3.2.1
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.3
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.8
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext)
jinja2: 2.8
boto: 2.30.0

jreback · 2016-05-05T12:08:35Z

So this requires monotonicity (sortedness). Documentation was added here to describe this, though I think we could have a better error message in this case (still a KeyError, but saying it's not monotonic). I don't know why you would get that odd key error. Do you have a repro?

In [7]: dti = pd.date_range('20130101 09:00:00', periods=50, freq='2H')

In [8]: df = pd.DataFrame({'values': range(len(dti))}, index=dti)

In [9]: df
Out[9]: 
                     values
2013-01-01 09:00:00       0
2013-01-01 11:00:00       1
2013-01-01 13:00:00       2
2013-01-01 15:00:00       3
2013-01-01 17:00:00       4
...                     ...
2013-01-05 03:00:00      45
2013-01-05 05:00:00      46
2013-01-05 07:00:00      47
2013-01-05 09:00:00      48
2013-01-05 11:00:00      49

[50 rows x 1 columns]

In [10]: df.loc[pd.Timestamp('20130105 04:00:00'):]
Out[10]: 
                     values
2013-01-05 05:00:00      46
2013-01-05 07:00:00      47
2013-01-05 09:00:00      48
2013-01-05 11:00:00      49

In [11]: df.sample(25)
Out[11]: 
                     values
2013-01-02 07:00:00      11
2013-01-01 13:00:00       2
2013-01-02 05:00:00      10
2013-01-04 03:00:00      33
2013-01-03 07:00:00      23
...                     ...
2013-01-01 15:00:00       3
2013-01-01 23:00:00       7
2013-01-05 11:00:00      49
2013-01-02 13:00:00      14
2013-01-03 03:00:00      21

[25 rows x 1 columns]
In [12]: df.sample(25).loc[pd.Timestamp('20130105 04:00:00'):]
KeyError: Timestamp('2013-01-05 04:00:00')
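
The session above can be condensed into a minimal sketch of the failure and the fix, assuming a recent pandas. The frame is built the same way; swapping its two halves is only an illustrative, deterministic stand-in for `df.sample(25)`, and the variable names are chosen for this example.

```python
import pandas as pd

dti = pd.date_range("2013-01-01 09:00:00", periods=50, freq=pd.Timedelta(hours=2))
df = pd.DataFrame({"values": range(len(dti))}, index=dti)

# Deliberately break the ordering by swapping the two halves of the frame.
unsorted = pd.concat([df.iloc[25:], df.iloc[:25]])
start = pd.Timestamp("2013-01-05 04:00:00")  # not itself a label in the index

# On a non-monotonic index, a slice bound that is not an exact label cannot
# be located, so the lookup surfaces as a KeyError.
try:
    unsorted.loc[start:]
    raised = False
except KeyError:
    raised = True

# Sorting restores monotonicity, after which the same slice works.
fixed = unsorted.sort_index()
result = fixed.loc[start:]
print(raised, result["values"].tolist())
```

Checking `index.is_monotonic_increasing` before slicing is a cheap way to tell whether `sort_index()` is needed.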

jreback · 2016-05-05T12:09:44Z

cc @nileracecrew

@shoyer @jorisvandenbossche

twiecki · 2016-05-05T12:09:57Z

That works, thank you! Feel free to close.

shoyer · 2016-05-05T14:50:39Z

We should really be raising an UnsortedIndexError or something like that here instead of the KeyError. Didn't we discuss adding something like that for MultiIndex?

Also, it's unfortunate that we're returning an integer in the error message instead of the original timestamp.
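
Both points can be illustrated with a rough sketch. The integer in the original report is the timestamp expressed as nanoseconds since the Unix epoch, so it can be decoded by hand. The `NonMonotonicIndexError` class and the `loc_from` helper below are hypothetical names invented for this example, not pandas API; they only show what a clearer error could look like in user code.

```python
import pandas as pd


class NonMonotonicIndexError(KeyError):
    """Hypothetical error type for this sketch; not part of pandas."""


def loc_from(frame, start):
    """Label-slice `frame` from `start` onward, failing loudly if unsorted."""
    index = frame.index
    if not (index.is_monotonic_increasing or index.is_monotonic_decreasing):
        raise NonMonotonicIndexError(
            f"cannot slice from {start!r}: index is not monotonic; "
            "call sort_index() first"
        )
    return frame.loc[start:]


# The integer from the reported KeyError is nanoseconds since the epoch.
decoded = pd.Timestamp(1401840000000000000)
print(decoded)  # 2014-06-04 00:00:00
```

Subclassing `KeyError` keeps existing `except KeyError` handlers working while making the cause explicit.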

jreback · 2016-05-05T14:56:06Z

#12790

and new exception in #11897

sarwatfatimam · 2017-03-16T07:01:49Z

Hi. I am trying to remove duplicated rows based on time from date column.
df_u.drop_duplicates(df_u['Response Time'], keep='last')
However, I am getting this error:

Traceback (most recent call last):
File "C:\Users\Sarwat\Anaconda\Anaconda3\lib\site-packages\pandas\indexes\base.py", line 2134, in get_loc
return self._engine.get_loc(key)
File "pandas\index.pyx", line 132, in pandas.index.IndexEngine.get_loc (pandas\index.c:4433)
File "pandas\index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas\index.c:4279)
File "pandas\src\hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:13742)
File "pandas\src\hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:13696)
KeyError: Timestamp('2016-05-25 19:09:37')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:/Users/Sarwat/Documents/DealSmash/Scripts/DataAnalysis/GCM.py", line 37, in <module>
df_u.drop_duplicates(df_u['Response Time'], keep='last')
File "C:\Users\Sarwat\Anaconda\Anaconda3\lib\site-packages\pandas\util\decorators.py", line 91, in wrapper
return func(*args, **kwargs)
File "C:\Users\Sarwat\Anaconda\Anaconda3\lib\site-packages\pandas\core\frame.py", line 3138, in drop_duplicates
duplicated = self.duplicated(subset, keep=keep)
File "C:\Users\Sarwat\Anaconda\Anaconda3\lib\site-packages\pandas\util\decorators.py", line 91, in wrapper
return func(*args, **kwargs)
File "C:\Users\Sarwat\Anaconda\Anaconda3\lib\site-packages\pandas\core\frame.py", line 3188, in duplicated
labels, shape = map(list, zip(*map(f, vals)))
File "C:\Users\Sarwat\Anaconda\Anaconda3\lib\site-packages\pandas\core\frame.py", line 3187, in <genexpr>
vals = (self[col].values for col in subset)
File "C:\Users\Sarwat\Anaconda\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2059, in __getitem__
return self._getitem_column(key)
File "C:\Users\Sarwat\Anaconda\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2066, in _getitem_column
return self._get_item_cache(key)
File "C:\Users\Sarwat\Anaconda\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1386, in _get_item_cache
values = self._data.get(item)
File "C:\Users\Sarwat\Anaconda\Anaconda3\lib\site-packages\pandas\core\internals.py", line 3543, in get
loc = self.items.get_loc(item)
File "C:\Users\Sarwat\Anaconda\Anaconda3\lib\site-packages\pandas\indexes\base.py", line 2136, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas\index.pyx", line 132, in pandas.index.IndexEngine.get_loc (pandas\index.c:4433)
File "pandas\index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas\index.c:4279)
File "pandas\src\hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:13742)
File "pandas\src\hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:13696)
KeyError: Timestamp('2016-05-25 19:09:37')
"KeyError: Timestamp('2016-05-25 10:30:00')".

jorisvandenbossche · 2017-03-17T14:20:53Z

@sarwatfatimam Please open a separate issue for this (or first ask on gitter or on the mailing list) if you think there is a bug. But, I think you are using drop_duplicates wrong, as you should pass a list of column names instead of the actual columns: df_u.drop_duplicates(['Response Time'], keep='last') (so not df_u['Response Time'])
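
A minimal sketch of the corrected call, using a made-up three-row frame (the data and the `value` column are illustrative only). Passing the Series itself makes pandas treat each timestamp value as a column name, which is why the lookup fails with a KeyError on a Timestamp.

```python
import pandas as pd

# Illustrative data: two rows share the same 'Response Time'.
df_u = pd.DataFrame(
    {
        "Response Time": pd.to_datetime(
            ["2016-05-25 19:09:37", "2016-05-25 19:09:37", "2016-05-25 10:30:00"]
        ),
        "value": [1, 2, 3],
    }
)

# `subset` takes column *names*; keep='last' retains the final duplicate.
deduped = df_u.drop_duplicates(subset=["Response Time"], keep="last")
print(deduped["value"].tolist())  # [2, 3]
```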

fkromer · 2021-02-17T15:56:23Z

I've the same issue with pandas 0.25.0 when trying to slice a DataFrame df with monotonically increasing DateTimes as indices which looks like this

                                  1         2
timestamp                                           
2021-02-17 16:07:53.359581+01:00  5.232185  5.214104
2021-02-17 16:07:53.862581+01:00  5.189049  5.202629
2021-02-17 16:07:54.364581+01:00  5.123482  5.131927
2021-02-17 16:07:54.865581+01:00  5.086906  5.085625
2021-02-17 16:07:55.368581+01:00  5.139999  5.080673
...                                    ...       ...
2021-02-17 16:08:50.611581+01:00  4.879005  4.861519
2021-02-17 16:08:51.115581+01:00  4.807243  4.827316
2021-02-17 16:08:52.119581+01:00  4.809059  4.746559
2021-02-17 16:08:52.621581+01:00  4.822960  4.646828
2021-02-17 16:08:52.998581+01:00  0.000000  4.741882

based on DateTime using windowed_dataframe = df[oldest_ts_to_consider:youngest_ts] I get KeyError: Timestamp('2021-02-17 16:07:52.998581+0100', tz='tzoffset(None, 3600)') with this traceback:

  File "/usr/local/lib/python3.7/site-packages/pandas/core/frame.py", line 2976, in __getitem__
    indexer = convert_to_index_sliceable(self, key)
  File "/usr/local/lib/python3.7/site-packages/pandas/core/indexing.py", line 2358, in convert_to_index_sliceable
    return idx._convert_slice_indexer(key, kind="getitem")
  File "/usr/local/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3216, in _convert_slice_indexer
    indexer = self.slice_indexer(start, stop, step, kind=kind)
  File "/usr/local/lib/python3.7/site-packages/pandas/core/indexes/datetimes.py", line 1151, in slice_indexer
    return Index.slice_indexer(self, start, end, step, kind=kind)
  File "/usr/local/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 5034, in slice_indexer
    start_slice, end_slice = self.slice_locs(start, end, step=step, kind=kind)
  File "/usr/local/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 5248, in slice_locs
    start_slice = self.get_slice_bound(start, "left", kind)
  File "/usr/local/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 5170, in get_slice_bound
    raise err
  File "/usr/local/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 5164, in get_slice_bound
    slc = self.get_loc(label)
  File "/usr/local/lib/python3.7/site-packages/pandas/core/indexes/datetimes.py", line 1039, in get_loc
    return Index.get_loc(self, key, method, tolerance)
  File "/usr/local/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2899, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 410, in pandas._libs.index.DatetimeEngine.get_loc
  File "pandas/_libs/index.pyx", line 440, in pandas._libs.index.DatetimeEngine.get_loc
KeyError: Timestamp('2021-02-17 16:07:52.998581+0100', tz='tzoffset(None, 3600)')

BTW: One workaround is to convert each column to a Series, window each Series, and combine them back into a DataFrame. So far I have never had problems slicing Series.

enricorotundo · 2021-04-14T10:26:42Z

Ran into this error with DatetimeIndex and .loc[window_start:window_end]. Solved by .sort_index(). Without @jreback's answer, the workaround would have been quite obscure.
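
Where re-sorting is undesirable, one order-independent alternative is a boolean mask over the index. This is not something proposed in the thread, only a sketch under the assumption that the window bounds carry the same timezone as the index. The data and names below are illustrative.

```python
import pandas as pd

# A small, deliberately unsorted, timezone-aware index.
idx = pd.to_datetime(
    [
        "2021-02-17 16:07:54",
        "2021-02-17 16:07:53",
        "2021-02-17 16:07:56",
        "2021-02-17 16:07:55",
    ]
).tz_localize("UTC")
df = pd.DataFrame({"a": [1, 2, 3, 4]}, index=idx)

window_start = pd.Timestamp("2021-02-17 16:07:53.5", tz="UTC")
window_end = pd.Timestamp("2021-02-17 16:07:55.5", tz="UTC")

# Elementwise comparison does not require a sorted index and keeps row order.
mask = (df.index >= window_start) & (df.index <= window_end)
windowed = df[mask]
print(windowed["a"].tolist())  # [1, 4]
```

The mask scans every row, so for repeated windowing on large frames sorting once and then using label slices is likely the cheaper route.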

jreback added the Datetime, Indexing, and Usage Question labels May 5, 2016

jreback added the Difficulty Novice and Error Reporting labels and removed the Usage Question label May 5, 2016

jreback added this to the 0.18.2 milestone May 5, 2016

shoyer mentioned this issue May 5, 2016

BUG: loc raises inconsistent error on unsorted MultiIndex #12790

Closed

4 tasks

jorisvandenbossche modified the milestones: 0.20.0, 0.19.0 Aug 21, 2016

jreback modified the milestones: 0.20.0, Next Major Release Mar 23, 2017

TomAugspurger added the good first issue label Oct 11, 2017

jreback removed the Difficulty Novice label Dec 15, 2017

jbrockmendel removed the Effort Low label Oct 21, 2019

simonjayhawkins mentioned this issue Jul 2, 2020

BUG: Slicing on non-monotonic DatetimeIndex inconsistencies #34820

Closed

mroeschke added Enhancement and removed good first issue labels Apr 24, 2021

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
