Slicing on DatetimeIndex throws KeyError: [int] not found #13090

Open
twiecki opened this issue May 5, 2016 · 9 comments
Labels
Datetime Datetime data dtype Enhancement Error Reporting Incorrect or improved errors from pandas Indexing Related to indexing on series/frames, not to indexes themselves

Comments

@twiecki
Contributor

twiecki commented May 5, 2016

Code Sample, a copy-pastable example if possible

Unfortunately, I can't reproduce it with other examples, only with this data, which I load from a CSV file.

txn.loc[pd.Timestamp('2014-06-04 00:00:00'):]

KeyError                                  Traceback (most recent call last)
<ipython-input-35-ce3d2db63a82> in <module>()
----> 1 txn.loc[pd.Timestamp('2014-06-04 00:00:00'):]

/opt/miniconda/lib/python2.7/site-packages/pandas/core/indexing.py in __getitem__(self, key)
   1284             return self._getitem_tuple(key)
   1285         else:
-> 1286             return self._getitem_axis(key, axis=0)
   1287 
   1288     def _getitem_axis(self, key, axis=0):

/opt/miniconda/lib/python2.7/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1398         if isinstance(key, slice):
   1399             self._has_valid_type(key, axis)
-> 1400             return self._get_slice_axis(key, axis=axis)
   1401         elif is_bool_indexer(key):
   1402             return self._getbool_axis(key, axis=axis)

/opt/miniconda/lib/python2.7/site-packages/pandas/core/indexing.py in _get_slice_axis(self, slice_obj, axis)
   1306         labels = obj._get_axis(axis)
   1307         indexer = labels.slice_indexer(slice_obj.start, slice_obj.stop,
-> 1308                                        slice_obj.step, kind=self.name)
   1309 
   1310         if isinstance(indexer, slice):

/opt/miniconda/lib/python2.7/site-packages/pandas/tseries/index.py in slice_indexer(self, start, end, step, kind)
   1503 
   1504         try:
-> 1505             return Index.slice_indexer(self, start, end, step, kind=kind)
   1506         except KeyError:
   1507             # For historical reasons DatetimeIndex by default supports

/opt/miniconda/lib/python2.7/site-packages/pandas/indexes/base.py in slice_indexer(self, start, end, step, kind)
   2698         """
   2699         start_slice, end_slice = self.slice_locs(start, end, step=step,
-> 2700                                                  kind=kind)
   2701 
   2702         # return a slice

/opt/miniconda/lib/python2.7/site-packages/pandas/indexes/base.py in slice_locs(self, start, end, step, kind)
   2877         start_slice = None
   2878         if start is not None:
-> 2879             start_slice = self.get_slice_bound(start, 'left', kind)
   2880         if start_slice is None:
   2881             start_slice = 0

/opt/miniconda/lib/python2.7/site-packages/pandas/indexes/base.py in get_slice_bound(self, label, side, kind)
   2826             except ValueError:
   2827                 # raise the original KeyError
-> 2828                 raise err
   2829 
   2830         if isinstance(slc, np.ndarray):

KeyError: 1401840000000000000

Expected Output

Slice of everything after '2014-06-04 00:00:00'.

output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-77-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8

pandas: 0.18.0
nose: 1.3.7
pip: 8.1.1
setuptools: 20.7.0
Cython: 0.23.4
numpy: 1.11.0
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 3.2.1
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.3
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.8
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext)
jinja2: 2.8
boto: 2.30.0
@jreback
Contributor

jreback commented May 5, 2016

So this requires monotonicity (sortedness). Documentation was added here to describe this. Though I think we could have a better error message in this case (still a KeyError, but saying it's not monotonic). I don't know why you would get that odd key error. Do you have a repro?

In [7]: dti = pd.date_range('20130101 09:00:00', periods=50, freq='2H')

In [8]: df = pd.DataFrame({'values' : range(len(dti))},index=dti)

In [9]: df
Out[9]: 
                     values
2013-01-01 09:00:00       0
2013-01-01 11:00:00       1
2013-01-01 13:00:00       2
2013-01-01 15:00:00       3
2013-01-01 17:00:00       4
...                     ...
2013-01-05 03:00:00      45
2013-01-05 05:00:00      46
2013-01-05 07:00:00      47
2013-01-05 09:00:00      48
2013-01-05 11:00:00      49

[50 rows x 1 columns]

In [10]: df.loc[pd.Timestamp('20130105 04:00:00'):]
Out[10]: 
                     values
2013-01-05 05:00:00      46
2013-01-05 07:00:00      47
2013-01-05 09:00:00      48
2013-01-05 11:00:00      49

In [11]: df.sample(25)
Out[11]: 
                     values
2013-01-02 07:00:00      11
2013-01-01 13:00:00       2
2013-01-02 05:00:00      10
2013-01-04 03:00:00      33
2013-01-03 07:00:00      23
...                     ...
2013-01-01 15:00:00       3
2013-01-01 23:00:00       7
2013-01-05 11:00:00      49
2013-01-02 13:00:00      14
2013-01-03 03:00:00      21

[25 rows x 1 columns]

In [12]: df.sample(25).loc[pd.Timestamp('20130105 04:00:00'):]
KeyError: Timestamp('2013-01-05 04:00:00')
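
A minimal, self-contained sketch of the behaviour shown in the session above, assuming a reasonably recent pandas. The deterministic reordering (odd positions first, then even ones) is only an illustrative stand-in for df.sample(25), and the names shuffled, raised and fixed are hypothetical. It checks the index with is_monotonic_increasing, shows that label slicing with a Timestamp that is not in the unsorted index raises KeyError, and that sorting first makes the same slice work:

```python
import pandas as pd

# Same data as the session above: 50 timestamps, two hours apart.
dti = pd.date_range("2013-01-01 09:00:00", periods=50, freq=pd.Timedelta(hours=2))
df = pd.DataFrame({"values": range(len(dti))}, index=dti)

# Deterministic non-monotonic ordering: odd positions first, then even ones.
order = list(range(1, len(df), 2)) + list(range(0, len(df), 2))
shuffled = df.iloc[order]

# This label is not present in the index (the index only has odd hours).
start = pd.Timestamp("2013-01-05 04:00:00")

# Diagnose: an unsorted index reports False here.
print(shuffled.index.is_monotonic_increasing)

# Slicing from a missing label on the unsorted index raises KeyError.
try:
    shuffled.loc[start:]
    raised = False
except KeyError:
    raised = True
print(raised)

# Fix: sort the index, then slice.
fixed = shuffled.sort_index().loc[start:]
print(fixed)
```

Checking is_monotonic_increasing before slicing is a cheap way to tell this situation apart from a genuinely missing key.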

@jreback jreback added Datetime Datetime data dtype Indexing Related to indexing on series/frames, not to indexes themselves Usage Question labels May 5, 2016
@jreback
Contributor

jreback commented May 5, 2016

@twiecki
Contributor Author

twiecki commented May 5, 2016

That works, thank you! Feel free to close.

@jreback jreback added Difficulty Novice Error Reporting Incorrect or improved errors from pandas and removed Usage Question labels May 5, 2016
@jreback jreback added this to the 0.18.2 milestone May 5, 2016
@shoyer
Member

shoyer commented May 5, 2016

We should really be returning an UnsortedIndexError or something like that here instead of the KeyError. Didn't we discuss adding something like that for MultiIndex?

Also, it's unfortunate that we're returning an integer as the error message instead of the original time stamp.
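
For readers puzzled by the integer in the original traceback: it is the timestamp expressed as nanoseconds since the Unix epoch. A small sketch of decoding it, where raw_key and ts are hypothetical names chosen for illustration:

```python
import pandas as pd

# The integer reported in the KeyError of the original post.
raw_key = 1401840000000000000

# pandas interprets a bare integer as nanoseconds since 1970-01-01.
ts = pd.Timestamp(raw_key)
print(ts)
```

The decoded value is the same '2014-06-04 00:00:00' timestamp that was passed to .loc in the report.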

@jreback
Contributor

jreback commented May 5, 2016

#12790

and new exception in #11897

@sarwatfatimam

sarwatfatimam commented Mar 16, 2017

Hi. I am trying to remove duplicated rows based on the time in a date column.
df_u.drop_duplicates(df_u['Response Time'], keep='last')
However, I am getting this error:

Traceback (most recent call last):
File "C:\Users\Sarwat\Anaconda\Anaconda3\lib\site-packages\pandas\indexes\base.py", line 2134, in get_loc
return self._engine.get_loc(key)
File "pandas\index.pyx", line 132, in pandas.index.IndexEngine.get_loc (pandas\index.c:4433)
File "pandas\index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas\index.c:4279)
File "pandas\src\hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:13742)
File "pandas\src\hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:13696)
KeyError: Timestamp('2016-05-25 19:09:37')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:/Users/Sarwat/Documents/DealSmash/Scripts/DataAnalysis/GCM.py", line 37, in
df_u.drop_duplicates(df_u['Response Time'], keep='last')
File "C:\Users\Sarwat\Anaconda\Anaconda3\lib\site-packages\pandas\util\decorators.py", line 91, in wrapper
return func(*args, **kwargs)
File "C:\Users\Sarwat\Anaconda\Anaconda3\lib\site-packages\pandas\core\frame.py", line 3138, in drop_duplicates
duplicated = self.duplicated(subset, keep=keep)
File "C:\Users\Sarwat\Anaconda\Anaconda3\lib\site-packages\pandas\util\decorators.py", line 91, in wrapper
return func(*args, **kwargs)
File "C:\Users\Sarwat\Anaconda\Anaconda3\lib\site-packages\pandas\core\frame.py", line 3188, in duplicated
labels, shape = map(list, zip(*map(f, vals)))
File "C:\Users\Sarwat\Anaconda\Anaconda3\lib\site-packages\pandas\core\frame.py", line 3187, in
vals = (self[col].values for col in subset)
File "C:\Users\Sarwat\Anaconda\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2059, in __getitem__
return self._getitem_column(key)
File "C:\Users\Sarwat\Anaconda\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2066, in _getitem_column
return self._get_item_cache(key)
File "C:\Users\Sarwat\Anaconda\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1386, in _get_item_cache
values = self._data.get(item)
File "C:\Users\Sarwat\Anaconda\Anaconda3\lib\site-packages\pandas\core\internals.py", line 3543, in get
loc = self.items.get_loc(item)
File "C:\Users\Sarwat\Anaconda\Anaconda3\lib\site-packages\pandas\indexes\base.py", line 2136, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas\index.pyx", line 132, in pandas.index.IndexEngine.get_loc (pandas\index.c:4433)
File "pandas\index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas\index.c:4279)
File "pandas\src\hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:13742)
File "pandas\src\hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:13696)
KeyError: Timestamp('2016-05-25 19:09:37')
"KeyError: Timestamp('2016-05-25 10:30:00')".

@jorisvandenbossche
Member

@sarwatfatimam Please open a separate issue for this (or first ask on gitter or on the mailing list) if you think there is a bug. But I think you are using drop_duplicates incorrectly: you should pass a list of column names rather than the columns themselves, i.e. df_u.drop_duplicates(['Response Time'], keep='last') (so not df_u['Response Time']).
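
A minimal sketch of the suggested call, using made-up data: the 'value' column and the three rows are assumptions for illustration, and only the 'Response Time' column name comes from the comment above. Passing the column itself makes pandas treat each timestamp as a column label, which is why the lookup fails with a KeyError on a Timestamp.

```python
import pandas as pd

# Illustrative frame; the first two rows share the same response time.
df_u = pd.DataFrame(
    {
        "Response Time": pd.to_datetime(
            ["2016-05-25 19:09:37", "2016-05-25 19:09:37", "2016-05-25 10:30:00"]
        ),
        "value": [1, 2, 3],
    }
)

# Pass column *names* via subset; keep the last row of each duplicate group.
deduped = df_u.drop_duplicates(subset=["Response Time"], keep="last")
print(deduped)
```

Note that drop_duplicates returns a new frame here; df_u itself is left unchanged.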

@fkromer

fkromer commented Feb 17, 2021

I have the same issue with pandas 0.25.0 when trying to slice a DataFrame df whose index consists of monotonically increasing datetimes. The DataFrame looks like this:

                                  1         2
timestamp
2021-02-17 16:07:53.359581+01:00  5.232185  5.214104
2021-02-17 16:07:53.862581+01:00  5.189049  5.202629
2021-02-17 16:07:54.364581+01:00  5.123482  5.131927
2021-02-17 16:07:54.865581+01:00  5.086906  5.085625
2021-02-17 16:07:55.368581+01:00  5.139999  5.080673
...                                    ...       ...
2021-02-17 16:08:50.611581+01:00  4.879005  4.861519
2021-02-17 16:08:51.115581+01:00  4.807243  4.827316
2021-02-17 16:08:52.119581+01:00  4.809059  4.746559
2021-02-17 16:08:52.621581+01:00  4.822960  4.646828
2021-02-17 16:08:52.998581+01:00  0.000000  4.741882

When slicing by datetime with windowed_dataframe = df[oldest_ts_to_consider:youngest_ts], I get KeyError: Timestamp('2021-02-17 16:07:52.998581+0100', tz='tzoffset(None, 3600)') with this traceback:

  File "/usr/local/lib/python3.7/site-packages/pandas/core/frame.py", line 2976, in __getitem__
    indexer = convert_to_index_sliceable(self, key)
  File "/usr/local/lib/python3.7/site-packages/pandas/core/indexing.py", line 2358, in convert_to_index_sliceable
    return idx._convert_slice_indexer(key, kind="getitem")
  File "/usr/local/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3216, in _convert_slice_indexer
    indexer = self.slice_indexer(start, stop, step, kind=kind)
  File "/usr/local/lib/python3.7/site-packages/pandas/core/indexes/datetimes.py", line 1151, in slice_indexer
    return Index.slice_indexer(self, start, end, step, kind=kind)
  File "/usr/local/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 5034, in slice_indexer
    start_slice, end_slice = self.slice_locs(start, end, step=step, kind=kind)
  File "/usr/local/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 5248, in slice_locs
    start_slice = self.get_slice_bound(start, "left", kind)
  File "/usr/local/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 5170, in get_slice_bound
    raise err
  File "/usr/local/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 5164, in get_slice_bound
    slc = self.get_loc(label)
  File "/usr/local/lib/python3.7/site-packages/pandas/core/indexes/datetimes.py", line 1039, in get_loc
    return Index.get_loc(self, key, method, tolerance)
  File "/usr/local/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2899, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 410, in pandas._libs.index.DatetimeEngine.get_loc
  File "pandas/_libs/index.pyx", line 440, in pandas._libs.index.DatetimeEngine.get_loc
KeyError: Timestamp('2021-02-17 16:07:52.998581+0100', tz='tzoffset(None, 3600)')

BTW: One workaround is to convert each column to a Series, window each Series, and combine them back into a DataFrame. With Series I have never had problems with slicing so far.
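
A rough sketch of how that per-column workaround might look. The data, the fixed +01:00 offset built with datetime.timezone, and the name windowed_columns are assumptions for illustration; on this small sorted frame, slicing the DataFrame directly works as well, so the sketch shows only the shape of the workaround, not a reproduction of the error.

```python
from datetime import timedelta, timezone

import pandas as pd

# Illustrative tz-aware, monotonically increasing index (fixed +01:00 offset).
tz = timezone(timedelta(hours=1))
index = pd.date_range(
    "2021-02-17 16:07:53", periods=6, freq=pd.Timedelta(milliseconds=500), tz=tz
)
df = pd.DataFrame(
    {1: [5.2, 5.1, 5.0, 4.9, 4.8, 4.7], 2: [5.3, 5.2, 5.1, 5.0, 4.9, 4.8]},
    index=index,
)
df.index.name = "timestamp"

oldest_ts_to_consider = index[1]
youngest_ts = index[4]

# Window each column as a Series, then glue the Series back together.
windowed_columns = {
    col: df[col].loc[oldest_ts_to_consider:youngest_ts] for col in df.columns
}
windowed_dataframe = pd.concat(windowed_columns, axis=1)
print(windowed_dataframe)
```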

@enricorotundo
Contributor

Ran into this error with DatetimeIndex and .loc[window_start:window_end]. Solved by .sort_index(). Without @jreback's answer, it would have been quite obscure to work around.
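
One way to package that fix is a small guard that sorts only when needed. This is a sketch, not part of pandas: loc_window is a hypothetical helper name, and the sample frame is made up.

```python
import pandas as pd


def loc_window(frame, window_start, window_end):
    """Hypothetical helper: label-slice by time, sorting the index first if needed."""
    if not frame.index.is_monotonic_increasing:
        frame = frame.sort_index()
    return frame.loc[window_start:window_end]


# Illustrative frame with an unsorted DatetimeIndex.
index = pd.to_datetime(
    [
        "2021-02-17 16:07:55",
        "2021-02-17 16:07:53",
        "2021-02-17 16:07:54",
        "2021-02-17 16:07:56",
    ]
)
frame = pd.DataFrame({"value": [3, 1, 2, 4]}, index=index)

# Neither bound is an exact label in the index.
window = loc_window(
    frame,
    pd.Timestamp("2021-02-17 16:07:53.5"),
    pd.Timestamp("2021-02-17 16:07:55.5"),
)
print(window)
```

Sorting returns a new frame, so the caller's original ordering is untouched; for large frames it may be cheaper to sort once up front rather than on every call.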

@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022