Slicing non-unique index fails when slice endpoint is not in index #7523

Closed
dbew opened this issue Jun 20, 2014 · 5 comments · Fixed by #7525
Labels
Indexing: Related to indexing on series/frames, not to indexes themselves
Regression: Functionality that used to work in a prior pandas version
Comments

@dbew
Contributor

dbew commented Jun 20, 2014

This looks like a regression from 0.13.1. If you have a simple dataframe with a non-unique but sorted index like:

from datetime import datetime
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(5.), index=[datetime(2001, 1, i, 10, 0)
                                        for i in [1,2,2,3,4]])
df
Out[23]: 
                     0
2001-01-01 10:00:00  0
2001-01-02 10:00:00  1
2001-01-02 10:00:00  2
2001-01-03 10:00:00  3
2001-01-04 10:00:00  4

[5 rows x 1 columns]

Then in pandas 0.13.1 you could slice a range out of that dataframe with a date not in the index like so:

# pandas 0.13.1
df.ix[datetime(2001, 1, 2, 11, 0):]
Out[22]: 
                     0
2001-01-03 10:00:00  3
2001-01-04 10:00:00  4

[2 rows x 1 columns]
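
For readers unfamiliar with the expected semantics: on a sorted index, a slice endpoint that is not itself a label can still be resolved to an insertion position. The following is a minimal standard-library sketch of that idea, not pandas' actual implementation; `slice_from` is a hypothetical helper introduced only for illustration.

```python
from bisect import bisect_left
from datetime import datetime

# Sorted index with one duplicated label, mirroring the frame above.
index = [datetime(2001, 1, day, 10, 0) for day in [1, 2, 2, 3, 4]]
values = [0.0, 1.0, 2.0, 3.0, 4.0]


def slice_from(index, values, start):
    """Hypothetical helper: return (label, value) rows with label >= start."""
    # bisect_left finds the first position whose label is >= start,
    # whether or not start itself appears in the index.
    pos = bisect_left(index, start)
    return list(zip(index[pos:], values[pos:]))


rows = slice_from(index, values, datetime(2001, 1, 2, 11, 0))
print([value for _, value in rows])
```

With the 11:00 endpoint falling between the duplicated 2001-01-02 entries and 2001-01-03, only the last two rows remain, which matches the 0.13.1 output shown above.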

On the current HEAD of pandas/master, you can't. Instead you get a TypeError with a long traceback:

# pandas 0.14.1
In [58]: df.ix[datetime(2001, 1, 2, 11, 0):]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-58-c4fe729e46fe> in <module>()
----> 1 df.ix[datetime(2001, 1, 2, 11, 0):]

/users/is/dbew/src/pandas/pandas/core/indexing.py in __getitem__(self, key)
     69             return self._getitem_tuple(key)
     70         else:
---> 71             return self._getitem_axis(key, axis=0)
     72 
     73     def _get_label(self, label, axis=0):

/users/is/dbew/src/pandas/pandas/core/indexing.py in _getitem_axis(self, key, axis, validate_iterable)
    843         labels = self.obj._get_axis(axis)
    844         if isinstance(key, slice):
--> 845             return self._get_slice_axis(key, axis=axis)
    846         elif _is_list_like(key) and not (isinstance(key, tuple) and
    847                                          isinstance(labels, MultiIndex)):

/users/is/dbew/src/pandas/pandas/core/indexing.py in _get_slice_axis(self, slice_obj, axis)
   1090         if not _need_slice(slice_obj):
   1091             return obj
-> 1092         indexer = self._convert_slice_indexer(slice_obj, axis)
   1093 
   1094         if isinstance(indexer, slice):

/users/is/dbew/src/pandas/pandas/core/indexing.py in _convert_slice_indexer(self, key, axis)
    161         # if we are accessing via lowered dim, use the last dim
    162         ax = self.obj._get_axis(min(axis, self.ndim - 1))
--> 163         return ax._convert_slice_indexer(key, typ=self.name)
    164 
    165     def _has_valid_setitem_indexer(self, indexer):

/users/is/dbew/src/pandas/pandas/core/index.py in _convert_slice_indexer(self, key, typ)
    589         else:
    590             try:
--> 591                 indexer = self.slice_indexer(start, stop, step)
    592             except Exception:
    593                 if is_index_slice:

/users/is/dbew/src/pandas/pandas/tseries/index.py in slice_indexer(self, start, end, step)
   1350             raise TypeError('Cannot index datetime64 with float keys')
   1351 
-> 1352         return Index.slice_indexer(self, start, end, step)
   1353 
   1354     def slice_locs(self, start=None, end=None):

/users/is/dbew/src/pandas/pandas/core/index.py in slice_indexer(self, start, end, step)
   1712 
   1713         # loc indexers
-> 1714         return Index(start_slice) & Index(end_slice)
   1715 
   1716     def slice_locs(self, start=None, end=None):

/users/is/dbew/src/pandas/pandas/core/index.py in __new__(cls, data, dtype, copy, name, fastpath, tupleize_cols, **kwargs)
    170                         pass  # python2 - MultiIndex fails on mixed types
    171             # other iterable of some kind
--> 172             subarr = com._asarray_tuplesafe(data, dtype=object)
    173 
    174         if dtype is None:

/users/is/dbew/src/pandas/pandas/core/common.py in _asarray_tuplesafe(values, dtype)
   2106     if not (isinstance(values, (list, tuple))
   2107             or hasattr(values, '__array__')):
-> 2108         values = list(values)
   2109     elif isinstance(values, Index):
   2110         return values.values

TypeError: 'datetime.datetime' object is not iterable
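
The final frame's message is what Python raises when `list()` is handed a single `datetime` instead of a sequence. The snippet below reproduces only that last step in isolation. It assumes nothing about how the scalar reached `_asarray_tuplesafe`, which the traceback alone does not show.

```python
from datetime import datetime

# Calling list() on a scalar datetime fails because datetime objects
# do not implement the iterator protocol.
try:
    list(datetime(2001, 1, 2, 11, 0))
except TypeError as exc:
    message = str(exc)
    print(message)
```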
@dbew
Contributor Author

dbew commented Jun 20, 2014

Looks like this is specific to DatetimeIndex - I can't see the same behaviour with a simple int index.
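
A side-by-side check of the two index types might look like the sketch below. It uses `.loc` rather than `.ix`, since `.ix` was removed in later pandas releases, and it assumes a pandas version that already contains the fix. The integer labels are made up for the comparison.

```python
from datetime import datetime

import numpy as np
import pandas as pd

# Sorted, non-unique integer index; 25 is not a label in the index.
int_df = pd.DataFrame(np.arange(5.0), index=[10, 20, 20, 30, 40])
int_tail = int_df.loc[25:]

# The same shape of data with a DatetimeIndex, as in the original report.
dt_df = pd.DataFrame(
    np.arange(5.0),
    index=[datetime(2001, 1, day, 10, 0) for day in [1, 2, 2, 3, 4]],
)
dt_tail = dt_df.loc[datetime(2001, 1, 2, 11, 0):]

print(int_tail[0].tolist())
print(dt_tail[0].tolist())
```

On a fixed version both slices should keep only the last two rows. On the affected build described here, the second slice is the one that raised the TypeError.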

@jreback
Contributor

jreback commented Jun 20, 2014

This was an untested case, and the refactoring of the BlockManager caused a small issue. Fixed in #7525.

cc @immerrr

@immerrr
Contributor

immerrr commented Jun 20, 2014

Hmm, have I refactored out some part of that logic that was engrained somewhere inside BlockManager?

@jreback
Contributor

jreback commented Jun 20, 2014

No, I don't think so (though I thought you changed this part of index.py). In any event, it works now.

@dbew
Contributor Author

dbew commented Jun 20, 2014

Wow, fast response. Thanks for looking into it.
