Commit d6774a7

API: consistency with .ix and .loc for getitem operations (GH8613)
raise TypeError rather than KeyError on invalid scalar/slice indexing with that index type
1 parent 85703a7 commit d6774a7

11 files changed, +316 -97 lines changed

doc/source/indexing.rst

+30 -3

@@ -85,7 +85,7 @@ of multi-axis indexing.
 
 - ``.iloc`` is primarily integer position based (from ``0`` to
   ``length-1`` of the axis), but may also be used with a boolean
-  array. ``.iloc`` will raise ``IndexError`` if a requested
+  array. ``.iloc`` will raise ``IndexError`` if a requested
   indexer is out-of-bounds, except *slice* indexers which allow
   out-of-bounds indexing. (this conforms with python/numpy *slice*
   semantics). Allowed inputs are:
@@ -292,6 +292,35 @@ Selection By Label
    This is sometimes called ``chained assignment`` and should be avoided.
    See :ref:`Returning a View versus Copy <indexing.view_versus_copy>`
 
+.. warning::
+
+   ``.loc`` is strict when you present slicers that are not compatible (or convertible) with the index type. For example
+   using integers in a ``DatetimeIndex`` or float indexers in an ``Int64Index``. These will raise a ``TypeError``.
+
+   .. ipython:: python
+
+      dfl = DataFrame(np.random.randn(5,4), columns=list('ABCD'), index=date_range('20130101',periods=5))
+      dfl
+      sl = Series(range(5),[-2,-1,1,2,3])
+      sl
+
+   .. code-block:: python
+
+      In [4]: dfl.loc[2:3]
+      TypeError: cannot do slice indexing on <class 'pandas.tseries.index.DatetimeIndex'> with these indexers [2] of <type 'int'>
+
+   .. code-block:: python
+
+      In [8]: sl.loc[-1.0:2]
+      TypeError: cannot do slice indexing on <class 'pandas.core.index.Int64Index'> with these indexers [-1.0] of <type 'float'>
+
+
+   String likes in slicing *can* be convertible to the type of the index and lead to natural slicing.
+
+   .. ipython:: python
+
+      dfl.loc['20130102':'20130104']
+
 pandas provides a suite of methods in order to have **purely label based indexing**. This is a strict inclusion based protocol.
 **at least 1** of the labels for which you ask, must be in the index or a ``KeyError`` will be raised! When slicing, the start bound is *included*, **AND** the stop bound is *included*. Integers are valid labels, but they refer to the label **and not the position**.
 
@@ -1486,5 +1515,3 @@ This will **not** work at all, and so should be avoided
 The chained assignment warnings / exceptions are aiming to inform the user of a possibly invalid
 assignment. There may be false positives; situations where a chained assignment is inadvertantly
 reported.
-
-
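The new warning says that `.loc` converts string-like slice bounds on a `DatetimeIndex` but rejects integer bounds with a `TypeError`, while both label endpoints stay inclusive. A minimal pure-Python sketch of that rule follows, assuming a sorted list of `datetime.date` labels stands in for a daily `DatetimeIndex`. `coerce_date_bound` and `date_label_slice` are hypothetical helpers, not pandas API, and unlike pandas they only parse full-day strings (pandas also considers the string's resolution, per the comment in `get_slice_bound`).

```python
from bisect import bisect_left, bisect_right
from datetime import date, datetime, timedelta


def coerce_date_bound(bound):
    """Convert a slice bound for a date-labelled axis (hypothetical helper).

    String-likes are parsed into dates; ints, floats and other types raise
    TypeError, mirroring the strictness described in the .loc warning.
    """
    if bound is None:
        return None
    if isinstance(bound, datetime):
        # datetime is a subclass of date but does not compare with it
        return bound.date()
    if isinstance(bound, date):
        return bound
    if isinstance(bound, str):
        for fmt in ("%Y%m%d", "%Y-%m-%d"):
            try:
                return datetime.strptime(bound, fmt).date()
            except ValueError:
                continue
        raise ValueError("cannot parse %r as a date" % (bound,))
    raise TypeError("cannot do slice indexing on a date index with these "
                    "indexers [{0}] of {1}".format(bound, type(bound)))


def date_label_slice(labels, start=None, stop=None):
    """Label-based slice of sorted dates; both endpoints are included."""
    start, stop = coerce_date_bound(start), coerce_date_bound(stop)
    lo = 0 if start is None else bisect_left(labels, start)
    hi = len(labels) if stop is None else bisect_right(labels, stop)
    return labels[lo:hi]


# five daily labels, like date_range('20130101', periods=5)
labels = [date(2013, 1, 1) + timedelta(days=i) for i in range(5)]
print(date_label_slice(labels, "20130102", "20130104"))  # 2nd to 4th of January
try:
    date_label_slice(labels, 2, 3)
except TypeError as exc:
    print(exc)
```

Binary search is used so that a string bound which is not itself a label still finds its position in the sorted labels.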

doc/source/whatsnew/v0.16.0.txt

+60

@@ -211,6 +211,66 @@ Backwards incompatible API changes
    p // 0
 
 
+Indexing Changes
+~~~~~~~~~~~~~~~~
+
+.. _whatsnew_0160.api_breaking.indexing:
+
+The behavior of a small sub-set of edge cases for using ``.loc`` have changed (:issue:`8613`). Furthermore we have improved the content of the error messages that are raised:
+
+- slicing with ``.loc`` where the start and/or stop bound is not found in the index is now allowed; this previously would raise a ``KeyError``. This makes the behavior the same as ``.ix`` in this case. This change is only for slicing, not when indexing with a single label.
+
+  .. ipython:: python
+
+     df = DataFrame(np.random.randn(5,4), columns=list('ABCD'), index=date_range('20130101',periods=5))
+     df
+     s = Series(range(5),[-2,-1,1,2,3])
+     s
+
+  Previous Behavior
+
+  .. code-block:: python
+
+     In [4]: df.loc['2013-01-02':'2013-01-10']
+     KeyError: 'stop bound [2013-01-10] is not in the [index]'
+
+     In [6]: s.loc[-10:3]
+     KeyError: 'start bound [-10] is not the [index]'
+
+     In [8]: s.loc[-1.0:2]
+     Out[2]:
+     -1    1
+      1    2
+      2    3
+     dtype: int64
+
+  New Behavior
+
+  .. ipython:: python
+
+     df.loc['2013-01-02':'2013-01-10']
+     s.loc[-10:3]
+
+  .. code-block:: python
+
+     In [8]: s.loc[-1.0:2]
+     TypeError: cannot do slice indexing on <class 'pandas.core.index.Int64Index'> with these indexers [-1.0] of <type 'float'>
+
+- provide a useful exception for indexing with an invalid type for that index when using ``.loc``. For example trying to use ``.loc`` on an index of type ``DatetimeIndex`` or ``PeriodIndex`` or ``TimedeltaIndex``, with an integer (or a float).
+
+  Previous Behavior
+
+  .. code-block:: python
+
+     In [4]: df.loc[2:3]
+     KeyError: 'start bound [2] is not the [index]'
+
+  New Behavior
+
+  .. code-block:: python
+
+     In [4]: df.loc[2:3]
+     TypeError: Cannot do slice indexing on <class 'pandas.tseries.index.DatetimeIndex'> with <type 'int'> keys
 
 Deprecations
 ~~~~~~~~~~~~
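The first bullet in this whatsnew entry changes what happens when a `.loc` slice bound is absent from the index. The two rules can be contrasted with a minimal pure-Python sketch, assuming a sorted, unique list of integer labels (a monotonically increasing index). `old_loc_slice` models the membership check removed from `_has_valid_type` in `pandas/core/indexing.py`; `new_loc_slice` locates a missing bound by binary search, in the spirit of the `get_loc` / `_searchsorted_monotonic` fallback visible in `get_slice_bound`. Both names are hypothetical and neither is pandas code.

```python
from bisect import bisect_left, bisect_right


def new_loc_slice(labels, values, start=None, stop=None):
    """Post-change rule (sketch): missing bounds are located by binary search.

    Assumes `labels` is sorted and unique; both endpoints are inclusive.
    """
    lo = 0 if start is None else bisect_left(labels, start)
    hi = len(labels) if stop is None else bisect_right(labels, stop)
    return list(zip(labels[lo:hi], values[lo:hi]))


def old_loc_slice(labels, values, start=None, stop=None):
    """Pre-change rule (sketch): every given bound must be an existing label."""
    if start is not None and start not in labels:
        raise KeyError("start bound [%s] is not in the [index]" % (start,))
    if stop is not None and stop not in labels:
        raise KeyError("stop bound [%s] is not in the [index]" % (stop,))
    return new_loc_slice(labels, values, start, stop)


# mirrors s = Series(range(5), [-2, -1, 1, 2, 3]) from the whatsnew example
labels = [-2, -1, 1, 2, 3]
values = list(range(5))

print(new_loc_slice(labels, values, -10, 3))  # every row: -10 lies below the first label
try:
    old_loc_slice(labels, values, -10, 3)
except KeyError as exc:
    print("old behaviour:", exc)
```

Only slicing is affected; looking up a single missing label with `.loc` still raises a `KeyError`, as the bullet notes.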

pandas/core/index.py

+54 -9

@@ -640,7 +640,7 @@ def _convert_scalar_indexer(self, key, typ=None):
         def to_int():
             ikey = int(key)
             if ikey != key:
-                return self._convert_indexer_error(key, 'label')
+                return self._invalid_indexer('label', key)
             return ikey
 
         if typ == 'iloc':
@@ -651,7 +651,7 @@ def to_int():
                 warnings.warn("scalar indexers for index type {0} should be integers and not floating point".format(
                     type(self).__name__),FutureWarning)
                 return key
-            return self._convert_indexer_error(key, 'label')
+            return self._invalid_indexer('label', key)
 
         if is_float(key):
             if not self.is_floating():
@@ -667,7 +667,7 @@ def _validate_slicer(self, key, f):
 
         for c in ['start','stop','step']:
             if not f(getattr(key,c)):
-                self._convert_indexer_error(key.start, 'slice {0} value'.format(c))
+                self._invalid_indexer('slice {0} value'.format(c), key.start)
 
     def _convert_slice_indexer_getitem(self, key, is_index_slice=False):
         """ called from the getitem slicers, determine how to treat the key
@@ -698,7 +698,7 @@ def f(c):
                               "and not floating point",FutureWarning)
                 return int(v)
 
-            self._convert_indexer_error(v, 'slice {0} value'.format(c))
+            self._invalid_indexer('slice {0} value'.format(c), v)
 
         return slice(*[ f(c) for c in ['start','stop','step']])
 
@@ -787,11 +787,13 @@ def _convert_list_indexer_for_mixed(self, keyarr, typ=None):
 
         return None
 
-    def _convert_indexer_error(self, key, msg=None):
-        if msg is None:
-            msg = 'label'
-        raise TypeError("the {0} [{1}] is not a proper indexer for this index "
-                        "type ({2})".format(msg, key, self.__class__.__name__))
+    def _invalid_indexer(self, form, key):
+        """ consistent invalid indexer message """
+        raise TypeError("cannot do {form} indexing on {klass} with these "
+                        "indexers [{key}] of {typ}".format(form=form,
+                                                           klass=type(self),
+                                                           key=key,
+                                                           typ=type(key)))
 
     def get_duplicates(self):
         from collections import defaultdict
@@ -2119,11 +2121,27 @@ def _maybe_cast_slice_bound(self, label, side):
         label : object
         side : {'left', 'right'}
 
+        Returns
+        -------
+        label : object
+
         Notes
         -----
         Value of `side` parameter should be validated in caller.
 
         """
+
+        # pass thru float indexers if we have a numeric type index
+        # which then can decide to process / or convert and warng
+        if is_float(label):
+            if not self.is_floating():
+                self._invalid_indexer('slice',label)
+
+        # we are not an integer based index, and we have an integer label
+        # treat as positional based slicing semantics
+        if not self.is_integer() and is_integer(label):
+            self._invalid_indexer('slice',label)
+
         return label
 
     def _searchsorted_monotonic(self, label, side='left'):
@@ -2158,10 +2176,12 @@ def get_slice_bound(self, label, side):
                              " must be either 'left' or 'right': %s" % (side,))
 
         original_label = label
+
         # For datetime indices label may be a string that has to be converted
         # to datetime boundary according to its resolution.
         label = self._maybe_cast_slice_bound(label, side)
 
+        # we need to look up the label
         try:
             slc = self.get_loc(label)
         except KeyError as err:
@@ -2654,6 +2674,31 @@ def astype(self, dtype):
                             self.__class__)
         return Index(self.values, name=self.name, dtype=dtype)
 
+    def _maybe_cast_slice_bound(self, label, side):
+        """
+        This function should be overloaded in subclasses that allow non-trivial
+        casting on label-slice bounds, e.g. datetime-like indices allowing
+        strings containing formatted datetimes.
+
+        Parameters
+        ----------
+        label : object
+        side : {'left', 'right'}
+
+        Returns
+        -------
+        label : object
+
+        Notes
+        -----
+        Value of `side` parameter should be validated in caller.
+
+        """
+        if not (is_integer(label) or is_float(label)):
+            self._invalid_indexer('slice',label)
+
+        return label
+
     def _convert_scalar_indexer(self, key, typ=None):
         if typ == 'iloc':
             return super(Float64Index, self)._convert_scalar_indexer(key,
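The two `_maybe_cast_slice_bound` methods in this file apply different rules: the base `Index` version rejects a float bound unless the index is floating and an integer bound unless the index is integer, while the `Float64Index` override accepts any int or float. A minimal pure-Python sketch of those rules, together with the new `_invalid_indexer` message template, follows. `ToyIndex`, `ToyFloatIndex`, `is_integer` and `is_float` are hypothetical stand-ins for the pandas classes and type checks, and the index "kind" is passed in explicitly rather than inferred from the data.

```python
import numbers


def is_integer(obj):
    # stand-in for pandas' integer check; bools are not treated as integers
    return isinstance(obj, numbers.Integral) and not isinstance(obj, bool)


def is_float(obj):
    # stand-in for pandas' float check
    return isinstance(obj, float)


class ToyIndex(object):
    """Hypothetical base index; `kind` is 'integer', 'floating' or 'object'."""

    def __init__(self, values, kind):
        self.values = list(values)
        self.kind = kind

    def is_integer(self):
        return self.kind == "integer"

    def is_floating(self):
        return self.kind == "floating"

    def _invalid_indexer(self, form, key):
        # same message template as the new Index._invalid_indexer
        raise TypeError("cannot do {form} indexing on {klass} with these "
                        "indexers [{key}] of {typ}".format(
                            form=form, klass=type(self), key=key, typ=type(key)))

    def _maybe_cast_slice_bound(self, label, side):
        # base-class rules: floats need a floating index,
        # integers need an integer index
        if is_float(label) and not self.is_floating():
            self._invalid_indexer("slice", label)
        if not self.is_integer() and is_integer(label):
            self._invalid_indexer("slice", label)
        return label


class ToyFloatIndex(ToyIndex):
    """Hypothetical Float64Index stand-in: any int or float bound is allowed."""

    def __init__(self, values):
        super(ToyFloatIndex, self).__init__(values, "floating")

    def _maybe_cast_slice_bound(self, label, side):
        if not (is_integer(label) or is_float(label)):
            self._invalid_indexer("slice", label)
        return label


ints = ToyIndex([-2, -1, 1, 2, 3], "integer")
print(ints._maybe_cast_slice_bound(2, "left"))
try:
    ints._maybe_cast_slice_bound(-1.0, "left")
except TypeError as exc:
    print(exc)
```

Raising from one shared helper keeps the wording identical across scalar and slice failures, which is the point of replacing `_convert_indexer_error`.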

pandas/core/indexing.py

+1 -19

@@ -1243,25 +1243,7 @@ def _has_valid_type(self, key, axis):
         # boolean
 
         if isinstance(key, slice):
-
-            if ax.is_floating():
-
-                # allowing keys to be slicers with no fallback
-                pass
-
-            else:
-                if key.start is not None:
-                    if key.start not in ax:
-                        raise KeyError(
-                            "start bound [%s] is not the [%s]" %
-                            (key.start, self.obj._get_axis_name(axis))
-                        )
-                if key.stop is not None:
-                    if key.stop not in ax:
-                        raise KeyError(
-                            "stop bound [%s] is not in the [%s]" %
-                            (key.stop, self.obj._get_axis_name(axis))
-                        )
+            return True
 
         elif is_bool_indexer(key):
             return True

pandas/core/series.py

+1 -1

@@ -536,7 +536,7 @@ def __getitem__(self, key):
             else:
 
                 # we can try to coerce the indexer (or this will raise)
-                new_key = self.index._convert_scalar_indexer(key)
+                new_key = self.index._convert_scalar_indexer(key,typ='getitem')
                 if type(new_key) != type(key):
                     return self.__getitem__(new_key)
                 raise

pandas/tests/test_index.py

+18 -4

@@ -950,16 +950,30 @@ def test_slice_locs(self):
         self.assertEqual(idx.slice_locs(start=3), (3, n))
         self.assertEqual(idx.slice_locs(3, 8), (3, 6))
         self.assertEqual(idx.slice_locs(5, 10), (3, n))
-        self.assertEqual(idx.slice_locs(5.0, 10.0), (3, n))
-        self.assertEqual(idx.slice_locs(4.5, 10.5), (3, 8))
         self.assertEqual(idx.slice_locs(end=8), (0, 6))
         self.assertEqual(idx.slice_locs(end=9), (0, 7))
 
+        # reversed
         idx2 = idx[::-1]
         self.assertEqual(idx2.slice_locs(8, 2), (2, 6))
-        self.assertEqual(idx2.slice_locs(8.5, 1.5), (2, 6))
         self.assertEqual(idx2.slice_locs(7, 3), (2, 5))
-        self.assertEqual(idx2.slice_locs(10.5, -1), (0, n))
+
+        # float slicing
+        idx = Index(np.array([0, 1, 2, 5, 6, 7, 9, 10], dtype=float))
+        n = len(idx)
+        self.assertEqual(idx.slice_locs(5.0, 10.0), (3, n))
+        self.assertEqual(idx.slice_locs(4.5, 10.5), (3, 8))
+        idx2 = idx[::-1]
+        self.assertEqual(idx2.slice_locs(8.5, 1.5), (2, 6))
+        self.assertEqual(idx2.slice_locs(10.5, -1), (0, n))
+
+        # int slicing with floats
+        idx = Index(np.array([0, 1, 2, 5, 6, 7, 9, 10], dtype=int))
+        self.assertRaises(TypeError, lambda : idx.slice_locs(5.0, 10.0))
+        self.assertRaises(TypeError, lambda : idx.slice_locs(4.5, 10.5))
+        idx2 = idx[::-1]
+        self.assertRaises(TypeError, lambda : idx2.slice_locs(8.5, 1.5))
+        self.assertRaises(TypeError, lambda : idx2.slice_locs(10.5, -1))
 
     def test_slice_locs_dup(self):
         idx = Index(['a', 'a', 'b', 'c', 'd', 'd'])
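The updated `test_slice_locs` pins down positions on a monotonic index in both ascending and descending order, plus the new `TypeError` for float bounds on an integer index. A minimal pure-Python sketch that reproduces those expected values is below. `toy_slice_locs` is a hypothetical helper that binary-searches a sorted list (searching the negated labels when the order is descending), and its `integer_index` flag stands in for the index dtype; it is not how pandas implements `slice_locs`.

```python
from bisect import bisect_left, bisect_right


def toy_slice_locs(labels, start=None, end=None, integer_index=False):
    """Return (lo, hi) such that labels[lo:hi] covers start..end inclusive.

    Hypothetical model: `labels` must be unique and monotonic (ascending or
    descending); float bounds are rejected when integer_index is True.
    """
    for bound in (start, end):
        if integer_index and isinstance(bound, float):
            raise TypeError("cannot do slice indexing on an integer index "
                            "with these indexers [%r] of %s" % (bound, type(bound)))

    if labels == sorted(labels):
        keys, lo_key, hi_key = labels, start, end
    elif labels == sorted(labels, reverse=True):
        # negate so a descending index can use the same ascending search
        keys = [-x for x in labels]
        lo_key = None if start is None else -start
        hi_key = None if end is None else -end
    else:
        raise ValueError("labels must be monotonic")

    lo = 0 if lo_key is None else bisect_left(keys, lo_key)
    hi = len(keys) if hi_key is None else bisect_right(keys, hi_key)
    return lo, hi


labels = [0, 1, 2, 5, 6, 7, 9, 10]
print(toy_slice_locs([float(x) for x in labels], 4.5, 10.5))
print(toy_slice_locs(labels[::-1], 8, 2, integer_index=True))
```

Bounds that fall between labels (4.5, 8.5) resolve to the nearest enclosing positions, which is why the float-index cases in the test can use non-label values.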
