API: consistency with .ix and .loc for getitem operations (GH8613) #9566

jreback · 2015-02-28T21:01:15Z

In [1]:      df = DataFrame(np.random.randn(5,4), columns=list('ABCD'), index=date_range('20130101',periods=5))

In [2]:      df
Out[2]: 
                   A         B         C         D
2013-01-01 -1.380065  1.221596  0.279703 -0.119258
2013-01-02  1.684202 -0.202251 -0.961145 -1.595015
2013-01-03 -0.623634 -2.377577 -3.024554 -0.298809
2013-01-04 -1.251699  0.456356  1.257002  1.465878
2013-01-05  0.687818 -2.125079  1.454911  1.914207

In [5]:      s = Series(range(5),[-2,-1,1,2,3])

In [6]:      s
Out[6]: 
-2    0
-1    1
 1    2
 2    3
 3    4
dtype: int64

In [7]:      df.loc['2013-01-02':'2013-01-10']
Out[7]: 
                   A         B         C         D
2013-01-02  1.684202 -0.202251 -0.961145 -1.595015
2013-01-03 -0.623634 -2.377577 -3.024554 -0.298809
2013-01-04 -1.251699  0.456356  1.257002  1.465878
2013-01-05  0.687818 -2.125079  1.454911  1.914207

In [9]:      s.loc[-10:3]
Out[9]: 
-2    0
-1    1
 1    2
 2    3
 3    4
dtype: int64

# this used to be a KeyError
In [15]: df.loc[2:3]
TypeError: Cannot index datetime64 with <type 'int'> keys

Slicing with float indexers; now works for ix (loc worked before)

In [3]: s.loc[-1.0:2]
Out[3]: 
-1    1
 1    2
 2    3
dtype: int64

In [4]: s.ix[-1.0:2]
Out[4]: 
-1    1
 1    2
 2    3
dtype: int64

# [] raises
In [5]: s[-1.0:2]
TypeError: cannot do slice start value indexing on <class 'pandas.core.index.Int64Index'> with these indexers [-1.0] of <type 'float'>

jreback · 2015-02-28T22:48:00Z

cc @immerrr
@jorisvandenbossche
@shoyer

I think this should resolve things and make .loc strict, while matching .ix (notably in that .ix tries for positional access when it can, but .loc will refuse to guess).

jorisvandenbossche · 2015-03-01T13:57:02Z

doc/source/whatsnew/v0.16.0.txt

+
+The behavior of a small sub-set of edge cases for indexers have changed (:issue:`8613`):
+
+- to conform the intent of ``.loc`` to allow the same behavior for slice indexing as ``.ix`` when indexers are not found


I would here also explain it in words, something like "slicing where the start and/or stop bound is not found in the index is now allowed (previously gave KeyError), as is the case for ix. The change is only for slicing, not when indecxing with a single label."

shoyer · 2015-03-01T22:35:16Z

pandas/core/indexing.py

@@ -1243,25 +1243,7 @@ def _has_valid_type(self, key, axis):
        # boolean

        if isinstance(key, slice):
-


👍 to deleting lines from core.indexing.

shoyer · 2015-03-01T23:06:05Z

First of all, big thanks for working on this, @jreback!

However, I don't think we need to enforce distinctions between float and int for slice bounds, because there is no ambiguity in the ordering between integers and floats (whereas the ordering with respect to strings, for example, is indeed problematic). Users shouldn't have to check the type of their index (if they know it's numeric) before doing indexing.

In particular:

I think it should be OK to index an Int64Index with floats, like .loc[0.5:2.5] or .loc[-1.0:2].
I think it should be OK to index a Float64Index with integers, like .loc[0:10]. (Of course, these are still to be interpreted as labels, not integer positions)

The binary search logic for determining slice bound locations should already work for both of these cases.

shoyer · 2015-03-01T23:09:57Z

To give a more concrete example from my experience: an index corresponding to a variable like longitude might variously consist of integer (e.g., if it's in multiples of whole degrees) or as floating point numbers (e.g., if it's in multiples of half degrees). It's a bad user experience to need to cast their slice bounds to the type of index. Code like s.loc[min_lon:max_lon] should work regardless of dtypes.

jreback · 2015-03-01T23:10:45Z

@shoyer

I added in these checks, mainly because numpy is deprecating float indexers (IIRC it has a warning now), and we have been warning on it.

Its easy to take out, but I think that indexing with a float on an integer index is really not in the .loc domain. This is moving back to the .ix functionaility (which actually doesn't work in 0.15.2 or atm), though .loc did (which is odd).

though it IS convenient.

currently we don't allow this for [],.ix (but allow for .loc in 0.15.2.

simple example from 0.15.2

In [1]: s = Series(range(5))

In [2]: s[1.5:3]
TypeError: the slice start value [1.5] is not a proper indexer for this index type (Int64Index)

In [3]: s.ix[1.5:3]
TypeError: the slice start value [1.5] is not a proper indexer for this index type (Int64Index)

In [4]: s.loc[1.5:3]
Out[4]: 
2    2
3    3
dtype: int64

These all fail with
TypeError: cannot do slice start value indexing on <class 'pandas.core.index.Int64Index'> with these indexers [1.5] of <type 'float'> in master/0.16.0

jreback · 2015-03-01T23:15:29Z

@jorisvandenbossche what do you think?

shoyer · 2015-03-01T23:18:02Z

I agree that using float indexers is a based idea for position based indexing (.iloc). Position based indexing should always use integers. We should be consistent with numpy for this.

But indexing with .loc uses labels. Here (e.g., with searchsorted), numpy is more flexible -- the important thing is that there is a well defined sort order.

jreback · 2015-03-01T23:22:38Z

this should then also work for .ix/[]?

shoyer · 2015-03-01T23:22:49Z

To elaborate:

There is never a good reason to use floats for position based indexing. Numpy warns and/or raises because using a float is an indication that your code probably has a bug.

In contrast, there certainly are good reasons to use floats for label based indexing, because the labels might include floats themselves (or might not, depending on dtypes).

shoyer · 2015-03-01T23:28:18Z

.ix and [] are an unpredictable mess! We hadn't even written down the rules for exactly how indexing with [] works last time I checked.

If I had my way, we would make some hard choices and deprecate/remove fall-back indexing all together (#9213). In the mean-time, I suspect the best approach is to simply try to break/change as little existing code as possible.

jreback · 2015-03-01T23:56:36Z

ok, here is the change (plus making some testing more robust)

In [3]:  index = tm.makeIntIndex()
In [4]:  s = Series(np.arange(len(index))+10,index+5)
In [5]: s
Out[5]: 
5     10
6     11
7     12
8     13
9     14
10    15
11    16
12    17
13    18
14    19
dtype: int64

In [6]: s.loc[6.0:8.5]
Out[6]: 
6    11
7    12
8    13
dtype: int64

In [7]: s.loc[5.5:8.5]
Out[7]: 
6    11
7    12
8    13
dtype: int64

In [8]: s.loc[5.5:8.0]
Out[8]: 
6    11
7    12
8    13
dtype: int64

I could enable for []/ix, but then would s[5.0:8.0] become a fallback scenario (yes I think)
so what would s[5.5,8.5] do (non-fallback I think), so they become very sensitive to the actual values, hence bad

jorisvandenbossche · 2015-03-02T09:35:06Z

I agree with the logic of allowing float indexers for integer indexes when using loc (so as you now implemented)

Ideally, I think it should also work for ix (to keep ix as much as possible to the logic of try: s.loc[]; except: s.iloc[]. In principle , we say that for an integer index axis, we do never integer location fallback (http://pandas.pydata.org/pandas-docs/stable/indexing.html#different-choices-for-indexing), so following the loc behaviour should not be ambiguous in this case I think.

For [], it is something else. For that, we should first have a better documented overview of it behaviour (I started with that last week, will try to put it up as a PR)

jreback · 2015-03-02T11:45:39Z

ok, I added the .ix support for float indexers on an Int index. It is trivial to make this work for []. I am thinking that this should have the same semantics as .ix (e.g. label based with an int index). It works the same.

jorisvandenbossche · 2015-03-02T12:09:20Z

But for [] slicing is 'integer-location' based (an exception on the other types of slicing), so this should follow the semantics of iloc in this case, disallowing floats.

jorisvandenbossche · 2015-03-02T12:10:32Z

Example:

In [46]: s = Series(range(4), index=[-1,0,1,2])

In [47]: s
Out[47]:
-1    0
 0    1
 1    2
 2    3
dtype: int64

In [48]: s[0:2]    # <--- slicing = integer location based
Out[48]:
-1    0
 0    1
dtype: int64

In [49]: s[0]    # <--- single label = label based
Out[49]: 1

jreback · 2015-03-02T23:06:43Z

ok, is updated, any more comments
@jorisvandenbossche @shoyer

shoyer · 2015-03-03T08:24:46Z

pandas/core/index.py

        Notes
        -----
        Value of `side` parameter should be validated in caller.

        """
+
+        # pass thru float indexers if we have a numeric type index


This logic here seems OK, but a little confusing in context. To me it would make more sense to put it on the subclass _maybe_cast_slice_bound methods, even if that means a little more duplication. Otherwise, things will break, e.g., as soon as we add IntervalIndex.

shoyer · 2015-03-03T08:28:05Z

Generally seems fine by me.

raise TypeError rather than KeyError on invalid scalar/slice indexing with that index type

jreback · 2015-03-04T11:53:22Z

indexing is such a rabbit hole :)

I cleaned up _convert_slice_indexer a bit. I tried to use _maybe_cast_slice_bound directly there (I added the kind argument which is the type of indexing, e.g. ix/loc/iloc/getitem). But so many cases are handled that its really hard to get this right. Because the different kinds are handled in different places

its tricky because you have the kinds of indexers (ix/loc/iloc/getitem) x index_types (Index,Float,Int,Datetime....) x kinds of indexing (slice/scalar)

So its better (meaning slightly cleaner), with the semantics we discussed above.

rename typ -> kind for _convert_*_indexer add kind argument to _maybe_cast_slice_bound cleaned up _convert_slice_indexer a bit

API: consistency with .ix and .loc for getitem operations (GH8613)

jorisvandenbossche · 2015-03-04T20:51:01Z

Thanks a lot for this!

shoyer · 2015-03-04T20:53:50Z

Indeed, greatly appreciated!

immerrr · 2015-03-13T17:06:32Z

That's great that you guys (and especially @jreback) actually went through this 👍

jreback · 2015-03-13T17:47:35Z

trying to make indexing slightly less of a rabbit hole :)

jreback added Bug Indexing Related to indexing on series/frames, not to indexes themselves API Design labels Feb 28, 2015

jreback added this to the 0.16.0 milestone Feb 28, 2015

jreback force-pushed the loc branch 4 times, most recently from f5b9f7e to d61b9b5 Compare February 28, 2015 22:42

jreback force-pushed the loc branch from d61b9b5 to baafd87 Compare February 28, 2015 22:49

jorisvandenbossche reviewed Mar 1, 2015
View reviewed changes

jreback force-pushed the loc branch 2 times, most recently from 1141055 to 2c2cf61 Compare March 1, 2015 16:06

shoyer reviewed Mar 1, 2015
View reviewed changes

jreback force-pushed the loc branch from 2c2cf61 to 1ba11a9 Compare March 1, 2015 23:06

jreback force-pushed the loc branch from 3991367 to 4f76199 Compare March 1, 2015 23:53

jreback force-pushed the loc branch from 4f76199 to af4d233 Compare March 2, 2015 11:41

shoyer reviewed Mar 3, 2015
View reviewed changes

jreback force-pushed the loc branch from af4d233 to de37741 Compare March 3, 2015 11:29

jreback added 4 commits March 4, 2015 06:46

API: consistency with .ix and .loc for getitem operations (GH8613)

d6774a7

raise TypeError rather than KeyError on invalid scalar/slice indexing with that index type

DOC: reorg / update v0.16.0.txt docs

dd1ff3e

API: enable float slicing with .loc on int index

a5801f6

API: enable float slicing with .ix

560fa87

jreback force-pushed the loc branch from de37741 to 3f9708f Compare March 4, 2015 11:48

clean up semantics on slice_indexer a bit

9207145

rename typ -> kind for _convert_*_indexer add kind argument to _maybe_cast_slice_bound cleaned up _convert_slice_indexer a bit

jreback force-pushed the loc branch from 3f9708f to 9207145 Compare March 4, 2015 12:34

jreback added a commit that referenced this pull request Mar 4, 2015

Merge pull request #9566 from jreback/loc

8992129

API: consistency with .ix and .loc for getitem operations (GH8613)

jreback merged commit 8992129 into pandas-dev:master Mar 4, 2015

shoyer mentioned this pull request Feb 20, 2016

API: behaviour of label indexing with floats on integer index #12333

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

API: consistency with .ix and .loc for getitem operations (GH8613) #9566

API: consistency with .ix and .loc for getitem operations (GH8613) #9566

jreback commented Feb 28, 2015

jreback commented Feb 28, 2015

jorisvandenbossche Mar 1, 2015

jreback Mar 1, 2015

shoyer Mar 1, 2015

jreback Mar 1, 2015

shoyer commented Mar 1, 2015

shoyer commented Mar 1, 2015

jreback commented Mar 1, 2015

jreback commented Mar 1, 2015

shoyer commented Mar 1, 2015

jreback commented Mar 1, 2015

shoyer commented Mar 1, 2015

shoyer commented Mar 1, 2015

jreback commented Mar 1, 2015

jorisvandenbossche commented Mar 2, 2015

jreback commented Mar 2, 2015

jorisvandenbossche commented Mar 2, 2015

jorisvandenbossche commented Mar 2, 2015

jreback commented Mar 2, 2015

shoyer Mar 3, 2015

shoyer commented Mar 3, 2015

jreback commented Mar 4, 2015

jorisvandenbossche commented Mar 4, 2015

shoyer commented Mar 4, 2015

immerrr commented Mar 13, 2015

jreback commented Mar 13, 2015


		The behavior of a small sub-set of edge cases for indexers have changed (:issue:`8613`):

		- to conform the intent of ``.loc`` to allow the same behavior for slice indexing as ``.ix`` when indexers are not found

		@@ -1243,25 +1243,7 @@ def _has_valid_type(self, key, axis):
		# boolean

		if isinstance(key, slice):

API: consistency with .ix and .loc for getitem operations (GH8613) #9566

API: consistency with .ix and .loc for getitem operations (GH8613) #9566

Conversation

jreback commented Feb 28, 2015

jreback commented Feb 28, 2015

jorisvandenbossche Mar 1, 2015

Choose a reason for hiding this comment

jreback Mar 1, 2015

Choose a reason for hiding this comment

shoyer Mar 1, 2015

Choose a reason for hiding this comment

jreback Mar 1, 2015

Choose a reason for hiding this comment

shoyer commented Mar 1, 2015

shoyer commented Mar 1, 2015

jreback commented Mar 1, 2015

jreback commented Mar 1, 2015

shoyer commented Mar 1, 2015

jreback commented Mar 1, 2015

shoyer commented Mar 1, 2015

shoyer commented Mar 1, 2015

jreback commented Mar 1, 2015

jorisvandenbossche commented Mar 2, 2015

jreback commented Mar 2, 2015

jorisvandenbossche commented Mar 2, 2015

jorisvandenbossche commented Mar 2, 2015

jreback commented Mar 2, 2015

shoyer Mar 3, 2015

Choose a reason for hiding this comment

shoyer commented Mar 3, 2015

jreback commented Mar 4, 2015

jorisvandenbossche commented Mar 4, 2015

shoyer commented Mar 4, 2015

immerrr commented Mar 13, 2015

jreback commented Mar 13, 2015