API: label-based slicing with not-included labels #8613

Closed
jorisvandenbossche opened this issue Oct 23, 2014 · 40 comments · Fixed by #9566
Labels
Bug Docs Indexing Related to indexing on series/frames, not to indexes themselves

@jorisvandenbossche
Member

I didn't directly find an issue about it, or an explanation in the docs, but I stumbled today on the following, which did surprise me a bit:

Considering the following dataframe:

In [18]: df = pd.DataFrame(np.random.randn(5,2), index=pd.date_range('2012-01-01', periods=5))

In [19]: df
Out[19]:
                   0         1
2012-01-01  2.511337 -0.776326
2012-01-02  0.133589  0.441911
2012-01-03  0.348167  1.285188
2012-01-04  1.075843  1.282131
2012-01-05  0.683006  0.558459

Slicing with a label that is not included in the index works with .ix, but not with .loc:

In [20]: df.ix['2012-01-03':'2012-01-31']
Out[20]:
                   0         1
2012-01-03  0.348167  1.285188
2012-01-04  1.075843  1.282131
2012-01-05  0.683006  0.558459

In [21]: df.loc['2012-01-03':'2012-01-31']
...
KeyError: 'stop bound [2012-01-31] is not in the [index]'

Context: I was updating some older code, and I wanted to replace .ix with .loc (as this is what we recommend if it is purely label based to prevent confusion).

Some things:

  • If this is intended, I don't find this stated somewhere in the docs. So the docs are at least lacking at this point.
  • the inconsistency between [], .ix[] and .loc[] is a bit surprising here
  • it is also inconsistent with iloc -> that behaviour was changed in 0.14 to allow out of bound slicing (http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#whatsnew-0140-api)
  • Specifically for datetime-like indexing, it is also inconsistent with the feature of partial string indexing: df.loc['2012-01-03':'2012-01'] will work and do the expected while df.loc['2012-01-03':'2012-01-31'] fails
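
For readers on a current pandas, here is a minimal sketch of the comparison above. It assumes a release that includes the fix linked from this issue (#9566); `.ix` is left out because it has since been removed. The values are random, only the index matters.

```python
import numpy as np
import pandas as pd

# Five daily rows, as in the report; the values are irrelevant.
df = pd.DataFrame(
    np.random.randn(5, 2),
    index=pd.date_range("2012-01-01", periods=5),
)

# The stop label "2012-01-31" is not in the index, but the index is
# monotonic, so the slice ends at the last row that sorts before it.
full_date = df.loc["2012-01-03":"2012-01-31"]

# Partial string indexing: "2012-01" stands for the whole month.
partial = df.loc["2012-01-03":"2012-01"]

print(len(full_date), len(partial))
```

Under that assumption both slices select the same three rows.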
@jorisvandenbossche
Member Author

Ah, seems there was a (closed) issue related to this: #5223

@jorisvandenbossche
Member Author

Also, and this was the actual use case, how would the following best be done?

In [32]: df = pd.DataFrame(np.random.randn(60,3), index=pd.date_range('2012-01-01', periods=60), columns=['a', 'b', 'c'])
In [33]: df.iloc[25:35,1] = np.nan

In [34]: df = df.dropna()

So you have a dataframe, and for some reason there are some NaNs, and these data have to be removed. With the resulting dataframe, I now want to select all data up to the end of January for columns a and b. So my code did:

In [36]: df.ix[:'2012-01-31', ['a', 'b']]
Out[36]:
                   a         b
2012-01-01  2.988938 -0.818541
2012-01-02  0.068606  0.203595
2012-01-03  0.748376  0.397277
2012-01-04  1.577034 -1.061293
...
2012-01-22  0.079241  0.330411
2012-01-23 -0.799368 -1.815083
2012-01-24 -0.177154  0.329965
2012-01-25  0.391620  1.613779

as I don't know beforehand which indexes will be missing. I wanted to update this to use loc, which didn't work. But it seems strange that you have to use ix for a case like this.
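
A minimal sketch of the same use case written with `.loc`, assuming a pandas release that includes the fix linked from this issue (#9566). The data are random, so only the shape and index of the result are meaningful.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.random.randn(60, 3),
    index=pd.date_range("2012-01-01", periods=60),
    columns=["a", "b", "c"],
)
df.iloc[25:35, 1] = np.nan  # blanks 2012-01-26 .. 2012-02-04 in column "b"
df = df.dropna()

# "2012-01-31" was dropped with the NaN rows, yet the label slice still
# resolves because the remaining index is sorted.
january = df.loc[:"2012-01-31", ["a", "b"]]
print(january.index[-1])
```

The slice stops at the last surviving January row, which matches the `.ix` output shown above.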

@immerrr
Contributor

immerrr commented Oct 23, 2014

IMO this should work with loc given the corresponding axis is monotonic.

@jorisvandenbossche
Member Author

@immerrr Well, that is indeed the 'logic' for ix: this slicing with non-included labels only works for monotonic indices (and this was explicitly introduced, see c3708f2).
I don't really see a reason why this would be different for loc, and I am wondering if this was an oversight or on-purpose when .loc was introduced (and it could easily be an oversight, as this behaviour of ix is also not really documented, apart from the whatsnew of 0.7, and a single title (without further text) in the gotcha section: http://pandas.pydata.org/pandas-docs/stable/gotchas.html#non-monotonic-indexes-require-exact-matches)

@jreback
Contributor

jreback commented Oct 23, 2014

I think this is a bug. .ix does much less validation, so prob just slipping thru (I also didn't touch it much), was trying to preserve its exact functionality.

That said, we should document the missing values handling (for scalar/slice) a bit more

@jreback jreback added Bug Good as first PR Indexing Related to indexing on series/frames, not to indexes themselves Docs and removed Good as first PR labels Oct 23, 2014
@jreback jreback added this to the 0.15.1 milestone Oct 23, 2014
@jorisvandenbossche
Member Author

@jreback What do you mean with a 'bug'? That loc should also have the same behaviour as ix in this case? (so not 'intended' as you said in #5223)

@jreback
Contributor

jreback commented Oct 23, 2014

I shouldn't say bug, rather unintended non-compliance with .ix behavior (that we didn't mean to change)

@shoyer
Member

shoyer commented Nov 6, 2014

In #8740, I noted that this is already inconsistent between float and int indexes (float indexes don't check bounds).

In addition to consistency considerations, there may also be an efficiency argument. For float and interval indexes, you can't check whether a number is within the index bounds without doing binary search. This essentially doubles the amount of work necessary for doing slice lookups.

@shoyer
Member

shoyer commented Nov 6, 2014

@jreback It sounds like you are concerned about how weird it would be to get back an empty Series if one or both of the indexers are out of bounds. But in fact, this is exactly what Python (and numpy) already does when indexing a list/ndarray with out of bound integers:

>>> range(5)[10:]
[]

Based on the precedent from Python, I would only raise an exception if one of the slice bounds has the wrong type to lookup its order in the index -- but I suspect this case is already handled in slice_locs.
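
A small sketch of the positional precedent referred to above, written for Python 3, where `range(5)[10:]` evaluates to an empty `range` object rather than `[]`. A list is used so the result looks the same as in the Python 2 session.

```python
import numpy as np
import pandas as pd

values = list(range(5))

# Out-of-bounds positional slices are clipped to the container's length
# instead of raising.
from_list = values[10:]
from_array = np.arange(5)[10:]
from_iloc = pd.Series(values).iloc[10:]

print(from_list, from_array.size, len(from_iloc))
```

All three results are empty, with no exception raised.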

@immerrr
Contributor

immerrr commented Nov 6, 2014

Speaking of unintended non-compliance, it seems that #7525 "fixed" this issue for ix.

I would only raise an exception if one of the slice bounds has the wrong type to lookup its order in the index

If the index is non-monotonic, out-of-bound label lookup should raise.

Also, an existing but non-unique bound that doesn't occupy contiguous slots of storage should raise as a slice bound (think s.loc['foo':] with s.index = ['foo', 'bar', 'foo']).

There's also a rather shady case of looking up dates with string literals, e.g. you are allowed to do s.loc['2014-01-01'] and it works as expected most of the time. But at least at some point in the past what pandas did under the hood was something like s.loc[datetime(2014, 1, 1): datetime(2014, 1, 1, 23, 59, 59, 999999)]. And if it is still the case, it is not obvious if doing s.loc['2014-01-01':] should work in case of non-monotonic index when such string-based lookup refers to a single element.
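
The first two cases can be sketched as follows. This is only an illustration against a current pandas release, where both lookups raise `KeyError`; the exact error messages are not shown because they vary between versions.

```python
import pandas as pd

# Non-monotonic index: a missing slice bound has no well-defined position.
unsorted = pd.Series([1, 2, 3], index=[3, 1, 2])
try:
    unsorted.loc[0:2]
except KeyError as exc:
    print("non-monotonic index:", exc)

# Existing but non-unique bound whose occurrences are not contiguous.
dup = pd.Series([1, 2, 3], index=["foo", "bar", "foo"])
try:
    dup.loc["foo":]
except KeyError as exc:
    print("non-unique bound:", exc)
```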

@shoyer
Member

shoyer commented Nov 6, 2014

@immerrr Agreed about all those cases. But again, I'm pretty sure all of those are already handled in slice_locs. The out-of-bounds check on the slice bounds is extra, and it's done in _LocIndexer (sp?). So it's also poor separation of concerns.

@immerrr
Contributor

immerrr commented Nov 6, 2014

Yup, I don't like this either.

As a matter of fact I was thinking about redesigning slice_locs recently: most of the work is done in internal _get_slice function anyway, so I think it'll be better to change that function to a self-sufficient get_bound(label, upper_or_lower) method to incorporate all label-related logic and just spit out a location. Other methods could just take that and apply +1/-1 offsets as necessary for python-based slicing to work with different step directions. Negative steps, for example aren't handled in the existing code (#8716).
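
A rough sketch of that idea in plain Python. The names `get_bound` and `slice_locs` and the `"lower"`/`"upper"` values are hypothetical stand-ins, not pandas internals, and the sketch assumes an ascending sorted list of labels and a positive step only.

```python
from bisect import bisect_left, bisect_right


def get_bound(labels, label, side):
    """Hypothetical helper: position of a slice bound in sorted `labels`.

    side="lower" is the first position whose label is >= `label`;
    side="upper" is the position just past the last label <= `label`.
    """
    if side == "lower":
        return bisect_left(labels, label)
    if side == "upper":
        return bisect_right(labels, label)
    raise ValueError("side must be 'lower' or 'upper'")


def slice_locs(labels, start, stop):
    # Label slices include both ends, hence lower bound and upper bound.
    return get_bound(labels, start, "lower"), get_bound(labels, stop, "upper")


labels = [10, 20, 30, 40]
lo, hi = slice_locs(labels, 15, 30)
print(labels[lo:hi])
```

With all label logic in one place, a missing bound such as 15 simply maps to the position where it would be inserted.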

@jorisvandenbossche
Member Author

Another thing: "out-of-bound" is at the moment not allowed, but the label does not have to be in the index:

In [14]: s = pd.Series(['a','b','c','d'], [1,2,3,4])

In [17]: s.loc[1.5:3.5]
Out[17]:
2    b
3    c
dtype: object

In [18]: s.loc[1.5:5.5]
...
KeyError: 'stop bound [5.5] is not in the [index]'

So the error message is also confusing, as the reason is not really that the key is not found in the index, but that it lies outside the range of the index. Also a bit confusing and inconsistent I think.

@jreback
Contributor

jreback commented Nov 6, 2014

you guys seem to be missing the point. Various index types DO handle this properly. Datetimelike handle out-of-bounds label (string) based slicing to enable partial indexing (e.g. using '2014-01'). While when specifying a Timestamp they must be exactly in the index.

Floats are exactly the same (the above example has an Int64Index, so float slicing does not apply).

Int64Index by definition NEVER has label semantics.

And an object index, CANNOT ever have out-of-bounds slicing, it is never a monotonic index type, e.g.
it's either in the index or not (though my example directly contradicts my statement).

In [3]: s = Series([1,2,3,4],Index(['aa','ab','ac','ae'])) 

In [4]: s['aa':'ae']
Out[4]: 
aa    1
ab    2
ac    3
ae    4
dtype: int64

In [5]: s['aa':]
Out[5]: 
aa    1
ab    2
ac    3
ae    4
dtype: int64

# you guys are asking whether this should work, so it basically ignores this out-of-bounds label
# and then treats it as ':'
In [6]: s['aa':'ad']
Out[6]: 
aa    1
ab    2
ac    3
dtype: int64

So what is the example that you are concerned about here?

@jorisvandenbossche
I think we shouldn't allow your above example (e.g. slicing with floats on an int index) AT ALL. (it seems partially broken I agree)

@jreback
Contributor

jreback commented Nov 6, 2014

@shoyer

range(5)[10:]

is purely positional based, so I WOULD expect this to work (and iloc DOES have these semantics).

But label based is a completely different animal. Aside from partial string indexing, I think you either have to have the label in the index (to know when to stop), or the current behavior of allowing a non-existent index to be replaced by the end-points (I personally find this confusing but it does make sense).

@immerrr
Contributor

immerrr commented Nov 6, 2014

Int64Index by definition NEVER has label semantics.

I don't get why's that. I mean, I don't see user-level semantical difference between Index([100, 200, 300]) and Index(['100', '200', '300']), both are triples of labels, one of those happen to be integers, one of those happen to be strings, both happen to be monotonically increasing.

And an object index, CANNOT ever have out-of-bounds slicing, it is never a monotonic index type

That might depend on how do you read slicing operation. I read s.loc['aa':] as return all elements starting from the position of label 'aa'. If the sought element is not in the index, then yes, strictly speaking, position of label 'aa' is undefined, but if the index is monotonic you can have the next best thing: the position of label 'aa' if it were in the index.

I kind of like the idea of separating strict and lax lookups and making the user decide which one do they want, but I'd wildly guess that most of the time, especially interactively, they would go for lax lookups and thus it should be as convenient to use as it is now.
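
The "position of the label if it were in the index" reading can be sketched with two existing `Index` methods, `searchsorted` and `slice_locs`. The labels are borrowed from the earlier example in this thread; this is an illustration, not a statement about what `.loc` does internally.

```python
import pandas as pd

idx = pd.Index(["aa", "ab", "ac", "ae"])

# "ad" is absent; on a sorted index, searchsorted gives its would-be position.
would_be = int(idx.searchsorted("ad"))

# slice_locs turns a label slice into positional (start, stop).
start, stop = idx.slice_locs("aa", "ad")

print(would_be, (start, stop), list(idx[start:stop]))
```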

@jorisvandenbossche
Member Author

Int64Index by definition NEVER has label semantics.

@jreback didn't you mean 'always' instead of 'never'? (at least for .ix and .loc). Or otherwise I don't understand this.

And an object index, CANNOT ever have out-of-bounds slicing, it is never a monotonic index type

But isn't it a generally known feature that you can do this with a string index (the example you gave), although it only works with [] and .ix[] and not with .loc[]:

In [27]: s = pd.Series([1,2,3,4], index=list('abde'))

In [32]: s['a':'c']
Out[32]:
a    1
b    2
dtype: int64

In [33]: s.ix['a':'c']
Out[33]:
a    1
b    2
dtype: int64

In [34]: s.loc['a':'c']
...
KeyError: 'stop bound [c] is not in the [index]'

@jreback
Contributor

jreback commented Nov 6, 2014

@immerrr I disagree, I think object indices (when they contain strings) are by definition non-monotonic (I can see an ordering of course, and maybe that IS the difference here).

@jorisvandenbossche
yeh, I think the .ix behavior with an Int64Index is always label based , but the [] is ALWAYS positional based. Very confusing, but not sure we can change this (w/o a major user issue).

as far as your second issue. I think that was your original example, .ix is inconsistent with this type of slicing (and with []). So need to decide how to handle this for non-monotonic object type indexes (notwithstanding @immerrr point where you may allow an object based index to be monotonic if the elements have a natural ordering, but IMHO this is dangerous).

@immerrr
Contributor

immerrr commented Nov 6, 2014

I think object indices (when they contain strings) are by definition non-monotonic (I can see an ordering of course, and maybe that IS the difference here).

@jreback, our notions themselves may be different, because I'm often confused about what you say on that topic.

I usually think of ordering as a binary relation defined over a set of objects that possesses several properties (asymmetry, transitivity and irreflexivity), with less-than being the default ordering relation.

Now, less-than operation is not necessarily defined for two arbitrary Python objects, so in general, I agree, object index, unlike integer one, does not necessarily have an ordering. And it is under this assumption that NaT and NaN values technically break any ordering. But if it contains only strings and less-than operation between any two strings is defined, then it has an ordering, by definition of ordering, regardless of the actual order of elements in that index.

As for monotonicity, Index represents a certain location-to-label mapping which can be monotonically increasing (nondecreasing for non-unique) if for any two locations i and j, i < j implies idx[i] < idx[j] (i < j implies idx[i] <= idx[j] for non-unique).

Using these definitions I can't see how an object index can NEVER be monotonic if less-than (less-or-equal for non-unique) operation is defined for both locs — which are int — and for labels — which are str — and the monotonicity condition holds.
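
Those definitions translate almost directly into code. In this sketch the function `is_monotonic_increasing` and its `unique` flag are illustrative names, and only the last line calls pandas itself.

```python
import pandas as pd


def is_monotonic_increasing(labels, unique=True):
    """i < j implies labels[i] < labels[j] (<= when duplicates are allowed).

    Comparing neighbours is enough because the relation is transitive.
    """
    pairs = list(zip(labels, labels[1:]))
    if unique:
        return all(a < b for a, b in pairs)
    return all(a <= b for a, b in pairs)


sorted_strings = is_monotonic_increasing(["aa", "ab", "ac", "az"])
unsorted_strings = is_monotonic_increasing(["foo", "bar", "foo"])
pandas_answer = pd.Index(["a", "b", "c"]).is_monotonic_increasing

print(sorted_strings, unsorted_strings, pandas_answer)
```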

@jreback
Contributor

jreback commented Nov 6, 2014

@immerrr

what I mean is a object index SHOULD NEVER be classified as monotonic. Even if it has a 'string' ordering. I would call that a StringIndex in that case (but of course we don't have that ATM).

The problem is that we may be giving an ordering to something that actually is not implied at all.

e.g.

'aa','ab','ac','az'

I think we are making an assumption that this is monotonic increasing in the strictest sense. That said maybe a user would expect this, but is IMHO not obvious at all, and thus we shouldn't do it.

By definition labels are NOT ordered. (Categories of course can be so this will solve the entire problem once we have CategoricalIndex).

not sure if that is more clear or not :)

@jorisvandenbossche
Member Author

By definition labels are NOT ordered

What is then the point/logic of Index.is_monotonic ? That is based on the labels, no? So that implies we see a notion of order in the labels?
(edit: or were we only talking about string labels? I was reading labels in general. But still, Index.is_monotonic gives True for a ['a', 'b', 'c'] index)

@immerrr
Contributor

immerrr commented Nov 6, 2014

The problem is that we may be giving an ordering to something that actually is not implied at all.

So you're saying that unordered categorical values don't have ordering relation at all. I agree with that.

what I mean is a object index SHOULD NEVER be classified as monotonic

If index labels have an ordering and they are arranged in ascending order, they are monotonic, by definition. What you meant was probably that object labels should never have an implicit ordering.

That is reasonable for categorical data, but I don't think that arbitrary objects taken straight from python runtime should be interpreted as categorical values by default. In other words, given objs = Index(['a', 'b']), it's hardly intuitive to me that:

True == (objs[0] < objs[1])
# and at the same time
False == objs.is_monotonic_increasing

But again, I agree that CategoricalIndex should resolve this, because it seems perfectly fine to me to have category ordering (or the lack thereof) stated explicitly, e.g.

False == CategoricalIndex(objs, ordered=False).is_monotonic_increasing
# and even
True == CategoricalIndex(objs, categories=objs[::-1]).is_monotonic_decreasing

@shoyer
Member

shoyer commented Nov 6, 2014

@jreback

But label based is a completely different animal. Aside from partial string indexing, I think you either have to have the label in the index (to know when to stop), or the current behavior of allowing a non-existent index to be replaced by the end-points (I personally find this confusing but it does make sense).

Actually, for monotonic indexes, I think we should map the labels to integer locations, and then indexing should be exactly the same as standard numpy/python indexing. So the non-existent label is inserted in the location that maintains the order. This is consistent and simpler than the current rules. I don't think we should have different rules for int/float indexes -- that is very surprising to me.

Side note: is there a good reason why we have not deprecated .ix yet? I notice that the docs mention that they work for mixed integer/label based access but that's such a niche situation that I think it would be better to remove it entirely (it can be done other ways).

@shoyer
Member

shoyer commented Nov 8, 2014

I just took a look into implementing this change (commented out some lines of code, really) and ran into an unfortunate limitation: comparing an integer to a string does not raise an exception in Python 2.

Given that we want to raise KeyError rather than return size zero arrays when using integers for label based indexing if the labels are strings (if the index is monotonic), I don't think we can implement this for object indexes (unless we do type inference or make a real StringIndex).

However, every other type of index does have well defined types, so in principle we can replace the __contains__ checks with type checks for the slice bounds. If we go down that route, I would prefer to move the logic to slice_locs from _LocIndexer.
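
For contrast with the Python 2 limitation described above, a tiny sketch assuming Python 3, where ordering an integer against a string fails at comparison time:

```python
# In Python 3 an int and a str have no ordering, so the comparison itself
# raises instead of silently returning a result.
try:
    1 < "a"
    mixed_comparison_ok = True
except TypeError:
    mixed_comparison_ok = False

print(mixed_comparison_ok)
```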

@shoyer
Member

shoyer commented Nov 8, 2014

Another note about _LocIndexer for anyone who's feeling up for a refactor: we should try to eliminate the _has_valid_type method entirely, or at least everywhere it does in or isin lookups. It's easier to ask for forgiveness than permission! Right now .loc looks up every element in the index twice: once to make sure it's safe, once to get the actual value.
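
A minimal sketch of the two styles being contrasted. The helpers `lookup_lbyl` and `lookup_eafp` are hypothetical and do not reflect how `_LocIndexer` is actually written; `Index.get_loc` is the only pandas call used.

```python
import pandas as pd

idx = pd.Index(["a", "b", "c"])


def lookup_lbyl(index, label):
    # "Look before you leap": one lookup for the membership check,
    # then a second lookup for the position.
    if label not in index:
        raise KeyError(label)
    return index.get_loc(label)


def lookup_eafp(index, label):
    # "Easier to ask forgiveness than permission": a single lookup;
    # get_loc raises KeyError on its own when the label is missing.
    return index.get_loc(label)


print(lookup_lbyl(idx, "b"), lookup_eafp(idx, "b"))
```

Both helpers behave the same from the caller's side; the second one just touches the index once.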

@jreback
Contributor

jreback commented Nov 8, 2014

@shoyer can u show me the example that you say needs fixing
I haven't seen the usecase yet

u can try to change indexing but there are lots of special cases - there is a reason for has_valid_type - but appreciate someone wading in

@immerrr
Contributor

immerrr commented Nov 8, 2014

There should be a method that casts the special case to a generic one or raises an error if it cannot, which is what EAFP is about. Speaking of wading in, I think I'm on a way to something interesting in #8753, but I'm already concerned about merging it in a non-disruptive manner.

@jreback
Contributor

jreback commented Nov 8, 2014

@immerrr @shoyer oh don't get me wrong, I am +++1 for you guys wading in. My experience in the past has shown that:

  • these changes often involve fixing a bug which might be a special case and that
  • you then don't really want to do the API change which really should be done
  • then this actually creates more special cases
  • but cleaning up the special cases is a negative feedback cycle (e.g. repeat this list)

so, your approaches are actually good to divide-and-conquer by starting with cleaning up slicing

@shoyer
Member

shoyer commented Nov 9, 2014

@jreback OK, here's an example: s = pd.Series(range(5)). I expect that s.loc[-1:3] and s.loc[:3] should be equivalent. Or consider @jorisvandenbossche's example from the first post.

Having this sort of logic on the index instead of the indexer also means that libraries like my project xray can make use of these sort of checks (we reuse pandas indexes but not indexers).

Honestly, I'm not entirely certain it's worth the trouble of wading in to this. I do understand that (almost) every awkward special case is there to fix a real bug. That scares me! :)

@jorisvandenbossche
Member Author

Disclaimer: i am not (yet) familiar with the indexing internals, so maybe my following comment will be stupid :-)

But, the things we are talking about for .loc, aren't they all already implemented in the code? Namely for .ix?
As my original example above worked for .ix, not for .loc. The example @shoyer gives now in the previous comment, also works for .ix: s.ix[-1:3] gives the expected result that he wants for s.loc[-1:3].

@jreback
Contributor

jreback commented Nov 9, 2014

.loc HAS to be strict as compared to .ix, for this reason:

In [8]: s = Series(range(5),[-2,-1,1,2,3])

In [9]: s
Out[9]: 
-2    0
-1    1
 1    2
 2    3
 3    4
dtype: int64

In [10]: s.loc[-10:3]
KeyError: 'start bound [-10] is not the [index]'

In [11]: s.loc[0:3]
KeyError: 'start bound [0] is not the [index]'

You are suggesting that [10] work I believe. I am big -1 on this. The entire point of indexing is basically to raise KeyError when something doesn't work. Yes convenience on slicing is nice, but a simple typo should not cause the behavior to radically change. (Note that what you are showing for behavior works in .ix, as @jorisvandenbossche shows above).

@immerrr
Contributor

immerrr commented Nov 9, 2014

a simple typo should not cause the behavior to radically change

Safeguarding against typos with such judgement may require you to name attributes and methods according to the Levenshtein distance between them; this doesn't seem reasonable to me.

@immerrr
Contributor

immerrr commented Nov 9, 2014

And it's not about convenience, but rather about consistency and predictability.

The last example should work for all integer slice bounds. Or it should NOT, documentation should state explicitly that it does NOT (regardless of dtype) and there should be a method/indexer that does work (regardless of dtype), because selecting/setting/deleting all items with labels between bounds seems a very useful operation to me.

@jreback
Contributor

jreback commented Nov 9, 2014

I think the last example is very predictable, it won't work as the docs state unless BOTH bounds are included. full stop. pandas has gone down this road before with allowing .ix to have different behavior when integers are present or not in an index. I think this is a very very bad idea. .loc is a strict indexing. Why should it change? It is about hey give me this data, raise if it's not there.

I get consistency. I push for it. But what exactly is inconsistent about the current behavior?

@timmie I believe .ix DOES work for all of the cases you have described (if not that is a consistency/bug).

And certainly the docs on .loc could be even more explicit. Though I don't think we'll get users to read them any more than they do now :)

One further thought. It is my belief that the indexers are NOT orthogonal at all, and have much overlap. Whether that was originally a good decision is a point of view:

  • if you want strict consistency then it is bad
  • but for convenience it is good

So this all has to be a balance. Cover the edge cases, and allow users to not have to use a bunch of different methods. .loc/.iloc cover pretty much all of the territory. .ix works but has some interpretation quirks. and [] is mostly for column access. I am not sure that we need another indexer.

@immerrr
Contributor

immerrr commented Nov 9, 2014

One thing is that loc does allow inexact matches for float labels. This is somewhat reasonable, because floating points are often an approximation, but it is an inconsistency between dtypes.

Floating-point case kind of supports the idea of having a lax label indexer: you know that floats are better compared approximately, so you generally use an approximate indexer. But if you know that the exact float value should be in the index, you go for a strict one and run into an error early if it does not.
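
The float case mentioned above can be sketched like this. It is a small illustration run against a current pandas, with a made-up three-element Series.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=[1.0, 2.0, 3.0])

# Neither 1.5 nor 2.5 is a label, yet the slice is accepted on a float
# index and selects the labels that fall between the two bounds.
inexact = s.loc[1.5:2.5]
print(inexact)
```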

@shoyer
Member

shoyer commented Nov 9, 2014

@jorisvandenbossche If my example works for .ix, I don't think it's intentional. .ix is excessively complex because it tries to do too many things (can we officially deprecate it?). In this case, things "just work" for .ix because checking that the slice bounds are in the index is an additional check that is only implemented on the .loc indexer.

@jreback I want both your examples 10 and 11 to work, like this:

In [10]: s.loc[-10:3]
Out[10]: 
-2    0
-1    1
 1    2
 2    3
 3    4
dtype: int64

In [11]: s.loc[0:3]
Out[11]: 
 1    2
 2    3
 3    4
dtype: int64

As far as consistency goes, I think it depends on your mental model for "strict indexing". If an index is monotonic, my mental model of .loc indexing does expect using a not included bound to work. Why? Well, I found it worked for:

  1. Float64Index always
  2. DatetimeIndex when using partial string indexing (even if it's entirely out of bounds, e.g., the wrong year)
  3. Int64Index when using floats, if the float rounds to an integer in the index, e.g., s.loc[1.5:] (this is quite common, given the prevalence of indexes that look like range(n))

Here we have a "general rule" (bounds must be included) that is ignored for quite a few types of indexes for the sake of practicality. We could "fix" cases 1-3 to make them consistent with the rule, but they are already consistent with each other, and this functionality can indeed be quite useful.

As a general rule, I usually assume that something will just work if it can clearly be interpreted in an unambiguous way and excluding it would require extra work.

Obviously this was also surprising to at least two other heavy pandas users besides myself (@jorisvandenbossche and @immerrr).

And the truth is, .loc already isn't that strict, because array based lookups with labels that aren't found magically work by doing a reindex. If I were to redesign .loc from scratch, I would keep out of bounds slicing for monotonic indexes but keep things strict by not doing automatic reindexing. Actually, that is exactly what I did for .loc in xray.... and if I do say so myself, the result is much simpler (although to be fair, slightly less performant because I don't use np.take).
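
For reference, a minimal sketch of the two slices requested in this comment, assuming a pandas release that includes the fix linked from this issue (#9566). The Series is the one from the earlier example.

```python
import pandas as pd

s = pd.Series(range(5), index=[-2, -1, 1, 2, 3])

# Neither -10 nor 0 is a label; the index is sorted, so both slices
# resolve to the labels lying between the bounds.
wide = s.loc[-10:3]
narrow = s.loc[0:3]

print(list(wide.index), list(narrow.index))
```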

@immerrr
Contributor

immerrr commented Nov 9, 2014

can we officially deprecate ix?

ix is useful for at least one thing: combining label and positional indexers in the setitem operation, e.g. df.ix[:5, ['foo', 'bar', 'baz']] = 100. I don't see a way around this except for doing convert_slice_indexer manually.
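
Since `.ix` was later deprecated and removed (in pandas 1.0), here is a hedged sketch of two ways to express the same mixed assignment with `.loc` and `.iloc`. The frame, its date index and the extra `qux` column are made up for illustration; option 1 assumes the row index is unique.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.zeros((8, 4)),
    index=pd.date_range("2014-01-01", periods=8),
    columns=["foo", "bar", "baz", "qux"],
)
cols = ["foo", "bar", "baz"]

# Option 1: turn the positional row slice into labels, then use .loc.
df.loc[df.index[:5], cols] = 100

# Option 2: turn the column labels into positions, then use .iloc.
df.iloc[:5, df.columns.get_indexer(cols)] = 100

print(df)
```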

@jreback
Contributor

jreback commented Nov 9, 2014

yes, I don't see a need to deprecate .ix it HAS its uses (really for convenience, but I DO use it).

.loc is really .ix except that it doesn't ever try to positionally interpret (an integer indexer). So @shoyer your suggestion is not incompatible with what is going on. The more I think about it, it IS reasonable to have all slicing (across index types), be compatible.

So I guess ONLY Int64Index does NOT support this type of out-of-bounds slicing? (using .loc)?
while all the other index types do?

@jorisvandenbossche
Member Author

@jreback small comment on what you said above:

it wont work as the docs state unless BOTH bounds are included. full stop

I think that when the docs say "both the start and the stop are included" (http://pandas.pydata.org/pandas-docs/stable/indexing.html#different-choices-for-indexing), this is about the fact that in label-based indexing the stop label is included, contrary to the usual python (integer location based) slices. And this is not about what is allowed for the start and stop label in such slices.

.ix is indeed still useful, and @shoyer, given our discussion in #8753, I think it is intentional that your example works with .ix.
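
The inclusive-stop reading of the docs described in this comment can be sketched with a made-up four-element Series:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], index=list("abcd"))

by_label = s.loc["b":"c"]   # label slice: stop label "c" is included
by_position = s.iloc[1:2]   # positional slice: stop position 2 is excluded

print(list(by_label.index), list(by_position.index))
```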

@jreback jreback modified the milestones: 0.16.0, 0.15.2 Nov 30, 2014
@jreback
Contributor

jreback commented Feb 26, 2015

going to work on this soon

TomAugspurger added a commit to TomAugspurger/dask that referenced this issue Apr 13, 2017
Changes `boundary_slice` to handle cases where

- the index is not sorted
- using label-based indexing (loc)
- the start or stop is missing

See pandas-dev/pandas#8613 for details on the pandas
side.
mrocklin pushed a commit to dask/dask that referenced this issue Apr 14, 2017
* BUG: boundary_slice assumes sorted indexes

Changes `boundary_slice` to handle cases where

- the index is not sorted
- using label-based indexing (loc)
- the start or stop is missing

See pandas-dev/pandas#8613 for details on the pandas
side.

* Avoid sorting in `boundary_slice`

* Additional tests, falsey endpoints
4 participants