-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG/ENH: provide better .loc based semantics for float based indicies, continuing not to fallback (related GH236) #4850
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
if key.stop not in ax: | ||
raise KeyError("stop bound [%s] is not in the [%s]" % (key.stop,self.obj._get_axis_name(axis))) | ||
idx_type = ax.inferred_type | ||
if idx_type == 'floating': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really specific to floats, or should it also be supported for other index types? E.g. .ix
on sorted multi-indexes also does the spanning thing (IIRC).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC the semantics REQUIRE that the endpoints in a mi exist in the index. IOW (which then become the sliced endpoints). Do you have a case that you think should work? (I know this is esoteric, but what the heck!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a use case off the top of my head, I just have some vague memory that it works this way :-).
Honestly the more I think about it the more I suspect that the actual clean solution is to segregate the spanning-slice behaviour into an OrderedIndex type/type hierarchy (which would also enforce that index elements are sorted, might at some point add the option of to skipping the hash table entirely in favor of binary search's decent performance with vastly smaller memory usage, etc.). This thing where we try to guess which semantics are wanted by inspecting the values is pretty uncomfortable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(And I always assumed that this discomfort was why this feature was somewhat inconsistent and poorly documented.)
The behaviour in the test case does look like what I'm hoping for, though. |
updated.... unfortunately not sure what if anything to do about getitem and ix for a floating index...hmm |
For getitem and ix, one thing to do is to fix an independent bug that In [20]: pandas.Series(np.arange(5))[3.5] Using .index is how Python indexing works in general (e.g., what lists That does still leave the question of what to do with s.ix[4:5] when On Mon, Sep 16, 2013 at 6:24 PM, jreback [email protected] wrote:
|
@njsmith updated...take a look I stub-implemented Float64Index (which means its just a new class, but is still dtype=object) |
s = Series(np.arange(5)) | ||
self.assertRaises(KeyError, lambda : s.loc[3.5]) | ||
self.assertRaises(KeyError, lambda : s.ix[3.5]) | ||
self.assertRaises(KeyError, lambda : s[3.5]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might want to also check s.ix[3.5:4]
et cetera.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's tested below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the end of the file, not sure what you mean :-)
I don't see any tests for float slices with a non float index raising an error: Series(np.arange(5))[1.0:5.0]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean above; see my comments below, this PR is too much in scope to change the behavior of ix/[]
for Float64Index
. I basically have to drop fallback indexing, but then decide what certain actions mean. Pls review this in its entirety and lmk
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's still true AFAICT that you've changed what Series(np.arange(5))[4.0]
does, but not touched Series(np.arange(5))[4.0:5.0]
, and have no tests for this latter case.
its pretty easy to make however If I basically eliminate fallback, the all 8 of these cases will return the same. that seems good (and this will ONLY happen for a float based index). its an API change, but I don't think too big thoughts? |
I have removed the fallback stuff, will have to revist at some point. I can take it out, but then semantics become very weird. e.g.
I think what exists now does a pretty good job via With a But that will be another issue. |
Sorry, with all this back and forth I've gotten pretty lost, and I'm finding your notes just a hair too telegraphic to really follow. Would it be possible to summarize exactly what behavioural changes are now included in this PR? |
I updated the header of this PR 3 changes (which are in order)
|
Sorry, not trying to be dense, but still pretty confused.
|
look at the first example in the top of the PR,
This is the 'same' as
essentially I CAN remove fallback indexing entirely from I believe the PR as is solves all of the issues that you brought up. You can simply use |
self.assert_(result1 == result3) | ||
|
||
# non-fallback location based should raise this error | ||
self.assertRaises(KeyError, lambda : s.loc[4.0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, I only just realized that pandas 0.12.0 actually fails this... loc
actually does fallback indexing!
In [13]: s.loc[4.0]
Out[13]: 4
So, uh, good fix...
Okay, got it now. The The handling of the float truncation issue is still a bit wonky. You aren't testing what happens with a non-float-index Series, or with .iloc, when someone puts floats into a fancy index (list/ndarray) or slice, and I'm pretty sure at least slices will still do truncation in this case. And your code can't possibly be fully correct for integer indexing if it never refers to |
BTW this stuff in |
@njsmith this whole fallback notion is wonky! (but is somewhat necessary). Its just that for a Float64Index, I think you never want fallback. Problem is only some of the logic is actually implemented thru the index, some is directly on getitem and such. You are welcome to take a stab (on top of this PR or independently) if you like. It is quite easy to break something while fixing another problem. Its like whack a mole. what would be appreciated is a larger set of tests with various kinds of indicies and the expections give various kinds of inputs. test_indexing does quite a lot of this, but apparently some cases were missed. |
All I'm saying with the "wonky" comment is that the behaviour of |
there are some inconsistenties...trying to nail them down...but this is quite complicated! |
Yeah, I hear that! Another minor thought that just occurred to me while writing some code that instantiates these: it would be nice if explicitly calling the |
what should this yield for (this is why this is tricky) as you can make a case that
|
in answer to your question, |
Since there's obviously no label-based indexing going on here, we only have The general python rule for integer/offset-based indexing (used by e.g. The one other questionable case for me is s.loc[2]. I actually think this
The problem is that when you actually lay out the logic like this you can IMHO to make .loc predictable, the fallback to integer/offset-based In practice I guess the only common situations where people use On Wed, Sep 18, 2013 at 8:48 PM, jreback [email protected] wrote:
|
Shouldn't the rule be, Float64Index is inferred if you have all floating Basicaly Index([1.0, "hi", None]) should not return a Float64Index, is what On Wed, Sep 18, 2013 at 8:49 PM, jreback [email protected] wrote:
|
no... and can't just give up on floating point, e.g. 2.0 as that's the default now |
Oh, well, I actually like the rule that s.loc[2] should always error out But that's not actually how it works in my 0.12.0: In [6]: pandas.version In [7]: s = pandas.Series([0,1,2],['foo','bar','bar']) In [8]: s.loc[2] AFAICT if you have an Int64Index, then .loc acts the way you're saying, and On Wed, Sep 18, 2013 at 9:16 PM, jreback [email protected] wrote:
|
And on the floating point thing -- not sure what you mean by "giving up on On Wed, Sep 18, 2013 at 9:16 PM, jreback [email protected] wrote:
|
@jtrater your are right, hahha, will fix |
|
||
sf.iloc[3] | ||
|
||
A scalar index the is not found will raise ``KeyError`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"the" should be "that"
@njsmith ok...this looks ready to go...if you can give it a final test/run-thru (and provide me with an example of a use case) would be great.. |
@njsmith ping? |
try: | ||
subarr = np.array(data, dtype=dtype, copy=copy) | ||
except: | ||
return Index(data,name=name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bare except
without a re-raise is always a bug, this will swallow keyboard interrupts and things. At a minimum should use except Exception
That said the fallback handling still seems weird to me. You keep putting it in so I guess you have some reason for it but I can't figure it out. It looks like Float64Index(["a", "b"])
is an error, but Float64Index([object(), object()])
gives a regular Index
? Is that on purpose? Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not an error, this protect the user from doing an invalid type cast
there IS NOT fallback handling for Float64Index. Where do I show it?
ahh I see u want tr float index to basically refuse to deal (raise) on something that is not convertible in theory I agree but I have to check the semantics/API of what we r doing elsewhere |
@jreback: Sorry for losing track of this for a few days there -- the behaviour now looks fine (modulo the quibble about |
perfect I think I mentioned before but if like to out an example in docs where you use this in a real world use case (eg a generic simplified version of what u r doing) - can u provide an example with a few comments ? thxs |
There are several examples in #236 and mine is pretty typical of them. I df.loc[0:1000, "MiCe"] to pick out 1 second of data from the "Midline Central" electrode (and this df.loc[0:1000, "MiCe"].plot() will produce a sensible plot. The key features this PR provides for me are that (1) .loc now works for On Wed, Sep 25, 2013 at 3:12 PM, jreback [email protected] wrote:
|
@njsmith incorporated your addtl tests (already had most of them in any event), now Float64Index will raise on invalid coerced input (e.g. Float64Index([Timestamp(...)]))....other index types do this as well...so ok added your doc example...... |
…, continuing not to fallback (related to GH236) API: stub-implementation of Float64Index (meaning its the 'same' as an Index at this point), related GH263 BUG: make sure that mixed-integer-float is auto-converted to Float64Index BUG: (GH4860), raise KeyError on indexing with a float (and not a floating index) CLN/ENH: add _convert_scalar_indexer in core/index.py CLN/ENH: add _convert_slice_indexer in core/index.py TST: slice indexer test changes in series/frame BUG: fix partial multi-indexing DOC: release notes / v0.13.0 TST: tests for doc example CLN: core/indexing.py clean DOC: add to indexing.rst DOC: small corections API: make Float64Index getitem/ix/loc perform the same (as label based) for scalar indexing TST: Float64Index construction & tests TST: misc tests corrected in test_series/test_reshape/test_format TST: Float64Index tests indexing via lists DOC: additional example for float64index
@njsmith alright....read to merge...give me 1 more look see |
BUG/ENH: provide better .loc based semantics for float based indicies, continuing not to fallback (related GH236)
closes #236
.loc
semantics on floating indiciesFloat64Index
provide better
loc
semantics onFloat64Index
Float Slicing now raises when indexing by a non-integer slicer