API: behaviour of label indexing with floats on integer index #12333
I think there is clear agreement about the following:
There is less agreement on when floats should be allowed when doing label indexing on a numerical but not-float index (so integer index or RangeIndex) in case the float can be interpreted as an integer:
I think for setting, the rule is if the label you want to set evaluates as equal to one of the labels in the index, then it works without changing the index ( I would argue that the same rule should apply for getting as for setting. |
The issue I have with @jorisvandenbossche last
as this would only apply to But since this is an integer index we are talking about, we have the odd fallback where it is treated like a label ( |
We could say that a python dict is also strict about labels, but I think I'm in agreement with Joris; haven't thought about how it affects fallback indexing entirely yet. |
AFAIK, integer indexes don't have any fallback to positional, and the indexer is always interpreted as label. So I don't see how this could mess up with the fallback indexing.
I don't see why that is a reason to make |
You also have this discrepancy:
(to be honest, it would only be a full discrepancy if this gave a "KeyError 1.0 not found in index") |
ok, so @jorisvandenbossche what do you think should work then. give me an example for setting/getting with the various indexers. |
I agree with @jorisvandenbossche, as I stated in the last PR:
|
so @shoyer then essentially you want to have full support back for float indexers? (excluding thats what your conditions imply. |
@jreback Yes, I guess so. If pandas were stricter about not upcasting types (or even better, if we did not support reindexing with |
I suppose this means we are reverting this as well.
and have it return equiv of Note that this will only apply to integer-like indices as string index will still raise |
No, that should still raise, as this is positional indexing (alas ..):
|
But you are correct for the case of slicing in
|
that last has always been allowed. |
@jreback I don't really understand the confusion between us. My main issue is (here and in https://github.com/pydata/pandas/pull/12370/files#r53281590): I don't see why we should have differences in behaviour between scalar indexing and slicing (with regard to labels being found or not). It should just be:
This last rule is then applicable to Why would this last one be different for |
the confusion is that we had agreement on disallowing floats in indexers except for float indices, which is a very simple rule. Now we are allowing them all over the place, essentially going back to where we were and creating another indexing mess where nothing is simple. |
And I am sorry for that, as you are doing the hard work in dealing with the complex indexing code, and then changing behaviour every time is not making it easier
OK, I agree that that is indeed a clear and simple rule. However, I personally think the rule should be how @shoyer formulated it: "We should use equality to determine whether or not a key matches". It's from that point of view (accepting the equality rule) that I don't see why positional based slicing in |
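As a plain-Python illustration of the equality rule quoted above (this is not pandas code, and `get_by_equality` is a hypothetical helper invented for the sketch): a key matches a label whenever the two compare equal, so `1.0` finds the integer label `1`, while `1.5` matches nothing. A built-in dict behaves the same way, since `1 == 1.0` and both hash identically.

```python
# Illustrative sketch only: `get_by_equality` is a hypothetical helper,
# not part of pandas. It mimics label lookup on an integer index.
def get_by_equality(labels, values, key):
    """Return the value whose label compares equal to `key`."""
    for label, value in zip(labels, values):
        if label == key:
            return value
    raise KeyError(key)


labels = [1, 2, 3]
values = ["a", "b", "c"]

# 1.0 == 1, so the float key matches the integer label 1.
matched = get_by_equality(labels, values, 1.0)

# 1.5 equals no label, so the lookup fails.
try:
    get_by_equality(labels, values, 1.5)
    missing_raises = False
except KeyError:
    missing_raises = True

# A built-in dict applies the same equality-based matching.
dict_matched = dict(zip(labels, values))[2.0]
```

Under this rule the type of the key does not matter, only whether it is equal to an existing label, which is the point of disagreement with the "index type decides" rule argued elsewhere in this thread.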
I want to echo @jorisvandenbossche's thanks for all your work on this. I'm still too scared to open up The rules of
are the best. |
Hmm, the (Using 0.17.1 here for the examples) Assuming we follow the rules as @shoyer summarised them above (#12333 (comment)), accessing one value with a float that matches should work:
Then I would say, slicing with matching items should also work:
But what to do with slicing with a float that does not match one of the labels in the index? When slicing with integers, the slice values don't need to be contained in the index, eg:
So should this also work with floats? And currently, this also works with floats with a fractional part, eg:
So in the slicing case, there is no equality required (only an ability to do searchsorted?), in the case of a monotonic index. So the defined rules do not really cover this case .. |
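The "searchsorted only" observation can be sketched with the standard library's `bisect` module. This is an assumption-laden illustration rather than the pandas implementation: `slice_by_label_bounds` is a hypothetical helper, and the index is modelled as a sorted Python list of integer labels. It shows that on a monotonic index the slice bounds only need to be comparable to the labels, not equal to any of them.

```python
from bisect import bisect_left, bisect_right


# Hypothetical helper, not pandas code: label-based slicing on a sorted
# (monotonic increasing) list of labels, with both endpoints inclusive.
def slice_by_label_bounds(labels, start, stop):
    """Return the labels that fall in the closed interval [start, stop]."""
    lo = bisect_left(labels, start)   # first position with label >= start
    hi = bisect_right(labels, stop)   # first position with label > stop
    return labels[lo:hi]


labels = [1, 3, 5, 7]

# Float bounds that equal existing labels.
exact = slice_by_label_bounds(labels, 3.0, 5.0)

# Integer bounds that are not in the index.
absent_int = slice_by_label_bounds(labels, 2, 6)

# Float bounds with a fractional part, which cannot equal any label.
fractional = slice_by_label_bounds(labels, 2.5, 5.5)
```

All three calls select the same labels (3 and 5), so an equality-based rule for scalar lookup does not by itself say whether the last two forms should be accepted.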
Slicing with floats on an integer index is actually (now I remember) deliberately implemented, and even extended to work with decreasing monotonic indexes by @shoyer (#8680) even after the initial deprecation for float indexers was put in place. This also did not raise a deprecation warning:
while the above now raises in master. So @jreback even if we decide that we don't want this behaviour any longer, we should at least first deprecate it IMO. |
I don't mind fixing it to be the right API, but I don't agree with @shoyer at all here. The indexing should strictly depend on the index type. I don't see why a float should be coerced to integer if they happen to be equal. I hear preaching all the time about how data shouldn't determine indexing. Well, isn't that exactly what you are doing here? We should not allow float indexer/slicers at all except with A python dictionary and numpy are just too simple for this case. They don't have to simultaneously deal with different TYPES of indexers and they only care about strict label matching (dicts) or positional (numpy). To be honest I don't care what the rules are for Pandas must deal with both. I think this is getting pretty lost in the fact that a user should have really well laid out rules for what is expected of indexing. They by definition HAVE to be simple. They are already way way too complex and special cased. |
from #12246
@jorisvandenbossche this looks odd