-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Fix scalar iloc #15778
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix scalar iloc #15778
Conversation
Codecov Report
@@ Coverage Diff @@
## master #15778 +/- ##
==========================================
+ Coverage 90.38% 91% +0.61%
==========================================
Files 161 143 -18
Lines 50916 49433 -1483
==========================================
- Hits 46021 44985 -1036
+ Misses 4895 4448 -447
Continue to review full report at Codecov.
|
(I'm trying to make a sense out of failed tests... in vain: nodes 0 and 2 fail reliably, but the difference between 2 and 3, which succeeds, should just be the python version - but 3.5 succeeds in TravisCI - and the locale) I will investigate this later |
@toobaz I would proceed in a slightly different manner here. The key is to make another method in block manager that sets that is indexer based. This is like what
I would start simple and w/o actually trying to fix an existing bug, but just trying to make positional based indexing work directly on a block. |
Sorry, I probably wasn't clear: |
(or if you are just saying "let's leave |
this would likely be quite tricky. yes its the right thing to do, but covering all of the existing cases is hard. let's shoot for a simpler goal of simplifying code gradually as much as possible. best way to do that is NOT to add new functionailty / fixes, rather to simplify cases that are existing in a more general way, w/o increasing the complexity. |
Sure I agree in principle, but in practice?
I'm just asking: why another one instead than improving |
Well, after all I don't mind too much. Just tell me how to call a method that...
and I will try to write it. |
no this is ok too. |
would need a rebase |
Rebased, but I'm still unable to make sense of the failing tests (and I don't have time to investigate now) |
closing as stale. if you want to re-visit, pls comment and we can re-open. |
git diff master --name-only -- '*.py' | flake8 --diff
Unless I'm missing anything, all code for indexers (inside
pandas/core/indexing.py
) resorts to one of the following for setting values:BlockManager.setitem
(simple case, e.g. no multiple dtypes involved), do so__setitem__
), which are label basedThis makes it impossible, in the "non-simple case", to index positionally in the correct way with duplicated labels (by the way, it's also plain ugly).
I can see three (four) possible solutions:
BlockManager
code), arguably not very elegant (although we could already expect some efficiency gains compared to current master, since this removes some equality checks on labels)BlockManager
s, hence for instance taking care of dtype changesBlockManager.setitem
so that it is able to take care of multiple dtypes. Probably the best approach (would allow to enormously simplify indexing code), and the one requiring most workindexing.py
currently relying onself.obj.__setitem__
so it directly talks toBlockManager.setitem
)Notice that
BlockManager.setitem
must be fixed in order to fix #12991, but the fix for that bug is significantly simpler than 3. (it is just a matter of shifting indices, not of working with different dtypes).The current PR... works, and does clean a bit the
_setter
interface (i.e. concerning the difference betweeniloc
andloc
), which might be a good thing even in case we later decide to drop the less elegantInfoCleaner
. My preferred approach would be to apply it, then start working on 3., then come back to the indexing code and simplify it by profiting of 3. But if you prefer to start directly with 2. or 3., I can have a try in that direction. I don't think I'm willing to try 4. because current indexing code is already too complicated.