Skip to content

Fix scalar iloc #15778

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed

Fix scalar iloc #15778

wants to merge 3 commits into from

Conversation

toobaz
Copy link
Member

@toobaz toobaz commented Mar 22, 2017

Unless I'm missing anything, all code for indexers (inside pandas/core/indexing.py) resorts to one of the following for setting values:

  • if we can work directly on BlockManager.setitem (simple case, e.g. no multiple dtypes involved), do so
  • otherwise, rely on methods provided by the object (mainly __setitem__), which are label based

This makes it impossible, in the "non-simple case", to index positionally in the correct way with duplicated labels (by the way, it's also plain ugly).

I can see three (four) possible solutions:

  1. temporarily strip an object from (some of) its labels - what this PR does. Arguably the simplest approach (doesn't require changing objects/BlockManager code), arguably not very elegant (although we could already expect some efficiency gains compared to current master, since this removes some equality checks on labels)
  2. add methods to the different objects that do position-based setting. Not trivial as it sounds, I'm afraid - such methods should talk directly the the BlockManagers, hence for instance taking care of dtype changes
  3. seriously improving BlockManager.setitem so that it is able to take care of multiple dtypes. Probably the best approach (would allow to enormously simplify indexing code), and the one requiring most work
  4. (completely rewrite all setting code in indexing.py currently relying on self.obj.__setitem__ so it directly talks to BlockManager.setitem)

Notice that BlockManager.setitem must be fixed in order to fix #12991, but the fix for that bug is significantly simpler than 3. (it is just a matter of shifting indices, not of working with different dtypes).

The current PR... works, and does clean a bit the _setter interface (i.e. concerning the difference between iloc and loc), which might be a good thing even in case we later decide to drop the less elegant InfoCleaner. My preferred approach would be to apply it, then start working on 3., then come back to the indexing code and simplify it by profiting of 3. But if you prefer to start directly with 2. or 3., I can have a try in that direction. I don't think I'm willing to try 4. because current indexing code is already too complicated.

@codecov
Copy link

codecov bot commented Mar 22, 2017

Codecov Report

Merging #15778 into master will increase coverage by 0.61%.
The diff coverage is 93.87%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #15778      +/-   ##
==========================================
+ Coverage   90.38%      91%   +0.61%     
==========================================
  Files         161      143      -18     
  Lines       50916    49433    -1483     
==========================================
- Hits        46021    44985    -1036     
+ Misses       4895     4448     -447
Flag Coverage Δ
#multiple ?
#single ?
Impacted Files Coverage Δ
pandas/core/indexing.py 94.18% <100%> (+0.17%) ⬆️
pandas/types/cast.py 85.04% <72.72%> (ø)
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/tools/plotting.py 71.77% <0%> (-10.05%) ⬇️
pandas/types/concat.py 98.06% <0%> (-1.94%) ⬇️
pandas/conftest.py 94.11% <0%> (-1.72%) ⬇️
pandas/compat/pickle_compat.py 68.29% <0%> (-1.22%) ⬇️
pandas/tools/hashing.py 99.02% <0%> (-0.98%) ⬇️
pandas/io/packers.py 87.57% <0%> (-0.94%) ⬇️
pandas/core/base.py 95.48% <0%> (-0.7%) ⬇️
... and 189 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0ea0f25...2a37bf6. Read the comment docs.

@toobaz
Copy link
Member Author

toobaz commented Mar 22, 2017

(I'm trying to make a sense out of failed tests... in vain: nodes 0 and 2 fail reliably, but the difference between 2 and 3, which succeeds, should just be the python version - but 3.5 succeeds in TravisCI - and the locale)

I will investigate this later

@jreback
Copy link
Contributor

jreback commented Mar 22, 2017

@toobaz I would proceed in a slightly different manner here.

The key is to make another method in block manager that sets that is indexer based. This is like what .take does but for setting.

  1. doesn't really matter atm, that's another issue.

I would start simple and w/o actually trying to fix an existing bug, but just trying to make positional based indexing work directly on a block.

@toobaz
Copy link
Member Author

toobaz commented Mar 22, 2017

The key is to make another method in block manager that sets that is indexer based

Sorry, I probably wasn't clear: BlockManager.setitem is position based, but it only works in simple cases (it is already used by indexing code in those cases). This is why 3. is a possible solution: making it more general.

@toobaz
Copy link
Member Author

toobaz commented Mar 22, 2017

(or if you are just saying "let's leave setitem as it is and restart from scratch": fine then, but the new method will presumably still rely on Block.setitem())

@jreback
Copy link
Contributor

jreback commented Mar 22, 2017

(or if you are just saying "let's leave setitem as it is and restart from scratch": fine then, but the new

this would likely be quite tricky. yes its the right thing to do, but covering all of the existing cases is hard. let's shoot for a simpler goal of simplifying code gradually as much as possible.

best way to do that is NOT to add new functionailty / fixes, rather to simplify cases that are existing in a more general way, w/o increasing the complexity.

@toobaz
Copy link
Member Author

toobaz commented Mar 22, 2017

best way to do that is NOT to add new functionailty / fixes, rather to simplify cases that are existing in a more general way, w/o increasing the complexity.

Sure I agree in principle, but in practice?

another method in block manager that sets that is indexer based

I'm just asking: why another one instead than improving BlockManager.setitem?

@toobaz
Copy link
Member Author

toobaz commented Mar 22, 2017

I'm just asking: why another one instead than improving BlockManager.setitem?

Well, after all I don't mind too much. Just tell me how to call a method that...

is like what .take does but for setting.

and I will try to write it.

@jreback
Copy link
Contributor

jreback commented Mar 22, 2017

I'm just asking: why another one instead than improving BlockManager.setitem?

no this is ok too.

@jreback jreback added Indexing Related to indexing on series/frames, not to indexes themselves Internals Related to non-user accessible pandas implementation labels Mar 22, 2017
@jreback
Copy link
Contributor

jreback commented May 13, 2017

would need a rebase

@toobaz toobaz force-pushed the fix_scalar_iloc branch from 3be6f12 to 2a37bf6 Compare May 13, 2017 22:40
@toobaz toobaz force-pushed the fix_scalar_iloc branch from 2a37bf6 to de098e0 Compare May 13, 2017 22:55
@toobaz
Copy link
Member Author

toobaz commented May 13, 2017

Rebased, but I'm still unable to make sense of the failing tests (and I don't have time to investigate now)

@jreback
Copy link
Contributor

jreback commented Aug 1, 2017

closing as stale. if you want to re-visit, pls comment and we can re-open.

@jreback jreback closed this Aug 1, 2017
@toobaz toobaz mentioned this pull request Aug 3, 2017
4 tasks
@toobaz toobaz deleted the fix_scalar_iloc branch August 7, 2017 22:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Indexing Related to indexing on series/frames, not to indexes themselves Internals Related to non-user accessible pandas implementation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: iloc fills multiple columns, if columns have duplicate names BUG: .iloc indexing with duplicates
2 participants