Fix scalar iloc #15778

toobaz · 2017-03-22T10:22:51Z

closes BUG: .iloc indexing with duplicates #15686
tests added / passed
passes git diff master --name-only -- '*.py' | flake8 --diff
whatsnew entry

Unless I'm missing anything, all code for indexers (inside pandas/core/indexing.py) resorts to one of the following for setting values:

if we can work directly on BlockManager.setitem (simple case, e.g. no multiple dtypes involved), do so
otherwise, rely on methods provided by the object (mainly __setitem__), which are label based

This makes it impossible, in the "non-simple case", to index positionally in the correct way with duplicated labels (by the way, it's also plain ugly).

I can see three (four) possible solutions:

temporarily strip an object from (some of) its labels - what this PR does. Arguably the simplest approach (doesn't require changing objects/BlockManager code), arguably not very elegant (although we could already expect some efficiency gains compared to current master, since this removes some equality checks on labels)
add methods to the different objects that do position-based setting. Not trivial as it sounds, I'm afraid - such methods should talk directly the the BlockManagers, hence for instance taking care of dtype changes
seriously improving BlockManager.setitem so that it is able to take care of multiple dtypes. Probably the best approach (would allow to enormously simplify indexing code), and the one requiring most work
(completely rewrite all setting code in indexing.py currently relying on self.obj.__setitem__ so it directly talks to BlockManager.setitem)

Notice that BlockManager.setitem must be fixed in order to fix #12991, but the fix for that bug is significantly simpler than 3. (it is just a matter of shifting indices, not of working with different dtypes).

The current PR... works, and does clean a bit the _setter interface (i.e. concerning the difference between iloc and loc), which might be a good thing even in case we later decide to drop the less elegant InfoCleaner. My preferred approach would be to apply it, then start working on 3., then come back to the indexing code and simplify it by profiting of 3. But if you prefer to start directly with 2. or 3., I can have a try in that direction. I don't think I'm willing to try 4. because current indexing code is already too complicated.

codecov · 2017-03-22T11:22:33Z

Codecov Report

Merging #15778 into master will increase coverage by 0.61%.
The diff coverage is 93.87%.

@@            Coverage Diff             @@
##           master   #15778      +/-   ##
==========================================
+ Coverage   90.38%      91%   +0.61%     
==========================================
  Files         161      143      -18     
  Lines       50916    49433    -1483     
==========================================
- Hits        46021    44985    -1036     
+ Misses       4895     4448     -447

Flag	Coverage Δ
#multiple	`?`
#single	`?`

Impacted Files	Coverage Δ
pandas/core/indexing.py	`94.18% <100%> (+0.17%)`	⬆️
pandas/types/cast.py	`85.04% <72.72%> (ø)`
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/tools/plotting.py	`71.77% <0%> (-10.05%)`	⬇️
pandas/types/concat.py	`98.06% <0%> (-1.94%)`	⬇️
pandas/conftest.py	`94.11% <0%> (-1.72%)`	⬇️
pandas/compat/pickle_compat.py	`68.29% <0%> (-1.22%)`	⬇️
pandas/tools/hashing.py	`99.02% <0%> (-0.98%)`	⬇️
pandas/io/packers.py	`87.57% <0%> (-0.94%)`	⬇️
pandas/core/base.py	`95.48% <0%> (-0.7%)`	⬇️
... and 189 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0ea0f25...2a37bf6. Read the comment docs.

toobaz · 2017-03-22T12:01:17Z

(I'm trying to make a sense out of failed tests... in vain: nodes 0 and 2 fail reliably, but the difference between 2 and 3, which succeeds, should just be the python version - but 3.5 succeeds in TravisCI - and the locale)

I will investigate this later

jreback · 2017-03-22T12:12:50Z

@toobaz I would proceed in a slightly different manner here.

The key is to make another method in block manager that sets that is indexer based. This is like what .take does but for setting.

doesn't really matter atm, that's another issue.

I would start simple and w/o actually trying to fix an existing bug, but just trying to make positional based indexing work directly on a block.

toobaz · 2017-03-22T12:32:08Z

The key is to make another method in block manager that sets that is indexer based

Sorry, I probably wasn't clear: BlockManager.setitem is position based, but it only works in simple cases (it is already used by indexing code in those cases). This is why 3. is a possible solution: making it more general.

toobaz · 2017-03-22T12:34:14Z

(or if you are just saying "let's leave setitem as it is and restart from scratch": fine then, but the new method will presumably still rely on Block.setitem())

jreback · 2017-03-22T12:46:45Z

(or if you are just saying "let's leave setitem as it is and restart from scratch": fine then, but the new

this would likely be quite tricky. yes its the right thing to do, but covering all of the existing cases is hard. let's shoot for a simpler goal of simplifying code gradually as much as possible.

best way to do that is NOT to add new functionailty / fixes, rather to simplify cases that are existing in a more general way, w/o increasing the complexity.

toobaz · 2017-03-22T14:58:44Z

best way to do that is NOT to add new functionailty / fixes, rather to simplify cases that are existing in a more general way, w/o increasing the complexity.

Sure I agree in principle, but in practice?

another method in block manager that sets that is indexer based

I'm just asking: why another one instead than improving BlockManager.setitem?

toobaz · 2017-03-22T15:07:52Z

I'm just asking: why another one instead than improving BlockManager.setitem?

Well, after all I don't mind too much. Just tell me how to call a method that...

is like what .take does but for setting.

and I will try to write it.

jreback · 2017-03-22T15:32:43Z

I'm just asking: why another one instead than improving BlockManager.setitem?

no this is ok too.

jreback · 2017-05-13T21:39:50Z

would need a rebase

closes pandas-dev#15686

toobaz · 2017-05-13T23:27:58Z

Rebased, but I'm still unable to make sense of the failing tests (and I don't have time to investigate now)

jreback · 2017-08-01T22:56:27Z

closing as stale. if you want to re-visit, pls comment and we can re-open.

toobaz force-pushed the fix_scalar_iloc branch from efc3958 to 3be6f12 Compare March 22, 2017 11:21

jreback added Indexing Related to indexing on series/frames, not to indexes themselves Internals Related to non-user accessible pandas implementation labels Mar 22, 2017

toobaz mentioned this pull request Mar 28, 2017

Odd behaviour when slicing Pandas dataframe with numeric column headings #15824

Open

toobaz added 2 commits May 14, 2017 00:11

BUG: Fix iloc with duplicate labels

b3096bb

closes pandas-dev#15686

TST: fixed one tests and added two more for pandas-dev#15686

99e44d7

toobaz force-pushed the fix_scalar_iloc branch from 3be6f12 to 2a37bf6 Compare May 13, 2017 22:40

TST: added xfailing test on iloc with dups

de098e0

toobaz force-pushed the fix_scalar_iloc branch from 2a37bf6 to de098e0 Compare May 13, 2017 22:55

jreback closed this Aug 1, 2017

toobaz mentioned this pull request Aug 3, 2017

Fix scalar iloc rebase #17163

Closed

4 tasks

toobaz deleted the fix_scalar_iloc branch August 7, 2017 22:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix scalar iloc #15778

Fix scalar iloc #15778

toobaz commented Mar 22, 2017 •

edited

Loading

codecov bot commented Mar 22, 2017 •

edited

Loading

toobaz commented Mar 22, 2017

jreback commented Mar 22, 2017

toobaz commented Mar 22, 2017 •

edited

Loading

toobaz commented Mar 22, 2017 •

edited

Loading

jreback commented Mar 22, 2017

toobaz commented Mar 22, 2017

toobaz commented Mar 22, 2017

jreback commented Mar 22, 2017

jreback commented May 13, 2017

toobaz commented May 13, 2017

jreback commented Aug 1, 2017

Fix scalar iloc #15778

Fix scalar iloc #15778

Conversation

toobaz commented Mar 22, 2017 • edited Loading

codecov bot commented Mar 22, 2017 • edited Loading

Codecov Report

toobaz commented Mar 22, 2017

jreback commented Mar 22, 2017

toobaz commented Mar 22, 2017 • edited Loading

toobaz commented Mar 22, 2017 • edited Loading

jreback commented Mar 22, 2017

toobaz commented Mar 22, 2017

toobaz commented Mar 22, 2017

jreback commented Mar 22, 2017

jreback commented May 13, 2017

toobaz commented May 13, 2017

jreback commented Aug 1, 2017

toobaz commented Mar 22, 2017 •

edited

Loading

codecov bot commented Mar 22, 2017 •

edited

Loading

toobaz commented Mar 22, 2017 •

edited

Loading

toobaz commented Mar 22, 2017 •

edited

Loading