Assignment of subpart of data frame with multi index yields user surprise #18120

Closed
ghost opened this issue Nov 5, 2017 · 6 comments
Labels: Bug, Duplicate Report, MultiIndex

Comments


ghost commented Nov 5, 2017

from collections import OrderedDict
import random
import pandas as pd


pd.__version__
'0.20.3'
df = pd.DataFrame.from_dict(OrderedDict(
    [
        ((lv1, lv2, lv3), {'KG': random.randint(0, 100)})
        for lv1 in ['A', 'B']
        for lv2 in ['M', 'N', 'O']
        for lv3 in ['X', 'Y', 'Z']
    ]
))
df
     A                                   B
     M           N           O           M           N           O
     X   Y   Z   X   Y   Z   X   Y   Z   X   Y   Z   X   Y   Z   X   Y   Z
KG  16  71  27   8  83   0 100  72 100  42  26  21  91   5  36  98  96  52
idx = pd.IndexSlice
df.loc[:, idx[:, :, 'X']]
     A           B
     M   N   O   M   N   O
     X   X   X   X   X   X
KG  16   8 100  42  91  98
df.xs('Y', level=2, axis=1) - df.xs('Z', level=2, axis=1)
     A           B
     M   N   O   M   N   O
KG  44  83 -28   5 -31  44
idx = pd.IndexSlice
df.loc[:, idx[:, :, 'X']] = (df.xs('Y', level=2, axis=1) - df.xs('Z', level=2, axis=1))
df
     A                                   B
     M           N           O           M           N           O
     X   Y   Z   X   Y   Z   X   Y   Z   X   Y   Z   X   Y   Z   X   Y   Z
KG NaN  71  27 NaN  83   0 NaN  72 100 NaN  26  21 NaN   5  36 NaN  96  52

This was unexpected. I realize that the column indexes on the two sides do not match, which explains the NaNs, but I would expect pandas to raise an exception rather than silently overwrite the data.

idx = pd.IndexSlice
df.loc[:, idx[:, :, 'X']] = (df.xs('Y', level=2, axis=1) - df.xs('Z', level=2, axis=1)).values
df
     A                                   B
     M           N           O           M           N           O
     X   Y   Z   X   Y   Z   X   Y   Z   X   Y   Z   X   Y   Z   X   Y   Z
KG  44  71  27  83  83   0 -28  72 100   5  26  21 -31   5  36  44  96  52

This was, of course, the behavior I was expecting.
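To make the mismatch visible, here is a minimal sketch that rebuilds the frame with the values from the first printed output (an assumption made so the result is deterministic; the report used random integers) and compares the column labels on the two sides of the assignment. It only exercises the `.values` path, whose result is shown above.

```python
import pandas as pd

# Same layout as the report; values copied from the first printed frame
# instead of random.randint so the run is repeatable.
cols = pd.MultiIndex.from_product([['A', 'B'], ['M', 'N', 'O'], ['X', 'Y', 'Z']])
vals = [16, 71, 27, 8, 83, 0, 100, 72, 100, 42, 26, 21, 91, 5, 36, 98, 96, 52]
df = pd.DataFrame([vals], index=['KG'], columns=cols)

idx = pd.IndexSlice
target_cols = df.loc[:, idx[:, :, 'X']].columns                   # left-hand side labels
diff = df.xs('Y', level=2, axis=1) - df.xs('Z', level=2, axis=1)  # right-hand side

# xs() drops the selected level, so the right-hand side has one level fewer.
print(target_cols.nlevels, diff.columns.nlevels)
print(list(target_cols)[:2])
print(list(diff.columns)[:2])

# Passing a bare array discards the labels, so values are placed by position.
df.loc[:, idx[:, :, 'X']] = diff.values
x_after = df.loc[:, idx[:, :, 'X']].iloc[0].tolist()
print(x_after)
```

The left-hand labels are three-element tuples such as ('A', 'M', 'X') while the right-hand labels are two-element tuples such as ('A', 'M'), so there is nothing for label alignment to match on.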

Member

gfyoung commented Nov 5, 2017

@kghosesbg: Thanks for reporting this! Looks quite odd to me indeed. This is reproducible on master, so feel free to investigate and see what's going on here.

Contributor

jreback commented Nov 5, 2017

This is expected: you are assigning mismatched levels. It is pretty tricky for pandas to actually figure out that what you are doing is invalid. Sure, we could assume that you are assigning based on sub-levels, or detect that the levels don't match on the left and right sides, but this is pretty fragile.
xref #12343, #7475, duplicate of #10440
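One way to keep label alignment instead of bypassing it is to give the right-hand side the same three-level labels as the target before assigning. The sketch below is an illustration, not something proposed in this thread; it reuses the values from the first printed frame in the report so the result can be checked by hand.

```python
import pandas as pd

cols = pd.MultiIndex.from_product([['A', 'B'], ['M', 'N', 'O'], ['X', 'Y', 'Z']])
vals = [16, 71, 27, 8, 83, 0, 100, 72, 100, 42, 26, 21, 91, 5, 36, 98, 96, 52]
df = pd.DataFrame([vals], index=['KG'], columns=cols)

idx = pd.IndexSlice
diff = df.xs('Y', level=2, axis=1) - df.xs('Z', level=2, axis=1)

# Re-attach the dropped third level so each right-hand column carries the
# full (lv1, lv2, 'X') label of the column it is meant to fill.
diff.columns = pd.MultiIndex.from_tuples(
    [(lv1, lv2, 'X') for lv1, lv2 in diff.columns]
)

# Both sides now use the same labels, so the aligned assignment finds a
# value for every target column.
df.loc[:, idx[:, :, 'X']] = diff
x_after = df.loc[:, idx[:, :, 'X']].iloc[0].tolist()
print(x_after)
```

Because the labels are matched rather than the positions, this form does not depend on the column order of the right-hand side.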

In [13]: df.loc[:, idx[:, :, 'X']] = (df.xs('Y', level=2, axis=1) - df.xs('Z', level=2, axis=1))

In [14]: df.loc[:, idx[:, :, 'X']]
Out[14]: 
     A           B        
     M   N   O   M   N   O
     X   X   X   X   X   X
KG NaN NaN NaN NaN NaN NaN

This is the correct method: using .values means essentially no alignment, just a direct assignment.

In [16]: df.loc[:, idx[:, :, 'X']] = (df.xs('Y', level=2, axis=1) - df.xs('Z', level=2, axis=1)).values

In [17]: df
Out[17]: 
     A                                  B                                 
     M          N           O           M           N           O         
     X   Y   Z  X   Y   Z   X   Y   Z   X   Y   Z   X   Y   Z   X   Y    Z
KG  79  98  19  4  34  30 -52  34  86  29  70  41 -66  14  80 -58  42  100

This is very similar in concept to the warning at the end of this section.
If you wanted to make an example for MultiIndexing, that would be great.

http://pandas.pydata.org/pandas-docs/stable/indexing.html#basics
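Since the `.values` form assigns by position, it is only safe when the right-hand columns are in the same order as the target columns. A minimal sketch of that check follows, assuming the same deterministic frame as the report's first output; the variable names are illustrative.

```python
import pandas as pd

cols = pd.MultiIndex.from_product([['A', 'B'], ['M', 'N', 'O'], ['X', 'Y', 'Z']])
vals = [16, 71, 27, 8, 83, 0, 100, 72, 100, 42, 26, 21, 91, 5, 36, 98, 96, 52]
df = pd.DataFrame([vals], index=['KG'], columns=cols)

idx = pd.IndexSlice
diff = df.xs('Y', level=2, axis=1) - df.xs('Z', level=2, axis=1)
target_cols = df.loc[:, idx[:, :, 'X']].columns

# Compare the (lv1, lv2) part of the target labels with the right-hand labels.
# MultiIndex.equals is order-sensitive, which is what a positional write needs.
same_order = target_cols.droplevel(2).equals(diff.columns)

# A reversed copy has the same labels in a different order; writing its
# .values would put each number in the wrong column without any error.
reversed_diff = diff.iloc[:, ::-1]
reversed_ok = target_cols.droplevel(2).equals(reversed_diff.columns)

if same_order:
    df.loc[:, idx[:, :, 'X']] = diff.values

print(same_order, reversed_ok)
```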

jreback closed this as completed Nov 5, 2017
jreback added the Duplicate Report label Nov 5, 2017
jreback added this to the No action milestone Nov 5, 2017
Contributor

jreback commented Nov 5, 2017

@kghosesbg we need someone to collect the cases for MultiIndex and write/re-write the logic a bit. I am not opposed to raising in a case like this, but this requires some investigation. It would be great just to have the cases listed (and we can xfail them), then tackle them later. Would you be interested in this?
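As a rough illustration of what "raising in a case like this" could look like from user code, the sketch below wraps the assignment in a guard. `assign_checked` is a hypothetical helper written for this example, not a pandas function, and it only checks the number of column levels.

```python
import pandas as pd


def assign_checked(frame, col_key, rhs):
    """Hypothetical helper (not part of pandas): refuse to assign when the
    right-hand columns have a different number of levels than the target."""
    target_cols = frame.loc[:, col_key].columns
    if rhs.columns.nlevels != target_cols.nlevels:
        raise ValueError(
            f"column levels differ: target has {target_cols.nlevels}, "
            f"value has {rhs.columns.nlevels}"
        )
    frame.loc[:, col_key] = rhs


cols = pd.MultiIndex.from_product([['A', 'B'], ['M', 'N', 'O'], ['X', 'Y', 'Z']])
vals = [16, 71, 27, 8, 83, 0, 100, 72, 100, 42, 26, 21, 91, 5, 36, 98, 96, 52]
df = pd.DataFrame([vals], index=['KG'], columns=cols)

idx = pd.IndexSlice
diff = df.xs('Y', level=2, axis=1) - df.xs('Z', level=2, axis=1)

try:
    assign_checked(df, idx[:, :, 'X'], diff)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```

The guard runs before any write, so the frame is left untouched when the check fails.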

Author

ghost commented Nov 5, 2017

Hi @jreback, I'm happy to help as needed, though it might go slowly. Could you please clarify the task? From your previous comment I understand that you'd like to add this example to the multi-indexing docs, which I can do. Is the next step to collect reports of these cases and put them in the test suite?

Author

ghost commented Nov 5, 2017

@jreback I'll probably also do this with my mild mannered private individual role which I'm tagging @kghose

Contributor

jreback commented Nov 5, 2017

yep those are the tasks


2 participants