
Assigning from other NDFrame broken on multiple MultiIndex columns #12313


Closed
toobaz opened this issue Feb 12, 2016 · 5 comments

Comments

@toobaz
Member

toobaz commented Feb 12, 2016

In [2]: df = pd.DataFrame(columns=pd.MultiIndex.from_product([range(2), range(2)]))

In [3]: df.loc[0] = range(4)

In [4]: df.loc[1] = pd.Series(range(4))

In [5]: df
Out[5]: 
     0         1     
     0    1    0    1
0  0.0  1.0  2.0  3.0
1  NaN  NaN  NaN  NaN

That is: the first loc is doing what we expect, but the second is only creating a row of NaNs. By the way, giving an explicit columns list does not help, and

In [6]: df.loc[2] = pd.Series(range(40))

does not even raise an error.

@jreback
Contributor

jreback commented Feb 12, 2016

this is correct. you are reindexing in both cases and you don't have any matches.
The reason [3] works is that the right-hand side does not have any index, so it is assigned by position.

In [17]: s = pd.Series(range(4))

In [18]: s
Out[18]: 
0    0
1    1
2    2
3    3
dtype: int64

In [19]: s.reindex(df.columns)
Out[19]: 
0  0   NaN
   1   NaN
1  0   NaN
   1   NaN
dtype: float64
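
The alignment described above can be reproduced without touching `df` at all. The following is a minimal sketch, not taken from the thread: the names `cols`, `unlabeled`, `too_long`, and `labeled` are illustrative, and the last step assumes that a Series labeled with the frame's own columns is the intended fix.

```python
import pandas as pd

cols = pd.MultiIndex.from_product([range(2), range(2)])

# Default integer labels 0..3 never equal the column tuples (0, 0), (0, 1), ...
unlabeled = pd.Series(range(4))
print(unlabeled.reindex(cols).isna().all())   # True: nothing aligns

# Labels that are not in the target are dropped by reindexing,
# which is why the 40-element Series does not raise.
too_long = pd.Series(range(40))
print(len(too_long.reindex(cols)))            # 4

# Assumption: labeling the Series with the frame's columns lets it align.
labeled = pd.Series(range(4), index=cols)
print(labeled.reindex(cols).tolist())         # [0, 1, 2, 3]

df = pd.DataFrame(columns=cols)
df.loc[0] = list(range(4))   # plain list: assigned by position
df.loc[1] = labeled          # Series: aligned on the column labels
print(df)
```

With the labels in place, both rows should hold 0 through 3; passing `.values` instead of the Series is the other way to skip alignment.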

@jreback jreback closed this as completed Feb 12, 2016
@toobaz
Member Author

toobaz commented Feb 12, 2016

Ouch!

@toobaz
Member Author

toobaz commented Feb 12, 2016

While my example was remarkably stupid, it came from an oversimplification of

In [2]: df = pd.DataFrame(columns=pd.MultiIndex.from_product([range(2), range(2)]))

In [3]: df.loc[2,0] = pd.Series(range(2), index=range(2))

In [4]: df
Out[4]: 
     0         1     
     0    1    0    1
2  NaN  NaN  NaN  NaN

... which I assumed should work, since

In [5]: df = pd.DataFrame(columns=range(2))

In [6]: df.loc[2] = pd.Series(range(2), index=range(2))

In [7]: df
Out[7]: 
     0    1
2  0.0  1.0

does. What do you think?

@jreback
Contributor

jreback commented Feb 12, 2016

In [13]: df = pd.DataFrame(columns=pd.MultiIndex.from_product([range(2), range(2)]))

In [14]: df.loc[(2,0)] = pd.Series(range(2), index=range(2)).values

In [15]: df
Out[15]: 
   0       1     
   0  1    0    1
2  0  1  NaN  NaN

In [29]: df.loc[2] = pd.Series(range(2), index=pd.MultiIndex.from_tuples([(0,0),(0,1)]))

In [30]: df
Out[30]: 
     0        1    
     0    1   0   1
2  0.0  1.0 NaN NaN

I suppose you could make an argument that

In [10]: idx = pd.IndexSlice

In [24]: df = pd.DataFrame(columns=pd.MultiIndex.from_product([range(2), range(2)]))

In [25]: df.loc[2,idx[0,:]] = pd.Series(range(2), index=range(2))

In [26]: df
Out[26]: 
     0         1     
     0    1    0    1
2  NaN  NaN  NaN  NaN

this should work. I don't think there is ANY support for this, and adding it would make an already very complicated interface even worse.

If you want to see what you can do, great. But it would need to make things simpler, not more complicated.
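
Until something like that is supported, one possible workaround is to build the full row by hand before assigning it, which generalizes the `In [29]` example above. This is only a sketch under that assumption; the names `part` and `row` are illustrative and do not come from the thread.

```python
import pandas as pd

cols = pd.MultiIndex.from_product([range(2), range(2)])
df = pd.DataFrame(columns=cols)

# Values intended for the columns under top-level key 0.
part = pd.Series(range(2), index=range(2))

# Prepend the top-level key so the labels become (0, 0) and (0, 1).
part.index = pd.MultiIndex.from_product([[0], part.index])

# Reindexing to the full column set leaves the other group as NaN.
row = part.reindex(df.columns)

# The row is now labeled like the frame's columns, so alignment finds matches.
df.loc[2] = row
print(df)
```

The row is assigned whole, so the columns under top-level key 1 end up as NaN rather than keeping any previous values.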

@jreback
Contributor

jreback commented Feb 13, 2016

this is also a dupe of #6699
