Unexpected Behavior when Setting partial row in MultiIndex-columned-Dataframe with Series #15310

joseortiz3 · 2017-02-05T20:22:51Z

Here I demonstrate that the expected behavior of creating a pandas Series object, and using it to safely set values in a Multi-Index-Columned Dataframe does not work with the most ideal syntax, and that currently the only way to do this is needlessly cumbersome. The motivation behind using a series to set values in the dataframe is that this method uses the 'key:value' structure of the Series to set values in the Dataframe, which allows one to ignore the order of the 'keys', or even if 'keys' fully spans the second level of the Dataframe's column's multiIndex.

Edit: In other words, I want to use a dictionary-like object (such as Series, or anything with a 'key:value' structure) to safely set some values in a row of df, without worrying about the order of the values or how many of them there are.

Code Sample

>>> import pandas as pd
>>> multi_index = pd.MultiIndex.from_product([['Bob','Jon'],['hours','sales']])
>>> df = pd.DataFrame(index = pd.Index([0,1,2],name='day'), columns = multi_index)
>>> # Our dataframe's index and column start out sorted, so don't worry about that.
>>> df
      Bob         Jon      
    hours sales hours sales
day                        
0     NaN   NaN   NaN   NaN
1     NaN   NaN   NaN   NaN
2     NaN   NaN   NaN   NaN
>>> # We make a row...
>>> row = pd.Series({'sales':635.54,'hours':8})
>>>  #...that we want to insert here.
>>> df.loc[0,('Bob',list(row.index))]
Bob  hours    NaN
     sales    NaN
Name: 0, dtype: object
>>> # We try inserting that row in the specified position...
>>> df.loc[0,('Bob',list(row.index))] = row
>>> #...But nothing happens.
>>> df
      Bob         Jon      
    hours sales hours sales
day                        
0     NaN   NaN   NaN   NaN
1     NaN   NaN   NaN   NaN
2     NaN   NaN   NaN   NaN
>>> #If we insert just the values of the row, it works.
>>> df.loc[0,('Bob',list(row.index))] = row.values
>>> df
      Bob           Jon      
    hours   sales hours sales
day                          
0       8  635.54   NaN   NaN
1     NaN     NaN   NaN   NaN
2     NaN     NaN   NaN   NaN
>>> # Why? We see that the index of the row in the dataframe...
>>> df.loc[0,('Bob',list(row.index))].index
MultiIndex(levels=[[u'Bob', u'Jon'], [u'hours', u'sales']],
           labels=[[0, 0], [0, 1]])
>>> #...and the index of the Series object do not match
>>> row.index.
Index([u'hours', u'sales'], dtype='object')
>>> #If we make them match...
>>> row.index = df.loc[0,('Bob',list(row.index))].index
>>> #...Now it works flawlessly (with a different syntax)
>>> df.loc[0,row.index] = row
>>> df
      Bob           Jon      
    hours   sales hours sales
day                          
0       8  635.54   NaN   NaN
1     NaN     NaN   NaN   NaN
2     NaN     NaN   NaN   NaN

Problem description

Setting a few values in a sub-row of a Multi-Index-Columned Dataframe with a Series object should work even if the indices of the Dataframe sub-row and the Series object do not match, when there is no ambiguous way the assignment could be interpreted, like in the example.

In general, the only sorts of objects that currently can be assigned with any method to a sub-row of a MultiIndexed Dataframe are objects which either have no indices (like tuples or arrays), or have MultiIndex indices.

Expected Output

>>> #We see that the index of the row in the dataframe...
>>> df.loc[0,('Bob',list(row.index))].index
MultiIndex(levels=[[u'Bob', u'Jon'], [u'hours', u'sales']],
           labels=[[0, 0], [0, 1]])
>>> #...and the index of the Series object do not match...
>>> row.index
Index([u'hours', u'sales'], dtype='object')
>>> # But pandas has enough information to execute the assignment
>>> df.loc[0,('Bob',list(row.index))] = row #Works
>>> Even more Ideally with the simpler syntax
>>> df.loc[0,('Bob',row.index)] = row

If there is another method that works better than this, please also tell me. But it seems to me this should just work.

# Paste the output here pd.show_versions() here INSTALLED VERSIONS ------------------ commit: None python: 2.7.12.final.0 python-bits: 64 OS: Windows OS-release: 10 machine: AMD64 processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel byteorder: little LC_ALL: None LANG: None LOCALE: None.None

pandas: 0.19.1
nose: 1.3.7
pip: 8.1.2
setuptools: 20.7.0
Cython: 0.24
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 4.2.0
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.2
pytz: 2016.3
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.6.1
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.5
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: 0.2.1

jreback · 2017-02-05T20:33:01Z

well you can do this

In [8]: df.loc[0, 'Bob'] = [1, 2]

In [9]: df
Out[9]: 
      Bob         Jon      
    hours sales hours sales
day                        
0       1     2   NaN   NaN
1     NaN   NaN   NaN   NaN
2     NaN   NaN   NaN   NaN

In [11]: df.loc[0, [('Bob', 'hours'), ('Bob', 'sales')]] = [1, 2]

In [12]: df
Out[12]: 
      Bob         Jon      
    hours sales hours sales
day                        
0       1     2   NaN   NaN
1     NaN   NaN   NaN   NaN
2     NaN   NaN   NaN   NaN

Your question are confusing, but I think you want to do this, yes?

In [14]: df.loc[0, 'Bob'] = Series([1, 2], ['hours', 'sales'])

In [15]: df
Out[15]: 
      Bob         Jon      
    hours sales hours sales
day                        
0     NaN   NaN   NaN   NaN
1     NaN   NaN   NaN   NaN
2     NaN   NaN   NaN   NaN

joseortiz3 · 2017-02-05T20:36:51Z

Yes, that's right. I want to use a dictionary-like object (such as Series, or anything with a 'key:value' structure) to safely set some values in a row of df, without worrying about the order of the values or how many of them there are.

Sorry it's so confusing.

I think the main issue, or a big clue, is that your series has to have a MultiIndex for it to work.

>>> good_series_1.index # Ready to set for 'Bob'
MultiIndex(levels=[[u'Bob', u'Jon'], [u'hours', u'sales']],
           labels=[[0, 0], [0, 1]])
>>> good_series_2.index
MultiIndex(levels=[[u'Bob'], [u'hours', u'sales']],
           labels=[[0, 0], [1, 0]])

Otherwise, any way of setting values will result in NaN. So the bug is perhaps the inability to match Index and MultiIndex. Right now, the best working syntax I found is

>>> s = pd.Series([101.34,2],index=pd.MultiIndex.from_product([['Bob'],['sales','hours']]))
>>> df[1] = s

But weirdly if the index is a DateTimeIndex and I want to insert it at the end, sometimes the above syntax destroys all the other data, and I have to make sure I do the super-redundant

df[last_time_in_index,'Bob'] = s #Even though s, above, already has "Bob"

The setting functionality of multiindex dataframes is clearly buggy.

jreback · 2017-02-05T20:39:46Z

#10440

I recall an issue about this, but sometimes hard to find things. Ok In theory this can work, but the code is a bit messy ATM. You are welcome to dive in and have a look. IIRC this has been this way for a long time.

jorisvandenbossche · 2017-02-05T22:48:43Z

You also get this strange effect where assigning itself introduces NaNs:

In [39]: multi_index = pd.MultiIndex.from_product([['Bob','Jon'],['hours','sales']])
    ...: df = pd.DataFrame(0, index = pd.Index([0,1,2],name='day'), columns = multi_index)
    ...: 

In [40]: df
Out[40]: 
      Bob         Jon      
    hours sales hours sales
day                        
0       0     0     0     0
1       0     0     0     0
2       0     0     0     0

In [41]: df.loc[0, 'Bob']
Out[41]: 
hours    0
sales    0
Name: 0, dtype: int64

In [42]: df.loc[0, 'Bob'] = df.loc[0, 'Bob'] + 1  ## or even without the + 1

In [43]: df
Out[43]: 
      Bob         Jon      
    hours sales hours sales
day                        
0     NaN   NaN     0     0
1     0.0   0.0     0     0
2     0.0   0.0     0     0

But, I think the original example of @joseortiz3 should not work? (so not to be considered as a bug)
Because I think we should expect the index of the assigned value to match with the left-hand-sides index. In the example above, however, the index matches.

joseortiz3 · 2017-02-05T23:53:58Z

@jorisvandenbossche It might not be considered a bug, but just to clarify I think it should work, because there is no ambiguous way that instruction could be interpreted.

Note that in your example, the assignment only works if the thing on the Right Hand Side has a MultiIndex (even if it doesn't match the index of thing on the left)

>>> df.loc[0,'Bob'] = df.loc[0,pd.IndexSlice['Bob',:]]+1
>>> df
    Bob         Jon      
  hours sales hours sales
0     1     1     0     0
1     0     0     0     0
2     0     0     0     0
>>> df.loc[0,pd.IndexSlice['Bob',:]].index     # RHS index
MultiIndex(levels=[[u'Bob', u'Jon'], [u'hours', u'sales']],
           labels=[[0, 0], [0, 1]])
>>> df.loc[0,'Bob'].index                       # LHS index
Index([u'hours', u'sales'], dtype='object')

for assigning to Dataframes with a MultiIndex axis
Working:

Index <- Multi Index
MultiIndex <- MultiIndex

Not Working:

Index <- Index
MultiIndex <- Index

jreback added Bug Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex labels Feb 5, 2017

jreback added this to the 0.20.0 milestone Feb 5, 2017

jreback added Difficulty Intermediate labels Feb 5, 2017

jreback modified the milestones: 0.21.0, 0.20.0 Mar 23, 2017

jreback modified the milestones: 0.21.0, Next Major Release Sep 23, 2017

jbrockmendel removed Difficulty Intermediate labels Oct 21, 2019

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unexpected Behavior when Setting partial row in MultiIndex-columned-Dataframe with Series #15310

Unexpected Behavior when Setting partial row in MultiIndex-columned-Dataframe with Series #15310

joseortiz3 commented Feb 5, 2017 •

edited

Loading

jreback commented Feb 5, 2017

joseortiz3 commented Feb 5, 2017 •

edited

Loading

jreback commented Feb 5, 2017

jorisvandenbossche commented Feb 5, 2017

joseortiz3 commented Feb 5, 2017 •

edited

Loading

Unexpected Behavior when Setting partial row in MultiIndex-columned-Dataframe with Series #15310

Unexpected Behavior when Setting partial row in MultiIndex-columned-Dataframe with Series #15310

Comments

joseortiz3 commented Feb 5, 2017 • edited Loading

Code Sample

Problem description

Expected Output

jreback commented Feb 5, 2017

joseortiz3 commented Feb 5, 2017 • edited Loading

jreback commented Feb 5, 2017

jorisvandenbossche commented Feb 5, 2017

joseortiz3 commented Feb 5, 2017 • edited Loading

joseortiz3 commented Feb 5, 2017 •

edited

Loading

joseortiz3 commented Feb 5, 2017 •

edited

Loading

joseortiz3 commented Feb 5, 2017 •

edited

Loading