Setting values on slice of multi-index gives NaNs #10440

jim22k · 2015-06-25T17:11:59Z

Best shown with an example.

import numpy as np, pandas as pd
timestamps = list(map(pd.Timestamp, ['2014-01-01', '2014-02-01']))
categories = ['A', 'B', 'C', 'D']
df = pd.DataFrame(index=pd.MultiIndex.from_product([timestamps, categories], names=['ts', 'cat']),
                  columns=['Col1', 'Col2'])

>>> df
                Col1  Col2
ts         cat            
2014-01-01 A     NaN   NaN
           B     NaN   NaN
           C     NaN   NaN
           D     NaN   NaN
2014-02-01 A     NaN   NaN
           B     NaN   NaN
           C     NaN   NaN
           D     NaN   NaN

I want to set the values for all categories in a single month. These examples work just fine.

df.loc['2014-01-01', 'Col1'] = 5
df.loc['2014-01-01', 'Col2'] = [1,2,3,4]

>>> df
               Col1 Col2
ts         cat          
2014-01-01 A      5    1
           B      5    2
           C      5    3
           D      5    4
2014-02-01 A    NaN  NaN
           B    NaN  NaN
           C    NaN  NaN
           D    NaN  NaN

These examples don't work.

df.loc['2014-01-01', 'Col1'] += 1
df.loc['2014-02-01', 'Col2'] = df.loc['2014-01-01', 'Col2']

>>> df
               Col1 Col2
ts         cat          
2014-01-01 A    NaN    1
           B    NaN    2
           C    NaN    3
           D    NaN    4
2014-02-01 A    NaN  NaN
           B    NaN  NaN
           C    NaN  NaN
           D    NaN  NaN

It doesn't seem to be a "setting a value on a copy" issue. Instead, Pandas is writing the NaNs.

My current workaround is to unstack each column into a DataFrame with simple indexes. This works, but I have lots of columns to work with. One dataframe is much easier to work with than a pile of dataframes.

The computations for each month depend on the values computed in the previous month, so the work can't be fully vectorized across an entire column.

jreback · 2015-06-25T19:17:07Z

see docs here

In [17]: idx = pd.IndexSlice

# this is multi-index slicing
# this will *always* work
In [18]: df.loc[idx['2014-01-01',:], 'Col1'] += 1

In [19]: df
Out[19]: 
               Col1 Col2
ts         cat          
2014-01-01 A      6    1
           B      6    2
           C      6    3
           D      6    4
2014-02-01 A    NaN  NaN
           B    NaN  NaN
           C    NaN  NaN
           D    NaN  NaN

# pandas automatically aligns the labels for you, so you need to tell it to just shove these values in (as the ``.values`` makes it a numpy array w/o the labels)
In [22]: df.loc[idx['2014-02-01',:], 'Col2'] = df.loc[idx['2014-01-01',:], 'Col2'].values

In [23]: df
Out[23]: 
               Col1  Col2
ts         cat           
2014-01-01 A      6     1
           B      6     2
           C      6     3
           D      6     4
2014-02-01 A    NaN     1
           B    NaN     2
           C    NaN     3
           D    NaN     4

jreback · 2015-06-25T19:19:05Z

I'll mark this as an API issue. I think the 'shortcuts' above could probably work; they just may not be hooked up exactly right.

jreback · 2015-06-25T21:23:00Z

a possibility here would be to allow this:

df.loc(align=False)[idx['2014-02-01',:], 'Col2'] = df.loc[idx['2014-01-01',:], 'Col2']

IOW a keyword that would essentially do .values on the rhs and forgo alignment
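For illustration only: the align=False keyword is a proposal in this comment, not an existing pandas option. A rough sketch of the same idea as a standalone helper could look like the following. The name assign_positionally and its length check are hypothetical, not part of pandas.

```python
import numpy as np
import pandas as pd


def assign_positionally(df, row_key, column, source):
    """Hypothetical helper: write `source` into df.loc[row_key, column] by position."""
    values = np.asarray(source)  # drops any index labels carried by `source`
    target_len = len(df.loc[row_key, column])
    if values.shape != (target_len,):
        raise ValueError(f"expected {target_len} values, got shape {values.shape}")
    df.loc[row_key, column] = values


jan, feb = pd.Timestamp("2014-01-01"), pd.Timestamp("2014-02-01")
index = pd.MultiIndex.from_product([[jan, feb], list("ABCD")], names=["ts", "cat"])
df = pd.DataFrame({"Col2": [1.0, 2.0, 3.0, 4.0] + [np.nan] * 4}, index=index)

idx = pd.IndexSlice
# Copy January's values into February without label alignment.
assign_positionally(df, idx[feb, :], "Col2", df.loc[idx[jan, :], "Col2"])
```

Because the labels are discarded, the caller is responsible for making sure both sides are in the same order.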

jim22k · 2015-06-26T13:01:16Z

df.loc(align=False)[] looks really strange to me. Using .values on the rhs works fine.

If I want to keep label alignment instead of dropping down to numpy arrays, I found a way.

rhs = df.loc[idx['2014-01-01', :], 'Col2']
rhs.index = rhs.index.set_levels([pd.Timestamp('2014-02-01')], level=0)
df.loc[idx['2014-02-01', :], 'Col2'] = rhs

It's a little verbose, but functional. Between that and .values, I have two working solutions.

Thanks for the tip about idx = pd.IndexSlice. That makes working with multi-indexes much nicer.
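As a sketch of how the month-over-month use case from the original post might look with the positional approach: each month is filled from the previous month's values. The three-month index, the single column, and the "double the previous month" rule are made-up placeholders for illustration, and .to_numpy() is used here in place of .values.

```python
import numpy as np
import pandas as pd

months = pd.to_datetime(["2014-01-01", "2014-02-01", "2014-03-01"])
categories = ["A", "B", "C", "D"]
index = pd.MultiIndex.from_product([months, categories], names=["ts", "cat"])
df = pd.DataFrame(np.nan, index=index, columns=["Col1"])

idx = pd.IndexSlice
# Seed the first month.
df.loc[idx[months[0], :], "Col1"] = [1.0, 2.0, 3.0, 4.0]

# Each month depends on the previous one, so walk the months in order.
for prev, cur in zip(months[:-1], months[1:]):
    prev_values = df.loc[idx[prev, :], "Col1"].to_numpy()  # labels stripped
    df.loc[idx[cur, :], "Col1"] = prev_values * 2  # hypothetical update rule
```

This relies on every month listing the categories in the same order, since nothing is aligned by label.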

adamdivak · 2016-03-21T14:11:38Z

I just ran into the same issue and was about to report it when I found it had already been reported. I think this is a bug and very inconsistent. I see no reason why some of the assignments below should result in NaNs.

df = pd.DataFrame([('a', 0, 1), ('a', 1, 2),
                   ('b', 0, 1), ('b', 1, 2)], 
                  columns=['c1', 'c2', 'c3'])
df = df.set_index(['c1', 'c2'])
df = df.sort_index(level=0)
print(df)

           c3
    c1 c2    
    a  0    1
       1    2
    b  0    1
       1    2

df.loc['a'] = 1
print(df.loc['a'])

        c3
    c2    
    0    1
    1    1

df.loc['b'] -= 1
print(df.loc['b'])

        c3
    c2    
    0  NaN
    1  NaN

df.loc['a'] = df.loc['b'].values
print(df.loc['a'])

        c3
    c2    
    0    1
    1    2

df.loc['a'] = df.loc['b']
print(df.loc['a'])

        c3
    c2    
    0  NaN
    1  NaN

jreback · 2016-03-21T14:21:18Z

@yosuah you realize this shouldn't work, right? The point of pandas is to ALIGN data. This is correct as these are different columns; they don't align.

The only issue here is the in-place arithmetic.

df.loc['a'] = df.loc['b']
print(df.loc['a'])

        c3
    c2    
    0  NaN
    1  NaN
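One way to see a label mismatch in this example, assuming it comes from the row labels: as the printed output earlier in the thread shows, df.loc['b'] drops the c1 level, so its index holds only the c2 values, while the rows being assigned to are addressed by (c1, c2) tuples. A small sketch that compares the two sets of labels:

```python
import pandas as pd

df = pd.DataFrame(
    [("a", 0, 1), ("a", 1, 2), ("b", 0, 1), ("b", 1, 2)],
    columns=["c1", "c2", "c3"],
).set_index(["c1", "c2"]).sort_index()

# Selecting "b" drops the c1 level; only the c2 labels remain.
rhs_labels = list(df.loc["b"].index)

# The target rows keep the full (c1, c2) labels.
target_labels = list(df.xs("a", level="c1", drop_level=False).index)

# No label appears on both sides, so there is nothing to align on.
shared = set(rhs_labels) & set(target_labels)
print(rhs_labels, target_labels, shared)
```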

Suprabat · 2021-04-03T01:05:45Z

# pandas automatically aligns the labels for you, so you need to tell it to just shove these values in (as the ``.values`` makes it a numpy array w/o the labels)
In [22]: df.loc[idx['2014-02-01',:], 'Col2'] = df.loc[idx['2014-01-01',:], 'Col2'].values

In [23]: df
Out[23]: 
               Col1  Col2
ts         cat           
2014-01-01 A      6     1
           B      6     2
           C      6     3
           D      6     4
2014-02-01 A    NaN     1
           B    NaN     2
           C    NaN     3
           D    NaN     4

Hi @jreback, I'd like to understand how the DataFrame knows to align the two sides. Please see the following example, where I deliberately scrambled the lvl1 index of 'b'. When I use .values, the labels are ignored, so I don't get the result I want.

In [5]: df = pd.DataFrame(
   ...:     np.array([[1, 1, 1, 0, 0],
   ...:               [1, 0, 0, 0, 1],
   ...:               [1, 0, 0, 0, 0],
   ...:               [1, 0, 0, 1, 1],
   ...:               [0, 0, 0, 0, 1],
   ...:               [0, 1, 1, 0, 1]]),
   ...:     index = pd.MultiIndex.from_tuples(
   ...:         [('a', 1), ('a', 2), ('a', 3), ('b', 1), ('b', 3), ('b', 2)],
   ...:         names = ['lvl0', 'lvl1']
   ...:     )
   ...: )
   ...: df
Out[5]: 
           0  1  2  3  4
lvl0 lvl1               
a    1     1  1  1  0  0
     2     1  0  0  0  1
     3     1  0  0  0  0
b    1     1  0  0  1  1
     3     0  0  0  0  1
     2     0  1  1  0  1

In [6]: idx = pd.IndexSlice

In [7]: df2 = df.copy()
   ...: df2.loc[idx['a',:]] = df2.loc[idx['b',:]].values
   ...: df2
Out[7]: 
           0  1  2  3  4
lvl0 lvl1               
a    1     1  0  0  1  1
     2     0  0  0  0  1
     3     0  1  1  0  1
b    1     1  0  0  1  1
     3     0  0  0  0  1
     2     0  1  1  0  1

Expected output:

           0  1  2  3  4
lvl0 lvl1               
a    1     1  0  0  1  1
     2     0  1  1  0  1
     3     0  0  0  0  1
b    1     1  0  0  1  1
     3     0  0  0  0  1
     2     0  1  1  0  1
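One possible way to get the expected output above, assuming the intent is to match rows on lvl1 only: reorder the 'b' block to the lvl1 order of the 'a' block with reindex, then assign positionally. This is a sketch, not a built-in alignment mode, and it assumes every lvl1 label of 'a' also exists under 'b'.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.array([[1, 1, 1, 0, 0],
              [1, 0, 0, 0, 1],
              [1, 0, 0, 0, 0],
              [1, 0, 0, 1, 1],
              [0, 0, 0, 0, 1],
              [0, 1, 1, 0, 1]]),
    index=pd.MultiIndex.from_tuples(
        [("a", 1), ("a", 2), ("a", 3), ("b", 1), ("b", 3), ("b", 2)],
        names=["lvl0", "lvl1"],
    ),
)

df2 = df.copy()
a_order = df.loc["a"].index                   # lvl1 labels of "a": 1, 2, 3
b_in_a_order = df.loc["b"].reindex(a_order)   # rows of "b" reordered by lvl1
df2.loc["a"] = b_in_a_order.to_numpy()        # positional write, order now matches
```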

nylocx · 2021-08-31T14:10:09Z

This is still relevant as it just bit me. Here is my minimal example that looks ok but screws up the data:

df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [np.nan, np.nan, np.nan]],
    columns=pd.MultiIndex.from_tuples(
        [("a", 1,), ("a", 2), ("b", 1)]
    ),
)
df.loc[:, "a"] = df.loc[:, "a"].fillna(9)
df

Messed up columns ("a", 1) and ("a", 2)

     a         b
     1    2    1
0  NaN  NaN  3.0
1  NaN  NaN  6.0
2  NaN  NaN  NaN

If I just add .values, everything works fine:

df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [np.nan, np.nan, np.nan]],
    columns=pd.MultiIndex.from_tuples(
        [("a", 1,), ("a", 2), ("b", 1)]
    ),
)
df.loc[:, "a"] = df.loc[:, "a"].fillna(9).values
df

Looks like it should: NaN values are filled in columns ("a", 1) and ("a", 2)

     a         b
     1    2    1
0  1.0  2.0  3.0
1  4.0  5.0  6.0
2  9.0  9.0  NaN

jreback added Indexing, API Design, MultiIndex labels Jun 25, 2015

jreback added this to the 0.17.0 milestone Jun 25, 2015

jreback modified the milestones: Next Major Release, 0.17.0 Aug 20, 2015

jreback added Prio-low labels Aug 20, 2015

jreback mentioned this issue Feb 5, 2017

Unexpected Behavior when Setting partial row in MultiIndex-columned-Dataframe with Series #15310

Open

jreback mentioned this issue May 17, 2017

Assignment broken for sliced multi-indices #16379

Closed

This was referenced Nov 5, 2017

Assignment of subpart of data frame with multi index yields user surprise #18120

Closed

Setting values with multiindex .loc #18223

Closed

Dr-Irv mentioned this issue Mar 19, 2018

API: Inconsistent behavior with setting slices of Series indexed by MultiIndex #20414

Closed

WillAyd mentioned this issue Mar 7, 2019

str.extract output assignment behavior inconsisent and broken in some cases #25582

Closed

flyjuice mentioned this issue Aug 6, 2019

Multilevel Column Index Assignment Doesn't Work #27776

Closed

jbrockmendel removed Difficulty Intermediate labels Oct 21, 2019

sbitzer mentioned this issue Jun 5, 2020

BUG: dangerous inconsistency when setting values on slice of multi-index #34603

Closed

phofl mentioned this issue Nov 29, 2020

DataFrame.loc multiple columns replace #30439

Open

mroeschke added Enhancement and removed API Design labels Apr 18, 2021

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
