Skip to content

Setting values on slice of multi-index gives NaNs #10440

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
jim22k opened this issue Jun 25, 2015 · 8 comments
Open

Setting values on slice of multi-index gives NaNs #10440

jim22k opened this issue Jun 25, 2015 · 8 comments
Labels
Enhancement Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex

Comments

@jim22k
Copy link

jim22k commented Jun 25, 2015

Best shown with an example.

import numpy as np, pandas as pd
timestamps = map(pd.Timestamp, ['2014-01-01', '2014-02-01'])
categories = ['A', 'B', 'C', 'D']
df = pd.DataFrame(index=pd.MultiIndex.from_product([timestamps, categories], names=['ts', 'cat']),
                  columns=['Col1', 'Col2'])

>>> df
                Col1  Col2
ts         cat            
2014-01-01 A     NaN   NaN
           B     NaN   NaN
           C     NaN   NaN
           D     NaN   NaN
2014-02-01 A     NaN   NaN
           B     NaN   NaN
           C     NaN   NaN
           D     NaN   NaN

I want to set the values for all categories in a single month. These examples work just fine.

df.loc['2014-01-01', 'Col1'] = 5
df.loc['2014-01-01', 'Col2'] = [1,2,3,4]

>>> df
               Col1 Col2
ts         cat          
2014-01-01 A      5    1
           B      5    2
           C      5    3
           D      5    4
2014-02-01 A    NaN  NaN
           B    NaN  NaN
           C    NaN  NaN
           D    NaN  NaN

These examples don't work.

df.loc['2014-01-01', 'Col1'] += 1
df.loc['2014-02-01', 'Col2'] = df.loc['2014-01-01', 'Col2']

>>> df
               Col1 Col2
ts         cat          
2014-01-01 A    NaN    1
           B    NaN    2
           C    NaN    3
           D    NaN    4
2014-02-01 A    NaN  NaN
           B    NaN  NaN
           C    NaN  NaN
           D    NaN  NaN

It doesn't seem to be a "setting a value on a copy" issue. Instead, Pandas is writing the NaNs.

My current workaround is to unstack each column into a DataFrame with simple indexes. This works, but I have lots of columns to work with. One dataframe is much easier to work with than a pile of dataframes.

The computations for each month depend on the values computed in the previous month, hence why it can't be done fully vectorized on an entire column.

@jreback
Copy link
Contributor

jreback commented Jun 25, 2015

see docs here

In [17]: idx = pd.IndexSlice

# this is multi-index slicing
# this will *always* work
In [18]: df.loc[idx['2014-01-01',:], 'Col1'] += 1

In [19]: df
Out[19]: 
               Col1 Col2
ts         cat          
2014-01-01 A      6    1
           B      6    2
           C      6    3
           D      6    4
2014-02-01 A    NaN  NaN
           B    NaN  NaN
           C    NaN  NaN
           D    NaN  NaN

# pandas automatically aligns the labels for you, so you need to tell it to just shove these values in (as the ``.values`` makes it a numpy array w/o the labels)
In [22]: df.loc[idx['2014-02-01',:], 'Col2'] = df.loc[idx['2014-01-01',:], 'Col2'].values

In [23]: df
Out[23]: 
               Col1  Col2
ts         cat           
2014-01-01 A      6     1
           B      6     2
           C      6     3
           D      6     4
2014-02-01 A    NaN     1
           B    NaN     2
           C    NaN     3
           D    NaN     4

@jreback
Copy link
Contributor

jreback commented Jun 25, 2015

I'll mark this as a api-issue. I think that the 'shortcuts' about prob could work, they just may not be hooked up exactly right.

@jreback jreback added Indexing Related to indexing on series/frames, not to indexes themselves API Design MultiIndex labels Jun 25, 2015
@jreback jreback added this to the 0.17.0 milestone Jun 25, 2015
@jreback
Copy link
Contributor

jreback commented Jun 25, 2015

a possibility here would be to allow this:

df.loc(align=False)[idx['2014-02-01',:], 'Col2'] = df.loc[idx['2014-01-01',:], 'Col2']

IOW a keyword that would essentially do .values on the rhs and forgo alignment

@jim22k
Copy link
Author

jim22k commented Jun 26, 2015

df.loc(align=False)[] looks really strange to me. Using .values on the rhs works fine.

If I want to avoid dropping into implicit alignment via numpy arrays, I found a way.

rhs = df.loc[idx['2014-01-01', :], 'Col2']
rhs.index.set_levels([pd.Timestamp('2014-02-01')], level=0, inplace=True)
df.loc[idx['2014-02-01', :], 'Col2'] = rhs

It's a little verbose, but functional. Between that and .values, I have two working solutions.

Thanks for the tip about idx = pd.IndexSlice. That makes working with multi-indexes much nicer.

@jreback jreback modified the milestones: Next Major Release, 0.17.0 Aug 20, 2015
@adamdivak
Copy link

I just ran into the same issue and was about to report it when I found it has already been reported. I think this is a bug and very inconsistent, I see no reason why some of the assignments below should result in NaNs.

df = pd.DataFrame([('a', 0, 1), ('a', 1, 2),
                   ('b', 0, 1), ('b', 1, 2)], 
                  columns=['c1', 'c2', 'c3'])
df = df.set_index(['c1', 'c2'])
df = df.sortlevel(0)
print(df)

           c3
    c1 c2    
    a  0    1
       1    2
    b  0    1
       1    2
df.loc['a'] = 1
print(df.loc['a'])

        c3
    c2    
    0    1
    1    1
df.loc['b'] -= 1
print(df.loc['b'])

        c3
    c2    
    0  NaN
    1  NaN
df.loc['a'] = df.loc['b'].values
print(df.loc['a'])

        c3
    c2    
    0    1
    1    2
df.loc['a'] = df.loc['b']
print(df.loc['a'])

        c3
    c2    
    0  NaN
    1  NaN

@jreback
Copy link
Contributor

jreback commented Mar 21, 2016

@yosuah you realize this shouldn't work, right. the point of pandas is to ALIGN data. This is correct as these are different columns, they don't align.

the only issue here is the inplace arithmetic.

df.loc['a'] = df.loc['b']
print(df.loc['a'])

        c3
    c2    
    0  NaN
    1  NaN

@Suprabat
Copy link

Suprabat commented Apr 3, 2021

# pandas automatically aligns the labels for you, so you need to tell it to just shove these values in (as the ``.values`` makes it a numpy array w/o the labels)
In [22]: df.loc[idx['2014-02-01',:], 'Col2'] = df.loc[idx['2014-01-01',:], 'Col2'].values

In [23]: df
Out[23]: 
               Col1  Col2
ts         cat           
2014-01-01 A      6     1
           B      6     2
           C      6     3
           D      6     4
2014-02-01 A    NaN     1
           B    NaN     2
           C    NaN     3
           D    NaN     4

Hi, @jreback I'd like to understand how the DataFrame knows to align the two sides. Please see the following example, where I deliberately scrambled the level1 index of 'b'. When I use .values, the labels are ignored and hence does not achieve the result I want.

In [5]: df = pd.DataFrame(
   ...:     np.array([[1, 1, 1, 0, 0],
   ...:               [1, 0, 0, 0, 1],
   ...:               [1, 0, 0, 0, 0],
   ...:               [1, 0, 0, 1, 1],
   ...:               [0, 0, 0, 0, 1],
   ...:               [0, 1, 1, 0, 1]]),
   ...:     index = pd.MultiIndex.from_tuples(
   ...:         [('a', 1), ('a', 2), ('a', 3), ('b', 1), ('b', 3), ('b', 2)],
   ...:         names = ['lvl0', 'lvl1']
   ...:     )
   ...: )
   ...: df
Out[5]: 
           0  1  2  3  4
lvl0 lvl1               
a    1     1  1  1  0  0
     2     1  0  0  0  1
     3     1  0  0  0  0
b    1     1  0  0  1  1
     3     0  0  0  0  1
     2     0  1  1  0  1

In [6]: idx = pd.IndexSlice

In [7]: df2 = df.copy()
   ...: df2.loc[idx['a',:]] = df2.loc[idx['b',:]].values
   ...: df2
Out[7]: 
           0  1  2  3  4
lvl0 lvl1               
a    1     1  0  0  1  1
     2     0  0  0  0  1
     3     0  1  1  0  1
b    1     1  0  0  1  1
     3     0  0  0  0  1
     2     0  1  1  0  1

Expected output:

           0  1  2  3  4
lvl0 lvl1               
a    1     1  0  0  1  1
     2     0  1  1  0  1
     3     0  0  0  0  1
b    1     1  0  0  1  1
     3     0  0  0  0  1
     2     0  1  1  0  1

@nylocx
Copy link

nylocx commented Aug 31, 2021

This is still relevant as it just bit me. Here is my minimal example that looks ok but screws up the data:

df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [np.NaN, np.NaN, np.NaN]],
    columns=pd.MultiIndex.from_tuples(
        [("a", 1,), ("a", 2), ("b", 1)]
    ),
)
df.loc[:, "a"] = df.loc[:, "a"].fillna(9)
df

Messed up columns ("a", 1) and ("a", 2)

	a	        b
        1	2	1
0	NaN	NaN	3.0
1	NaN	NaN	6.0
2	NaN	NaN	NaN

if I just add in a values, everything is working fine:

df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [np.NaN, np.NaN, np.NaN]],
    columns=pd.MultiIndex.from_tuples(
        [("a", 1,), ("a", 2), ("b", 1)]
    ),
)
df.loc[:, "a"] = df.loc[:, "a"].fillna(9).values
df

Looks like it should, na values are filled in columns ("a", 1) and ("a", 2)

	a	        b
        1	2	1
0	1.0	2.0	3.0
1	4.0	5.0	6.0
2	9.0	9.0	NaN

@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex
Projects
None yet
Development

No branches or pull requests

7 participants