Setting values on slice of multi-index gives NaNs #10440
see docs here
I'll mark this as an API issue. I think the 'shortcuts' above could probably work; they just may not be hooked up exactly right.
A possibility here would be to allow this:
In other words, a keyword that would essentially do:
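If I want to avoid dropping into implicit alignment via NumPy arrays, I found a way:

```python
import pandas as pd
idx = pd.IndexSlice  # assumed: idx is the usual IndexSlice alias
rhs = df.loc[idx['2014-01-01', :], 'Col2']
# Relabel level 0 (dropping unused dates first) and assign back; inplace= was removed in pandas 2.0
rhs.index = rhs.index.remove_unused_levels().set_levels([pd.Timestamp('2014-02-01')], level=0)
df.loc[idx['2014-02-01', :], 'Col2'] = rhs
```

It's a little verbose, but functional. Between that and …
Thanks for the tip about …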
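The relabel-then-assign workaround above can be packaged as a small function. This is a minimal sketch on toy data: the `month`/`cat` level names, the values, and the helper name `copy_month` are all hypothetical, and it relabels the date level with `rename(index=..., level=...)` instead of `set_levels`.

```python
import pandas as pd

# Hypothetical toy data: two months x two categories, one value column.
months = pd.to_datetime(['2014-01-01', '2014-02-01'])
index = pd.MultiIndex.from_product([months, ['x', 'y']], names=['month', 'cat'])
df = pd.DataFrame({'Col2': [1.0, 2.0, 0.0, 0.0]}, index=index)


def copy_month(frame, src, dst, col):
    """Hypothetical helper: copy frame[col] for month `src` into month `dst`."""
    src, dst = pd.Timestamp(src), pd.Timestamp(dst)
    # The slice(None) keeps the full (month, cat) index on the selection.
    rhs = frame.loc[(src, slice(None)), col]
    # Relabel the month level so rhs carries the target rows' labels.
    rhs = rhs.rename(index={src: dst}, level='month')
    # Labels now match the target rows, so alignment does not produce NaNs.
    frame.loc[(dst, slice(None)), col] = rhs
    return frame


copy_month(df, '2014-01-01', '2014-02-01', 'Col2')
print(df)
```

Because `rhs` carries the same `(month, cat)` labels as the rows it is written into, `.loc` should have nothing to misalign.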
I just ran into the same issue and was about to report it when I found it had already been reported. I think this is a bug, and a very inconsistent one: I see no reason why some of the assignments below should result in NaNs.

```python
import pandas as pd

df = pd.DataFrame([('a', 0, 1), ('a', 1, 2),
                   ('b', 0, 1), ('b', 1, 2)],
                  columns=['c1', 'c2', 'c3'])
df = df.set_index(['c1', 'c2'])
df = df.sort_index(level=0)  # sortlevel() is no longer available in current pandas
print(df)
```
```
       c3
c1 c2
a  0    1
   1    2
b  0    1
   1    2
```

```python
df.loc['a'] = 1
print(df.loc['a'])
```
```
    c3
c2
0    1
1    1
```

```python
df.loc['b'] -= 1
print(df.loc['b'])
```
```
    c3
c2
0  NaN
1  NaN
```

```python
df.loc['a'] = df.loc['b'].values
print(df.loc['a'])
```
```
    c3
c2
0    1
1    2
```

```python
df.loc['a'] = df.loc['b']
print(df.loc['a'])
```
```
    c3
c2
0  NaN
1  NaN
```
@yosuah you realize this shouldn't work, right? The point of pandas is to ALIGN data. This is correct, as these are different columns; they don't align. The only issue here is the in-place arithmetic.
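Following the alignment point above, here is a minimal sketch of both operations from the previous comment, written so that the two sides share labels. The values are illustrative (`'b'` holds 3 and 4 so the copy is visible), and this is one possible approach rather than an official recommendation: a boolean mask on the `c1` level keeps the full index for the in-place arithmetic, and `pd.concat` with a dict key re-keys the `'b'` block under `'a'`.

```python
import pandas as pd

df = (pd.DataFrame([('a', 0, 1), ('a', 1, 2), ('b', 0, 3), ('b', 1, 4)],
                   columns=['c1', 'c2', 'c3'])
        .set_index(['c1', 'c2'])
        .sort_index())

# In-place arithmetic: select the 'b' rows with a boolean mask so both sides
# keep the full (c1, c2) MultiIndex and line up label for label.
is_b = df.index.get_level_values('c1') == 'b'
df.loc[is_b, 'c3'] -= 1

# Block copy: re-key the 'b' block under 'a' so its labels match the target rows.
rhs = pd.concat({'a': df.loc['b']}, names=['c1'])
df.loc[rhs.index, 'c3'] = rhs['c3']
print(df)
```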
Hi @jreback, I'd like to understand how the DataFrame knows to align the two sides. Please see the following example, where I deliberately scrambled the level-1 index of 'b'. When I use `.values`, the labels are ignored, so it does not achieve the result I want.
Expected output:
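The label-versus-position difference asked about above can be sketched on a frame like the one in the earlier comment. Here the `'b'` block is reversed before being re-keyed under `'a'` (toy values, hypothetical variable names): assigning the Series aligns on labels and undoes the scrambling, while assigning `.to_numpy()` writes the values in the order they arrive.

```python
import pandas as pd

df = (pd.DataFrame([('a', 0, 1), ('a', 1, 2), ('b', 0, 3), ('b', 1, 4)],
                   columns=['c1', 'c2', 'c3'])
        .set_index(['c1', 'c2'])
        .sort_index())

# Take the 'b' block, scramble (reverse) its c2 order, and re-key it under 'a'.
rhs = pd.concat({'a': df.loc['b'].iloc[::-1]}, names=['c1'])['c3']
is_a = df.index.get_level_values('c1') == 'a'

by_label = df.copy()
by_label.loc[is_a, 'c3'] = rhs               # aligned on (c1, c2) labels

by_position = df.copy()
by_position.loc[is_a, 'c3'] = rhs.to_numpy()  # labels dropped, order used as-is

print(by_label, by_position, sep='\n\n')
```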
This is still relevant, as it just bit me. Here is my minimal example, which looks OK but corrupts the data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [np.nan, np.nan, np.nan]],
    columns=pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1)]),
)
df.loc[:, "a"] = df.loc[:, "a"].fillna(9)
df
```

Columns ("a", 1) and ("a", 2) are messed up.
If I just add `.values`, everything works fine:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [np.nan, np.nan, np.nan]],
    columns=pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1)]),
)
df.loc[:, "a"] = df.loc[:, "a"].fillna(9).values
df
```

This looks as it should: the NA values are filled in columns ("a", 1) and ("a", 2).
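An alternative that keeps label alignment rather than bypassing it with `.values`: selecting with a list (`["a"]`) instead of a scalar should keep both column levels on the right-hand side, so its labels match the target columns. A minimal sketch on the same data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [np.nan, np.nan, np.nan]],
    columns=pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1)]),
)

# A list key keeps the full ("a", 1), ("a", 2) column labels on both sides,
# so the filled frame aligns with the columns it is written into.
df.loc[:, ["a"]] = df.loc[:, ["a"]].fillna(9)
print(df)
```

Column ("b", 1) is not selected, so its NaN is left alone.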
Best shown with an example.
I want to set the values for all categories in a single month. These examples work just fine.
These examples don't work.
It doesn't seem to be a "setting a value on a copy" issue; instead, pandas is actually writing the NaNs.
My current workaround is to unstack each column into a DataFrame with a simple (single-level) index. This works, but I have lots of columns, and one DataFrame is much easier to work with than a pile of them.
The computations for each month depend on the values computed in the previous month, which is why this can't be fully vectorized over an entire column.
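For the month-by-month case described above, one way to stay in a single frame is to compute each month from the previous one and re-key the result under the current month before assigning, so the labels match. A minimal sketch, assuming hypothetical level names `month` and `cat`, a single `value` column, and a stand-in update rule (`+ 10.0`) in place of the real computation:

```python
import pandas as pd

# Hypothetical data: three months x two categories, one value column.
months = pd.date_range('2014-01-01', periods=3, freq='MS')
index = pd.MultiIndex.from_product([months, ['x', 'y']], names=['month', 'cat'])
df = pd.DataFrame({'value': [100.0, 200.0, 0.0, 0.0, 0.0, 0.0]}, index=index)

for prev, cur in zip(months[:-1], months[1:]):
    prev_vals = df.xs(prev, level='month')['value']  # Series indexed by cat
    new_vals = prev_vals + 10.0  # hypothetical stand-in for the real rule
    # Re-key under the current month so the result carries the same
    # (month, cat) labels as the rows it is written into.
    rhs = pd.concat({cur: new_vals}, names=['month'])
    df.loc[rhs.index, 'value'] = rhs

print(df)
```

Each iteration reads the month just written, which keeps the sequential dependency while avoiding a separate unstacked frame per column.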