-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Fix fillna on multi indexed Dataframe doesn't work #47774
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
462134d
1bcdaef
5409961
764f0ab
f3d79a9
e384fd6
336f92d
79a542b
15db9ab
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -708,6 +708,34 @@ def test_single_block_df_with_horizontal_axis(self): | |
) | ||
tm.assert_frame_equal(result, expected) | ||
|
||
def test_fillna_with_multi_index_frame(self): | ||
# GH 47649 | ||
pdf = DataFrame( | ||
{ | ||
("x", "a"): [np.nan, 2.0, 3.0, 4.0, np.nan, 6.0], | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could you simplify the DataFrame? 2 rows should be sufficient. Also please add a test with ea dtypes There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure, could you be more specific about what dtypes we want in the test? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Int64 for example |
||
("x", "b"): [1.0, 2.0, np.nan, 4.0, np.nan, np.nan], | ||
("y", "c"): [1.0, 2.0, 3.0, 4.0, np.nan, np.nan], | ||
} | ||
) | ||
expected = DataFrame( | ||
{ | ||
("x", "a"): [-1.0, 2.0, 3.0, 4.0, -1.0, 6.0], | ||
("x", "b"): [1.0, 2.0, -1.0, 4.0, -1.0, -1.0], | ||
("y", "c"): [1.0, 2.0, 3.0, 4.0, np.nan, np.nan], | ||
} | ||
) | ||
tm.assert_frame_equal(pdf.fillna({"x": -1}), expected) | ||
tm.assert_frame_equal(pdf.fillna({"x": -1, ("x", "b"): -2}), expected) | ||
|
||
expected = DataFrame( | ||
{ | ||
("x", "a"): [-1.0, 2.0, 3.0, 4.0, -1.0, 6.0], | ||
("x", "b"): [1.0, 2.0, -2.0, 4.0, -2.0, -2.0], | ||
("y", "c"): [1.0, 2.0, 3.0, 4.0, np.nan, np.nan], | ||
} | ||
) | ||
tm.assert_frame_equal(pdf.fillna({("x", "b"): -2, "x": -1}), expected) | ||
|
||
|
||
def test_fillna_nonconsolidated_frame(): | ||
# https://github.com/pandas-dev/pandas/issues/36495 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a change in behavior. I am not sure, if we want this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is probably not a good idea, I just tried to avoid unexpected
NAN
when setting values of a multi-indexedDataframe
, from my observation, the unexpectedNAN
may be caused byitem
(column index ("x", "a")) in line 1940 is not a column index ofvalue
(DataFrame
has columns index "a")There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The underlying root cause is, that getitem reduces the dimension of the MultiIndex. Not sure how to avoid this when setting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. fixing the underlying issue is probably not an option for the regression fix, i.e. the issue that this PR closes listed in the OP.
A targeted PR for #47649 that could be backported would not address the underlying issue but could maybe make changes around the change made in #47327 to maybe convert the indexer to one that works #47649 (comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pandas/pandas/core/generic.py
Lines 6677 to 6679 in a7c5773
I was making some changes by adding
.values
at end of line 6679, looks like that works well. Maybe we could ignore the index infillna
since we knowresult[k]
is part ofresult
. @simonjayhawkins