TST: GroupBy.ffill on a multi-index dataframe #55408
This test checks whether the `ffill` method works correctly on a DataFrame with a duplicated index. It defines a DataFrame with index [0, 1, 2, 0, 1, 2] and column "a" with values [1, 2, NaN, 3, 4, 5]. We group this DataFrame by index level 0 and shift each group by one, which gives:

0: [1, 3] -> [NaN, 1]
1: [2, 4] -> [NaN, 2]
2: [NaN, 5] -> [NaN, NaN]

Since the row order stays the same, applying `ffill` afterwards should give a DataFrame whose column "a" equals [NaN, NaN, NaN, 1, 2, 2].

Co-authored-by: José Lucas Silva Mayer <[email protected]>
Co-authored-by: Willian Wang <[email protected]>
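The walkthrough above can be reproduced directly. This is a minimal sketch assuming a recent pandas version; it uses the public `pd.testing.assert_frame_equal` instead of the test suite's `tm` alias, and the intermediate name `shifted` is only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, np.nan, 3, 4, 5]}, index=[0, 1, 2, 0, 1, 2])

# Shift by one within each group of rows that share the same index label.
# Row order is preserved, so "a" becomes [NaN, NaN, NaN, 1, 2, NaN].
shifted = df.groupby(level=0).shift()

# Plain (non-grouped) forward fill over the shifted frame.
result = shifted.ffill()

expected = pd.DataFrame(
    {"a": [np.nan, np.nan, np.nan, 1, 2, 2]}, index=[0, 1, 2, 0, 1, 2]
)
pd.testing.assert_frame_equal(result, expected)
```

The last NaN (label 2, whose only earlier value was itself NaN) is filled from the row above it because the final `ffill` is not grouped.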
pandas/tests/groupby/test_groupby.py
```python
# GH#43412
df = DataFrame({"a": [1, 2, np.nan, 3, 4, 5]}, index=[0, 1, 2, 0, 1, 2])

result = df.groupby(level=0).shift().ffill()
```
I think the point of the original issue was to test groupby.ffill
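To illustrate why a grouped forward fill is a different operation, here is a minimal sketch (assuming a recent pandas version) comparing a plain `DataFrame.ffill` with `groupby(level=0).ffill` on the frame from the PR description. The names `plain` and `grouped` are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, np.nan, 3, 4, 5]}, index=[0, 1, 2, 0, 1, 2])

# Plain forward fill: the NaN at position 2 takes the value from position 1,
# even though that row has a different index label.
plain = df.ffill()

# Grouped forward fill: values only propagate between rows sharing a label.
# The NaN is the first row with label 2, so there is nothing to fill it from.
grouped = df.groupby(level=0).ffill()
```

`plain["a"]` is [1, 2, 2, 3, 4, 5], while `grouped["a"]` keeps the NaN at position 2.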
Sure!
I'm testing this, and when I use the DataFrame `groupby(level=0)` and `ffill()` methods, the test fails because the dtypes are different:
```
def test_groupby_ffill_with_duplicated_index():
    # GH#43412
    df = DataFrame({"a": [1, 2, 3, 4, np.nan, np.nan]}, index=[0, 1, 2, 0, 1, 2])
    result = df.groupby(level=0).ffill()
    expected = DataFrame(
        {"a": [1, 2, 3, 4, 2, 3]}, index=[0, 1, 2, 0, 1, 2]
    )
>   tm.assert_frame_equal(result, expected)
E   AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="a") are different
E
E   Attribute "dtype" are different
E   [left]:  float64
E   [right]: int64
```
Is this behaviour expected?
Anyway, I will write the test with a float64 DataFrame.
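One likely source of the float64 result: NaN cannot be stored in an int64 column, so column "a" is already float64 when the frame is built, and the grouped fill keeps that dtype. A minimal sketch, assuming a recent pandas version and the public `pd.testing.assert_frame_equal` in place of `tm`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, np.nan, np.nan]}, index=[0, 1, 2, 0, 1, 2])

# int64 cannot hold NaN, so pandas stores column "a" as float64 on construction.
input_dtype = df["a"].dtype

# Fill within each index label: label 1 gets 2 and label 2 gets 3.
result = df.groupby(level=0).ffill()

# The filled values are whole numbers, but the column stays float64.
expected = pd.DataFrame(
    {"a": [1, 2, 3, 4, 2, 3]}, index=[0, 1, 2, 0, 1, 2], dtype=float
)
pd.testing.assert_frame_equal(result, expected)
```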
Signed-off-by: José Lucas Silva Mayer <[email protected]>
pandas/tests/groupby/test_groupby.py
```python
expected = DataFrame(
    {"a": [1, 2, 3, 4, 2, 3]}, index=[0, 1, 2, 0, 1, 2], dtype=float
)
tm.assert_frame_equal(result, expected)
```
You can specify `check_dtype=False` here instead of needing to specify `float` above. I think the result should be int, but that can be investigated later.
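A minimal sketch of the suggested alternative, assuming a recent pandas version and the public `pd.testing.assert_frame_equal` (whose `check_dtype` parameter is what `tm.assert_frame_equal` exposes in the test suite). The expected frame keeps integer values, and only the dtype check is relaxed:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, np.nan, np.nan]}, index=[0, 1, 2, 0, 1, 2])
result = df.groupby(level=0).ffill()

# Integer expectation, without dtype=float.
expected = pd.DataFrame({"a": [1, 2, 3, 4, 2, 3]}, index=[0, 1, 2, 0, 1, 2])

# check_dtype=False still compares the values but skips the
# float64 (result) vs int64 (expected) dtype check.
pd.testing.assert_frame_equal(result, expected, check_dtype=False)
```

If the result is later changed to int, this assertion keeps passing, so the dtype question can be investigated separately.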
Thanks @josemayer
For additional information, we explained how this new automated test works in the first commit's description.