TST: GroupBy.ffill on a multi-index dataframe #55408

josemayer · 2023-10-04T20:14:53Z

For additional information, see the description of the first commit, where we explain how this new automated test works.

This test checks whether the ffill method works correctly on a multi-index dataframe. In it, we define a dataframe with index [0, 1, 2, 0, 1, 2] and a column "a" with values [1, 2, NaN, 3, 4, 5]. We group this dataframe by index level 0 (level=0) and shift each group by one, so we have:

0: [1, 3] -> [NaN, 1]
1: [2, 4] -> [NaN, 2]
2: [NaN, 5] -> [NaN, NaN]

Then, since the index order remains the same, applying the ffill method should give us a dataframe with column "a" equal to [NaN, NaN, NaN, 1, 2, 2].

Co-authored-by: José Lucas Silva Mayer <[email protected]>
Co-authored-by: Willian Wang <[email protected]>
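The steps in the commit description can be reproduced with a minimal sketch like the one below. It assumes a recent pandas 2.x release; the variable names `shifted` and `result` are illustrative and not taken from the PR.

```python
import numpy as np
import pandas as pd

# Same data as in the commit description: a duplicated index and one NaN
df = pd.DataFrame({"a": [1, 2, np.nan, 3, 4, 5]}, index=[0, 1, 2, 0, 1, 2])

# Shift each index-label group by one row; the original row order is kept
shifted = df.groupby(level=0).shift()

# Forward-fill the shifted frame along its row order
result = shifted.ffill()

print(shifted["a"].tolist())
print(result["a"].tolist())
```

The first three rows stay NaN after the fill because nothing precedes them to fill from.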

mroeschke · 2023-10-04T22:43:21Z

pandas/tests/groupby/test_groupby.py

+    # GH#43412
+    df = DataFrame({"a": [1, 2, np.nan, 3, 4, 5]}, index=[0, 1, 2, 0, 1, 2])
+
+    result = df.groupby(level=0).shift().ffill()


I think the point of the original issue was to test groupby.ffill
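To illustrate the distinction, here is a small sketch (assuming a recent pandas 2.x; the variable names are illustrative). In the chain from the diff, `groupby(...).shift()` returns a plain DataFrame, so the trailing `.ffill()` is `DataFrame.ffill`, not `GroupBy.ffill`. On the same input the two methods behave differently:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, np.nan, 3, 4, 5]}, index=[0, 1, 2, 0, 1, 2])

# DataFrame.ffill fills along row order and ignores the index labels,
# so the NaN in the third row is filled from the second row.
frame_filled = df.ffill()

# GroupBy.ffill fills only within rows that share an index label,
# so the NaN stays: it is the first row of the group labelled 2.
group_filled = df.groupby(level=0).ffill()

print(frame_filled["a"].tolist())
print(group_filled["a"].tolist())
```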

Sure!

I tested this, and when I use the groupby(level=0) and ffill() methods on the dataframe, the assertion fails because the dtypes differ:

def test_groupby_ffill_with_duplicated_index():
    # GH#43412
    df = DataFrame({"a": [1, 2, 3, 4, np.nan, np.nan]}, index=[0, 1, 2, 0, 1, 2])
    result = df.groupby(level=0).ffill()
    expected = DataFrame(
        {"a": [1, 2, 3, 4, 2, 3]}, index=[0, 1, 2, 0, 1, 2]
    )

>       tm.assert_frame_equal(result, expected)
E       AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="a") are different
E
E       Attribute "dtype" are different
E       [left]:  float64
E       [right]: int64

Is this behaviour expected?

Anyway, I will write the test with a float64 dataframe.
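For context on where the float64 comes from, the sketch below (assuming a recent pandas 2.x with default NumPy-backed dtypes) inspects the dtypes involved. The input column already holds NaN, so it is stored as float64 before any grouping happens, and the filled result keeps that dtype even though no NaN remains. Whether the result ought to be cast back to int is a separate question that this sketch does not answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, np.nan, np.nan]}, index=[0, 1, 2, 0, 1, 2])
result = df.groupby(level=0).ffill()

# A frame built only from Python ints is inferred as an integer dtype
expected = pd.DataFrame({"a": [1, 2, 3, 4, 2, 3]}, index=[0, 1, 2, 0, 1, 2])

print(df["a"].dtype)        # input dtype, forced to float by the NaN values
print(result["a"].dtype)    # dtype after the group-wise forward fill
print(expected["a"].dtype)  # dtype of the all-integer expected frame
```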

mroeschke · 2023-10-06T16:00:05Z

pandas/tests/groupby/test_groupby.py

+    expected = DataFrame(
+        {"a": [1, 2, 3, 4, 2, 3]}, index=[0, 1, 2, 0, 1, 2], dtype=float
+    )
+    tm.assert_frame_equal(result, expected)


You can specify check_dtype=False here instead of needing to specify float above. I think the result should be int, but that can be investigated later.
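A minimal sketch of this suggestion follows. It uses the public `pandas.testing.assert_frame_equal` rather than the internal `tm` alias from the pandas test suite, and the `strict_passed` flag is a hypothetical name added only to show the contrast with the default check.

```python
import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal

df = pd.DataFrame({"a": [1, 2, 3, 4, np.nan, np.nan]}, index=[0, 1, 2, 0, 1, 2])
result = df.groupby(level=0).ffill()
expected = pd.DataFrame({"a": [1, 2, 3, 4, 2, 3]}, index=[0, 1, 2, 0, 1, 2])

# Values and index are compared, the float-vs-int dtype difference is ignored
assert_frame_equal(result, expected, check_dtype=False)

# With the default check_dtype=True the same comparison raises
try:
    assert_frame_equal(result, expected)
    strict_passed = True
except AssertionError:
    strict_passed = False

print(strict_passed)
```

Relaxing the dtype check keeps the expected frame free of an explicit `dtype=float`, at the cost of no longer pinning the result dtype in the test.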

mroeschke · 2023-10-09T18:30:42Z

Thanks @josemayer

mroeschke reviewed Oct 4, 2023

mroeschke added the Testing (pandas testing functions or related to the test suite) label Oct 4, 2023

josemayer added 2 commits October 5, 2023 22:50

remove shift and add cast dataframe dtype to float

5bf7173

Signed-off-by: José Lucas Silva Mayer <[email protected]>

Merge remote-tracking branch 'upstream/main' into new-test-fillna-wit…

eed85b4

…h-duplicated-index

mroeschke reviewed Oct 6, 2023

josemayer added 2 commits October 8, 2023 01:00

remove check of dataframe dtype on assert

28aa1e5

Signed-off-by: José Lucas Silva Mayer <[email protected]>

Merge remote-tracking branch 'upstream/main' into new-test-fillna-wit…

1ad72d0

…h-duplicated-index

mroeschke added this to the 2.2 milestone Oct 9, 2023

mroeschke approved these changes Oct 9, 2023

mroeschke merged commit 361b62e into pandas-dev:main Oct 9, 2023
