
TST: GroupBy.ffill on a multi-index dataframe #55408

Merged

josemayer
Contributor

@josemayer commented Oct 4, 2023

For additional information, we have explained how this new automated test works in the first commit's description.

This test checks whether the ffill method works correctly on a
dataframe with a duplicated index. In it, we define a dataframe with
index [0, 1, 2, 0, 1, 2] and column "a" with values [1, 2, NaN, 3, 4, 5].
We group this dataframe by its index (level=0) and shift each group
by one, so we have:

0: [1, 3] -> [NaN, 1]
1: [2, 4] -> [NaN, 2]
2: [NaN, 5] -> [NaN, NaN]

Since the row order is preserved, the shifted column "a" is
[NaN, NaN, NaN, 1, 2, NaN]. Applying ffill to it should then give a
dataframe with column "a" equal to [NaN, NaN, NaN, 1, 2, 2].

Co-authored-by: José Lucas Silva Mayer <[email protected]>
Co-authored-by: Willian Wang <[email protected]>
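The two steps described in the commit message can be reproduced directly. This is a minimal sketch using only public pandas calls; note that the index here is a flat index with repeated labels rather than a `MultiIndex`, so `level=0` simply groups rows by their index label. The variable names `shifted` and `result` are illustrative.

```python
import numpy as np
import pandas as pd

# Same frame as in the test: each index label appears twice, forming groups of two rows.
df = pd.DataFrame({"a": [1, 2, np.nan, 3, 4, 5]}, index=[0, 1, 2, 0, 1, 2])

# Step 1: shift by one within each index-label group; rows keep their original order.
shifted = df.groupby(level=0).shift()
print(shifted["a"].tolist())  # [nan, nan, nan, 1.0, 2.0, nan]

# Step 2: DataFrame.ffill fills forward over the whole frame, ignoring the groups.
result = shifted.ffill()
print(result["a"].tolist())  # [nan, nan, nan, 1.0, 2.0, 2.0]
```

The last value becomes 2 only because the frame-level `ffill` copies it from the row above, which belongs to a different group.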
# GH#43412
df = DataFrame({"a": [1, 2, np.nan, 3, 4, 5]}, index=[0, 1, 2, 0, 1, 2])

result = df.groupby(level=0).shift().ffill()
Member

I think the point of the original issue was to test groupby.ffill
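The distinction matters because `.shift().ffill()` ends with a frame-level `DataFrame.ffill()`, which fills across group boundaries, whereas `GroupBy.ffill()` only propagates values within each group. A small sketch on the commit's frame, with illustrative variable names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, np.nan, 3, 4, 5]}, index=[0, 1, 2, 0, 1, 2])

# GroupBy.ffill fills only within each index-label group. The NaN in row 2 is the
# first row of group 2, so there is no earlier value in that group to fill it from.
per_group = df.groupby(level=0).ffill()

# DataFrame.ffill ignores groups and copies the previous row's value (row 1 -> row 2).
whole_frame = df.ffill()

print(per_group["a"].tolist())    # [1.0, 2.0, nan, 3.0, 4.0, 5.0]
print(whole_frame["a"].tolist())  # [1.0, 2.0, 2.0, 3.0, 4.0, 5.0]
```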

Contributor Author

@josemayer Oct 6, 2023

Sure!

I'm testing this, and when I use groupby(level=0).ffill() on the dataframe, the assertion fails because the dtypes differ:

def test_groupby_ffill_with_duplicated_index():
  # GH#43412
  df = DataFrame({"a": [1, 2, 3, 4, np.nan, np.nan]}, index=[0, 1, 2, 0, 1, 2])
  
  result = df.groupby(level=0).ffill()
  expected = DataFrame(
    {"a": [1, 2, 3, 4, 2, 3]}, index=[0, 1, 2, 0, 1, 2]
  )
>       tm.assert_frame_equal(result, expected)
E       AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="a") are different
E       
E       Attribute "dtype" are different
E       [left]:  float64
E       [right]: int64

Is this behaviour expected?

Anyway, I will write the test with a float64 expected dataframe.
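The mismatch comes from the input rather than from `ffill` itself: the `NaN` values make column "a" float64 when the frame is built, and filling them does not convert the column back to an integer dtype. A sketch reproducing this with `pandas.testing.assert_frame_equal`, the public counterpart of the suite's `tm.assert_frame_equal`:

```python
import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal

df = pd.DataFrame({"a": [1, 2, 3, 4, np.nan, np.nan]}, index=[0, 1, 2, 0, 1, 2])

# The NaNs force column "a" to float64 at construction time.
print(df["a"].dtype)  # float64

result = df.groupby(level=0).ffill()

# Filling the NaNs within each group leaves the column as float64.
print(result["a"].dtype)  # float64

# Building the expected frame as float64 makes the comparison pass.
expected = pd.DataFrame(
    {"a": [1, 2, 3, 4, 2, 3]}, index=[0, 1, 2, 0, 1, 2], dtype=float
)
assert_frame_equal(result, expected)
```

Comparing against an int64 `expected` instead raises the same "dtype are different" error shown in the log.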

@mroeschke added the Testing label Oct 4, 2023
expected = DataFrame(
    {"a": [1, 2, 3, 4, 2, 3]}, index=[0, 1, 2, 0, 1, 2], dtype=float
)
tm.assert_frame_equal(result, expected)
Member

You can specify check_dtype=False here instead of needing to specify float above. I think the result should be int, but that can be investigated later.
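The reviewer's alternative keeps `expected` as int64 and relaxes only the dtype comparison. A sketch of that variant, again using the public `pandas.testing.assert_frame_equal`:

```python
import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal

df = pd.DataFrame({"a": [1, 2, 3, 4, np.nan, np.nan]}, index=[0, 1, 2, 0, 1, 2])
result = df.groupby(level=0).ffill()  # column "a" is float64

# Expected frame left as int64, as in the first version of the test.
expected = pd.DataFrame({"a": [1, 2, 3, 4, 2, 3]}, index=[0, 1, 2, 0, 1, 2])

# check_dtype=False skips the dtype attribute check; the values are still compared,
# and 4.0 compares equal to 4.
assert_frame_equal(result, expected, check_dtype=False)
```

This keeps the test focused on the fill values and leaves the question of whether the result should be int64 open, as the comment suggests.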

@mroeschke added this to the 2.2 milestone Oct 9, 2023
@mroeschke merged commit 361b62e into pandas-dev:main Oct 9, 2023
@mroeschke
Member

Thanks @josemayer

Labels
Testing pandas testing functions or related to the test suite
Development

Successfully merging this pull request may close these issues.

BUG: a duplicated index would cause groupby.fillna(method='ffill') a wrong result