Inconsistent result with cumsum columns #32462

Closed
MathieuDutSik opened this issue Mar 5, 2020 · 2 comments
Labels
Bug Groupby Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate

Comments

MathieuDutSik commented Mar 5, 2020

I have a problem with cumsum on multiple columns:

>>> import numpy as np
>>> import pandas as pd
>>> df1 = pd.DataFrame({"A": [2, 1, np.nan, 1, 2, 2, 1], "B": [-8, 2, 3, 1, 5, 6, 7], "C": [3, 5, 6, 5, 4, 4, 3]})
>>> df1.groupby("A").cumsum()
    B   C
0  -8   3
1   2   5
2  -1   6
3   3  10
4  -3   7
5   3  11
6  10  13
>>> df1.groupby("A").cumsum()
    B   C
0  -8   3
1   2   5
2  -1  -1
3   3  10
4  -3   7
5   3  11
6  10  13

The cumsum is computed only on the first column B, and column C is left unchanged. Worse, when I recompute, I get the result I would expect. I could accept either behavior, but getting inconsistent results from repeated calls seems wrong to me.
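For anyone trying to narrow this down: in the two outputs quoted above, the only cell that differs between the calls is row 2, column C, and row 2 is the row whose key in A is NaN. The sketch below is one way to check that split. The names `first`, `second`, and `valid` are mine, not part of the report, and what gets printed for the NaN-keyed row will depend on the pandas version installed.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "A": [2, 1, np.nan, 1, 2, 2, 1],
    "B": [-8, 2, 3, 1, 5, 6, 7],
    "C": [3, 5, 6, 5, 4, 4, 3],
})

# Run the same grouped cumulative sum twice, as in the report.
first = df1.groupby("A").cumsum()
second = df1.groupby("A").cumsum()

# Boolean mask of rows whose grouping key is not NaN.
valid = df1["A"].notna()

# Rows with a real key should agree between the two calls.
print(first[valid].equals(second[valid]))

# The row with a NaN key is the one to inspect for differences.
print(first[~valid])
print(second[~valid])
```

If the rows with a valid key always agree and only the NaN-keyed row varies, the inconsistency comes from how rows outside any group are filled rather than from the cumulative sum itself.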

Contributor

jreback commented Mar 5, 2020

You are grouping on a NaN group, which is ignored by default.
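A minimal sketch of what "ignored by default" means here, assuming a pandas version that supports the `dropna` argument of `groupby` (added in pandas 1.1) and in which it behaves as documented. The variable names `default_sizes`, `kept_sizes`, and `kept` are illustrative only.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "A": [2, 1, np.nan, 1, 2, 2, 1],
    "B": [-8, 2, 3, 1, 5, 6, 7],
    "C": [3, 5, 6, 5, 4, 4, 3],
})

# Default (dropna=True): the row with a NaN key is not assigned to any group.
default_sizes = df1.groupby("A").size()
print(default_sizes.sum())  # number of rows that belong to some group

# dropna=False: NaN becomes a group key of its own.
kept_sizes = df1.groupby("A", dropna=False).size()
print(kept_sizes.sum())

# With NaN kept as a group, row 2 gets a cumulative sum over its own group.
kept = df1.groupby("A", dropna=False).cumsum()
print(kept.loc[2])
```

Passing `dropna=False` is a workaround when the NaN-keyed rows should be treated as a group. It does not explain why the default call returned different values on repeated runs.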

@simonjayhawkins simonjayhawkins added Groupby Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels Apr 25, 2020
@mroeschke mroeschke added the Bug label Jul 29, 2021
@rhshadrach
Member

Fixed by #46367 and tests added there.
