-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Support mask in groupby cumprod #48138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 6 commits
5452698
af0c539
cd4396d
3bfe8c3
e931b93
c82ed6b
781c678
2476651
36a2edc
39d5858
fdfbf22
c6fc53c
459d225
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -641,10 +641,10 @@ def test_groupby_cumprod(): | |
tm.assert_series_equal(actual, expected) | ||
|
||
df = DataFrame({"key": ["b"] * 100, "value": 2}) | ||
df["value"] = df["value"].astype(float) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We can maybe keep this with as int (or test both in addition), so we have a test for the silent overflow behaviour? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Added a new test explicitly testing that overflow is consistent with numpy |
||
actual = df.groupby("key")["value"].cumprod() | ||
# if overflows, groupby product casts to float | ||
# while numpy passes back invalid values | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This comment can probably be updated There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done |
||
df["value"] = df["value"].astype(float) | ||
expected = df.groupby("key", group_keys=False)["value"].apply(lambda x: x.cumprod()) | ||
expected.name = "value" | ||
tm.assert_series_equal(actual, expected) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also now uses int64 instead of float64 for the numpy dtypes? So that also changes behaviour in those cases regarding overflow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, should we mention this in the whatsnew?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, yes. Maybe as notable bug fix, as it has some behaviour change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.