ENH: Support mask in groupby cumprod #48138
@@ -643,13 +643,28 @@ def test_groupby_cumprod():
    df = DataFrame({"key": ["b"] * 100, "value": 2})
    df["value"] = df["value"].astype(float)
Review comment: We could maybe keep this as int (or test both), so that we have a test for the silent overflow behaviour?
Reply: Added a new test that explicitly checks that overflow is consistent with numpy.

    actual = df.groupby("key")["value"].cumprod()
    # if overflows, groupby product casts to float
    # while numpy passes back invalid values
    expected = df.groupby("key", group_keys=False)["value"].apply(lambda x: x.cumprod())
    expected.name = "value"
    tm.assert_series_equal(actual, expected)
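The overflow comment above is why this test casts `value` to float before comparing. As a minimal sketch of the underlying arithmetic, using plain NumPy rather than pandas (the array names are illustrative and not part of the PR): a running product of one hundred 2s wraps around in `int64`, while in `float64` it stays exact because every intermediate result is a power of two.

```python
import numpy as np

# 100 copies of 2, mirroring the "value" column in the test above.
int_values = np.full(100, 2, dtype=np.int64)
float_values = int_values.astype(np.float64)

int_prod = np.cumprod(int_values)      # wraps silently once 2**63 is reached
float_prod = np.cumprod(float_values)  # powers of two are exact in float64

# Position i holds 2**(i + 1), so index 62 is the first one outside int64.
first_wrapped = int(int_prod[62])
after_wrap = int(int_prod[63])
last_float = float(float_prod[-1])
```

With the usual two's-complement wraparound, the integer result turns negative at index 62 and is zero from index 63 onwards, whereas the float result ends at exactly `2.0 ** 100`.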


def test_groupby_cumprod_overflow():
    # GH#37493 if we overflow we return garbage consistent with numpy
    df = DataFrame({"key": ["b"] * 4, "value": 100_000})
    actual = df.groupby("key")["value"].cumprod()
    expected = Series(
        [100_000, 10_000_000_000, 1_000_000_000_000_000, 7766279631452241920],
        name="value",
    )
    tm.assert_series_equal(actual, expected)

    numpy_result = df.groupby("key", group_keys=False)["value"].apply(
        lambda x: x.cumprod()
    )
    numpy_result.name = "value"
    tm.assert_series_equal(actual, numpy_result)
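The last expected value in this test is what remains of `100_000 ** 4 = 10 ** 20` after it wraps around in 64-bit integer arithmetic. A small sketch of that arithmetic, using plain NumPy and illustrative names (the helper `to_int64` is hypothetical, not a pandas or NumPy function), assuming the usual two's-complement wraparound of `int64`:

```python
import numpy as np

values = np.full(4, 100_000, dtype=np.int64)
wrapped = np.cumprod(values)  # int64 running product, overflows on the last step

# Exact running products with Python's arbitrary-precision integers.
exact = [100_000 ** n for n in range(1, 5)]


def to_int64(n: int) -> int:
    """Reduce n modulo 2**64 and reinterpret it as a signed 64-bit value."""
    n %= 2**64
    return n - 2**64 if n >= 2**63 else n


reduced = [to_int64(n) for n in exact]
```

Only the fourth product exceeds the `int64` range, so the first three entries are exact and the last one is the wrapped remainder.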


def test_groupby_cumprod_nan_influences_other_columns():
    # GH#48064
    df = DataFrame(
Review comment: I would maybe mention that this makes it consistent with the DataFrame method as well (without groupby)?
Reply: Added a reference to the methods.
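To illustrate the kind of consistency the reviewer refers to, here is a hedged sketch that compares the groupby cumulative product with the plain `Series.cumprod` applied to each group separately. The frame, keys, and values are made up for illustration, and float data is used so that the result does not depend on integer overflow handling.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b", "b"], "value": [1.5, 2.0, 3.0, 4.0]})

# Cumulative product computed per group by groupby.
grouped = df.groupby("key")["value"].cumprod()

# Same computation without groupby: slice each group and call Series.cumprod.
manual = pd.concat(
    [df.loc[df["key"] == k, "value"].cumprod() for k in ["a", "b"]]
).sort_index()

pd.testing.assert_series_equal(grouped, manual)
```

Both paths keep the original row index and the `value` name, so the two series can be compared directly.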