-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Implement IntegerArray.sum #33538
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Implement IntegerArray.sum #33538
Conversation
pandas/core/arrays/integer.py
Outdated
@@ -573,6 +573,12 @@ def _reduce(self, name: str, skipna: bool = True, **kwargs): | |||
|
|||
return result | |||
|
|||
def sum(self, skipna: bool = True, min_count: int = 0): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably should make the signature match PandasArray.sum etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, and we still want to keep axis (even though it wouldn't be used here)?
@@ -113,6 +113,17 @@ def test_value_counts_empty(): | |||
tm.assert_series_equal(result, expected) | |||
|
|||
|
|||
@pytest.mark.parametrize("skipna", [True, False]) | |||
@pytest.mark.parametrize("min_count", [0, 4]) | |||
def test_integer_array_sum(skipna, min_count): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is e.g. Series[Int64].sum() or DataFrame[Int64].sum() fixed by this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those are actually already working, just not for IntegerArray specifically
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not the correct way to fix this. series/frames already dispatch to EAs via _reduce, which DO implement sum/prod/min/max. The issue is these functions need to be exposed via that dispatch mechanism, rather than writing more code to actually do the reduction.
furthermore the numpy_ EA do this in an opposite way, meaning that the ops themselves are defined, and _reduce dispatches TO them. So we should decide on a single way forward here. I think the way integer array does this is better, e.g. in _reduce. |
As mentioned in #33351 (comment), we should probably have a general discussion about how to deal with those reduction methods in our EAs, otherwise those comments like Jeff's concern above (#33538) are going to keep coming up in each PR that adds something like this. |
pandas/core/arrays/integer.py
Outdated
dtype=None, | ||
out=None, | ||
keepdims=False, | ||
initial=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove all those unnecessary keywords and put them in a **kwargs ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. @jorisvandenbossche agree we agree on the issues you mentioned, but happy with this for now.
thanks @dsaxton ! |
@dsaxton would you like to open a general issue about this? (#33538 (comment)) |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
I think this is mostly interesting in that it allows normalize=True for value_counts on an IntegerArray backed Series, which currently doesn't work: