ENH: Implement IntegerArray.sum #33538

dsaxton · 2020-04-14T02:46:14Z

I think this is mostly interesting because it allows normalize=True in value_counts on an IntegerArray-backed Series, which currently doesn't work:

[ins] In [1]: s = pd.Series([1, 2, 3], dtype="Int64")

[ins] In [2]: s.value_counts(normalize=True)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-2bf1a78353e5> in <module>
----> 1 s.value_counts(normalize=True)

~/pandas/pandas/core/base.py in value_counts(self, normalize, sort, ascending, bins, dropna)
   1252             normalize=normalize,
   1253             bins=bins,
-> 1254             dropna=dropna,
   1255         )
   1256         return result

~/pandas/pandas/core/algorithms.py in value_counts(values, sort, ascending, normalize, bins, dropna)
    725
    726     if normalize:
--> 727         result = result / float(counts.sum())
    728
    729     return result

AttributeError: 'IntegerArray' object has no attribute 'sum'

jbrockmendel · 2020-04-14T04:23:47Z

pandas/core/arrays/integer.py

@@ -573,6 +573,12 @@ def _reduce(self, name: str, skipna: bool = True, **kwargs):

        return result

+    def sum(self, skipna: bool = True, min_count: int = 0):


probably should make the signature match PandasArray.sum etc

dsaxton · 2020-04-14

Ok, and do we still want to keep axis (even though it wouldn't be used here)?

jbrockmendel · 2020-04-14T04:24:20Z

pandas/tests/arrays/integer/test_function.py

@@ -113,6 +113,17 @@ def test_value_counts_empty():
    tm.assert_series_equal(result, expected)


+@pytest.mark.parametrize("skipna", [True, False])
+@pytest.mark.parametrize("min_count", [0, 4])
+def test_integer_array_sum(skipna, min_count):


is e.g. Series[Int64].sum() or DataFrame[Int64].sum() fixed by this?

dsaxton · 2020-04-14

Those already work; it's only IntegerArray itself that is missing sum.
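For readers unfamiliar with the two parameters the test above parametrizes, here is a minimal pure-Python sketch of how skipna and min_count are commonly understood to interact for a masked integer sum. The function masked_sum and its use of None in place of pd.NA are hypothetical illustrations, not the code added in this PR.

```python
from typing import Optional, Sequence


def masked_sum(
    values: Sequence[int],
    mask: Sequence[bool],
    skipna: bool = True,
    min_count: int = 0,
) -> Optional[int]:
    """Sum `values`, treating positions where `mask` is True as missing.

    Hypothetical helper: returns None (standing in for pd.NA) when the
    result is undefined.
    """
    valid = [v for v, missing in zip(values, mask) if not missing]
    # With skipna=False, any missing value makes the whole result missing.
    if not skipna and len(valid) != len(values):
        return None
    # Fewer valid values than min_count also yields a missing result.
    if len(valid) < min_count:
        return None
    return sum(valid)


values = [1, 2, 3, 0]
mask = [False, False, False, True]  # the last entry is missing

print(masked_sum(values, mask))                # 6
print(masked_sum(values, mask, skipna=False))  # None
print(masked_sum(values, mask, min_count=4))   # None
```

With three valid values, min_count=0 returns the plain sum, while min_count=4 cannot be satisfied and gives a missing result, which is presumably why the test uses the pair [0, 4].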

jreback

This is not the correct way to fix this. Series/DataFrames already dispatch to EAs via _reduce, and the EAs DO implement sum/prod/min/max there. The issue is that these functions need to be exposed via that dispatch mechanism, rather than writing more code to actually do the reduction.

jreback · 2020-04-15T01:41:24Z

Furthermore, the numpy_ EA does this the opposite way: the ops themselves are defined as methods, and _reduce dispatches TO them.

So we should decide on a single way forward here. I think the way integer array does this is better, i.e. in _reduce.
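The two designs being contrasted can be sketched with toy classes. This is only an illustration under simplifying assumptions: ReduceCentric and MethodCentric are hypothetical names, missing values are modelled as None, and neither class reproduces the actual pandas implementations.

```python
class ReduceCentric:
    """Design 1: the reduction logic lives in _reduce; sum() is a thin wrapper."""

    def __init__(self, data):
        self._data = list(data)

    def _reduce(self, name: str, skipna: bool = True, **kwargs):
        valid = [x for x in self._data if x is not None]
        if not skipna and len(valid) != len(self._data):
            return None  # stands in for a missing result
        ops = {"sum": sum, "min": min, "max": max}
        if name not in ops:
            raise TypeError(f"cannot perform {name!r} on this array")
        return ops[name](valid)

    def sum(self, skipna: bool = True):
        # The public method only exposes what _reduce already implements.
        return self._reduce("sum", skipna=skipna)


class MethodCentric:
    """Design 2: each reduction is its own method; _reduce looks it up by name."""

    def __init__(self, data):
        self._data = list(data)

    def sum(self, skipna: bool = True):
        valid = [x for x in self._data if x is not None]
        if not skipna and len(valid) != len(self._data):
            return None
        return sum(valid)

    def _reduce(self, name: str, skipna: bool = True, **kwargs):
        method = getattr(self, name, None)
        if method is None:
            raise TypeError(f"cannot perform {name!r} on this array")
        return method(skipna=skipna, **kwargs)


a = ReduceCentric([1, 2, None])
b = MethodCentric([1, 2, None])
print(a.sum(), a._reduce("sum"))  # 3 3
print(b.sum(), b._reduce("sum"))  # 3 3
```

Both give the same answers from either entry point; the difference is only where the logic is written, which is the consistency question raised in this comment.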

jorisvandenbossche · 2020-04-24T11:12:41Z

As mentioned in #33351 (comment), we should probably have a general discussion about how to deal with those reduction methods in our EAs, otherwise comments like Jeff's concern above (#33538) are going to keep coming up in each PR that adds something like this.

jorisvandenbossche · 2020-04-24T11:13:11Z

pandas/core/arrays/integer.py

+        dtype=None,
+        out=None,
+        keepdims=False,
+        initial=None,


Can you remove all those unnecessary keywords and put them in **kwargs?
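One way to read this suggestion: accept the NumPy-compatibility keywords through **kwargs and reject any that are set to a non-default value. The sketch below is a toy illustration; ToyArray, _validate_numpy_kwargs, and the defaults table are hypothetical stand-ins (pandas has its own internal validation for this), and axis is included only because it came up earlier in the review.

```python
# Keywords that numpy-style callers may pass, with the only values accepted here.
_NUMPY_COMPAT_DEFAULTS = {
    "axis": None,
    "dtype": None,
    "out": None,
    "keepdims": False,
    "initial": None,
}


def _validate_numpy_kwargs(kwargs: dict) -> None:
    """Hypothetical helper: allow compat keywords only at their default values."""
    for key, value in kwargs.items():
        if key not in _NUMPY_COMPAT_DEFAULTS:
            raise TypeError(f"sum() got an unexpected keyword argument {key!r}")
        if value != _NUMPY_COMPAT_DEFAULTS[key]:
            raise ValueError(f"the {key!r} parameter is not supported")


class ToyArray:
    """Toy nullable array; None marks a missing value."""

    def __init__(self, data):
        self._data = list(data)

    def sum(self, skipna: bool = True, min_count: int = 0, **kwargs):
        _validate_numpy_kwargs(kwargs)
        valid = [x for x in self._data if x is not None]
        if not skipna and len(valid) != len(self._data):
            return None
        if len(valid) < min_count:
            return None
        return sum(valid)


arr = ToyArray([1, 2, None])
print(arr.sum())                      # 3
print(arr.sum(dtype=None, out=None))  # 3
```

This keeps the visible signature down to the parameters that actually change the result, while calls that pass the extra keywords at their defaults still succeed.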

jreback

lgtm. @jorisvandenbossche agreed that we need to agree on the issues you mentioned, but happy with this for now.

jorisvandenbossche · 2020-04-25T08:05:26Z

thanks @dsaxton !

jorisvandenbossche · 2020-04-25T08:06:16Z

@dsaxton would you like to open a general issue about this? (#33538 (comment))

ENH: Implement IntegerArray.sum

f8d8d25

jbrockmendel reviewed Apr 14, 2020

dsaxton added 3 commits April 14, 2020 15:54

Update signature and tests

85d66ab

Merge remote-tracking branch 'upstream/master' into nullable-int-sum

a44ae16

PR num

aa7ee1a

jreback requested changes Apr 15, 2020

jreback added the ExtensionArray and Numeric Operations labels Apr 15, 2020

dsaxton mentioned this pull request Apr 23, 2020

BUG: value_counts not working correctly on ExtensionArrays #33674

Merged

simonjayhawkins reviewed Apr 23, 2020

dsaxton added 2 commits April 23, 2020 14:44

Release note

5fdb586

Merge remote-tracking branch 'upstream/master' into nullable-int-sum

1d8eb4d

jorisvandenbossche reviewed Apr 24, 2020

dsaxton added 2 commits April 24, 2020 15:43

numpy compat

32a6b4e

Merge remote-tracking branch 'upstream/master' into nullable-int-sum

9ee8ab2

jreback added this to the 1.1 milestone Apr 24, 2020

jreback approved these changes Apr 24, 2020

jorisvandenbossche approved these changes Apr 25, 2020

jorisvandenbossche merged commit 77a0f19 into pandas-dev:master Apr 25, 2020

simonjayhawkins mentioned this pull request Apr 25, 2020

value_counts not working correctly on (some?) ExtensionArrays #33172

Closed

dsaxton deleted the nullable-int-sum branch April 25, 2020 13:38

dsaxton mentioned this pull request Apr 25, 2020

API: Dispatch mechanism for EA reductions #33790

Closed

rhshadrach pushed a commit to rhshadrach/pandas that referenced this pull request May 10, 2020

ENH: Implement IntegerArray.sum (pandas-dev#33538)

2860955
