[WIP, ENH] Adds cumulative methods to ea #28509

Closed
Changes from 4 commits
89 commits
2897723
Merge pull request #1 from pandas-dev/master
datajanko Sep 13, 2019
a54e1b4
Merge branch 'master' of https://github.com/pandas-dev/pandas
datajanko Sep 14, 2019
10abc0f
Merge branch 'master' of https://github.com/pandas-dev/pandas
datajanko Sep 15, 2019
c2d7592
define accumulation interface for ExtensionArrays
datajanko Sep 18, 2019
2c149c0
reformulate doc string
datajanko Sep 19, 2019
79cea11
creates baseExtension tests for accumulate
datajanko Sep 19, 2019
12a5ca3
adds fixtures for numeric_accumulations
datajanko Oct 4, 2019
dc959f4
fixes typos
datajanko Nov 13, 2019
bcfb8a8
adds accumulate tests for integer arrays
datajanko Dec 10, 2019
9a8f4ec
fixes typo
datajanko Dec 12, 2019
5d837d9
first implementation of cumsum
datajanko Jan 9, 2020
9e9f0c3
Merge pull request #2 from pandas-dev/master
datajanko Feb 12, 2020
a1a1cb2
Merge pull request #3 from pandas-dev/master
datajanko Mar 15, 2020
6d967ad
merges master
datajanko Mar 15, 2020
73363bf
stashed merge conflict
datajanko Mar 15, 2020
0d9a3d5
fixes formatting
datajanko Mar 15, 2020
84a7d81
first green test for integer extension arrays and cumsum
datajanko Mar 23, 2020
ce6869d
first passing tests for cummin and cummax
datajanko Apr 2, 2020
3b5d1d8
utilizes na_accum_func
datajanko Apr 5, 2020
0337cb0
removes delegation leftover
datajanko Apr 5, 2020
f0722f5
creates running tests
datajanko Apr 9, 2020
99baa1b
Merge branch 'master' into 28385-add-cumulative-methods-to-EA
datajanko Apr 9, 2020
fa35b14
removes ABCExtensionArray Type hint
datajanko Apr 9, 2020
43fca7c
Merge pull request #4 from pandas-dev/master
datajanko Apr 10, 2020
7bd6378
Merge branch 'master' into 28385-add-cumulative-methods-to-EA
datajanko Apr 10, 2020
185510b
removes clutter from generic.py
datajanko Apr 10, 2020
2ef9ebb
removes clutter in _accumulate
datajanko Apr 10, 2020
7d898bd
adds typehints for ExtensionArray and IntegerArray
datajanko Apr 10, 2020
09b42be
delegates the accumulate calls to extension arrays
datajanko Apr 10, 2020
af0dd24
removes diff in nanops
datajanko Apr 10, 2020
bc9a36a
removes unwanted pattern
datajanko Apr 10, 2020
38454a3
makes output types for sum and prod explicit
datajanko Apr 12, 2020
5ecfa51
makes the base accumulate test more general by not comparing types
datajanko Apr 13, 2020
8d62594
implements accumulation for boolean arrays
datajanko Apr 13, 2020
5f3b624
uses f-string in base.py
datajanko Apr 26, 2020
06d1286
uses blockmanager also for extension arrays
datajanko May 2, 2020
7efcb5f
Merge branch 'master' of https://github.com/pandas-dev/pandas
datajanko May 3, 2020
ae5f969
merges master
datajanko May 3, 2020
f7e3f4f
fixes flake8 issues
datajanko May 3, 2020
aa99927
Merge branch 'master' of https://github.com/pandas-dev/pandas
datajanko Jun 15, 2020
9cab6d9
merges master
datajanko Jun 15, 2020
b3ae864
removes uncommented code
datajanko Jun 17, 2020
52e6486
adds todo for runtime warning
datajanko Jun 17, 2020
99fb664
reuses integer array to accumulate for booleans
datajanko Jun 22, 2020
d339250
removes runtimewarning catching
datajanko Jun 22, 2020
be6f974
removes TODOs
datajanko Jun 23, 2020
a902f4e
adds accumulate to autosummary
datajanko Jun 23, 2020
64afb5b
excludes datetime from propagating to _accumulate
datajanko Jun 24, 2020
1e5d77b
uses pandas.testing instead of pandas.util.testing in accumulate
datajanko Jun 29, 2020
c95b490
replaces assert_almost_equal with assert_series_equal
datajanko Jun 30, 2020
dc669de
dtypes to lowercase
datajanko Jun 30, 2020
08475a4
lowercase of uint and int64 dtype in _accumulate
datajanko Jun 30, 2020
67fa99a
uses hint of @simonjayhawkins concerning assert series equals
datajanko Jul 21, 2020
a36632b
Merge branch 'master' into 28385-add-cumulative-methods-to-EA
datajanko Jul 21, 2020
b3d3c81
adds whatsnew entry
datajanko Jul 25, 2020
f8f6367
Merge branch 'master' of https://github.com/pandas-dev/pandas
datajanko Jul 25, 2020
ad6773d
Merge branch 'master' into 28385-add-cumulative-methods-to-EA
datajanko Jul 25, 2020
663c301
Merge branch 'master' of https://github.com/pandas-dev/pandas into ma…
datajanko Aug 10, 2020
e17f3a0
Merge branch 'master' into 28385-add-cumulative-methods-to-EA
datajanko Aug 10, 2020
8cb66f9
moves changes to 1.2.0
datajanko Aug 10, 2020
4f953cf
Merge branch 'master' of https://github.com/pandas-dev/pandas into ma…
datajanko Aug 11, 2020
b33a5df
Merge branch 'master' into 28385-add-cumulative-methods-to-EA
datajanko Aug 11, 2020
18ec178
Merge branch 'master' of https://github.com/pandas-dev/pandas into ma…
datajanko Sep 18, 2020
305bdc7
Merge branch 'master' of https://github.com/pandas-dev/pandas into ma…
datajanko Sep 22, 2020
56bfb23
merges master
datajanko Sep 22, 2020
63db854
merges master
datajanko Oct 31, 2020
9c91c55
Merge branch 'master' of https://github.com/pandas-dev/pandas into ma…
datajanko Oct 31, 2020
6ba3ca9
uses na_accum_func
datajanko Nov 5, 2020
f2a49b3
Merge branch 'master' of https://github.com/pandas-dev/pandas
datajanko Jan 11, 2021
386fa39
merges master
datajanko Jan 12, 2021
55de384
delegate to EAs _accumulate function in block mgr
datajanko Jan 16, 2021
6a5b7f8
moves implementation from nanops to masked_accumulations
datajanko Jan 19, 2021
9c63c64
fixes typing annotations in base and masked
datajanko Jan 21, 2021
84dd141
Merge branch 'master' into 28385-add-cumulative-methods-to-EA
datajanko Jan 22, 2021
2f23499
fixes merge error
datajanko Jan 22, 2021
a5b30e6
fills na values without nanops
datajanko Jan 22, 2021
d22c8a0
fixes incorrect call to cumsum and changes to cumprod
datajanko Jan 25, 2021
a5866c7
add _accumulate to boolean
datajanko Jan 25, 2021
8255457
makes tests a lot easier - cumprod tests still fail
datajanko Jan 25, 2021
483b608
adds BaseNumericAccumulation for floating masked array
datajanko Jan 26, 2021
150fd3b
tests no numeric accumulations according to _accumulate interface
datajanko Jan 26, 2021
80e2dc6
uses NotImplementedError in base accumulate function
datajanko Jan 28, 2021
dceab99
ensures the fill values are data independent
datajanko Feb 16, 2021
1c14f18
adds accumulation for datetimelikes
datajanko Feb 16, 2021
e20501a
Merge branch 'master' of https://github.com/pandas-dev/pandas
datajanko Feb 16, 2021
53147c4
fixes merge conflicts
datajanko Feb 16, 2021
597e978
actually ads datetimelike accumulation algos
datajanko Feb 16, 2021
5ebe8ea
fixes absolute imports
datajanko Feb 16, 2021
32367c0
changes error to catch to adhere to changed implementation
datajanko Feb 20, 2021
34 changes: 33 additions & 1 deletion pandas/core/arrays/base.py
@@ -65,6 +65,7 @@ class ExtensionArray:
take
unique
view
_accumulate
_concat_same_type
_formatter
_from_factorized
@@ -114,8 +115,9 @@ class ExtensionArray:
as they only compose abstract methods. Still, a more efficient
implementation may be available, and these methods can be overridden.

One can implement methods to handle array reductions.
One can implement methods to handle array accumulations or reductions.

* _accumulate
* _reduce

One can implement methods to handle parsing from strings that will be used
@@ -407,6 +409,7 @@ def isna(self) -> ArrayLike:

* ``na_values._is_boolean`` should be True
* `na_values` should implement :func:`ExtensionArray._reduce`
* `na_values` should implement :func:`ExtensionArray._accumulate`
* ``na_values.any`` and ``na_values.all`` should be implemented
"""
raise AbstractMethodError(self)
@@ -992,6 +995,35 @@ def _ndarray_values(self) -> np.ndarray:
"""
return np.array(self)

    def _accumulate(self, name, skipna=True, **kwargs):
        """
        Return an array result of performing the accumulation operation.

        Parameters
        ----------
        name : str
            Name of the function, supported values are:
            { cummin, cummax, cumsum, cumprod }.
        skipna : bool, default True
            If True, skip NaN values.
        **kwargs
Contributor

Maybe for NumPy compatibility (axis, dtype, out).

Contributor Author

What do you expect from adding this? Do you purely want to accept these arguments (for this ticket) and then ignore them? If not, I'd guess axis would have no impact, but dtype might be interesting. For example, for cumsum and integer dtypes, we could let the caller provide the target output type instead of defaulting to (U)Int64. But I don't know whether this should be part of this issue.

Contributor

The idea would be for something like np.cumsum(pd.array([1, 2])) to return an IntegerArray. I'm not sure what all is required for that to work.
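
To make the keyword question in this thread concrete, here is a minimal sketch. It assumes that NumPy functions such as np.cumsum forward to a same-named method on non-ndarray inputs and pass axis, dtype and out along; that assumption is not confirmed anywhere in this PR. `ToyArray` is a hypothetical pure-Python container, not a pandas class. Its method accepts the NumPy-style keywords and rejects anything but their defaults.

```python
class ToyArray:
    """Hypothetical 1-D container, used only to illustrate keyword handling."""

    def __init__(self, values):
        self.values = list(values)

    def cumsum(self, axis=None, dtype=None, out=None):
        # Accept the NumPy-style keywords so a NumPy-like caller can pass them,
        # but only allow values that are meaningful for a 1-D container.
        if axis not in (None, 0):
            raise ValueError("axis must be None or 0 for a 1-D array")
        if dtype is not None or out is not None:
            raise ValueError("dtype and out are not supported in this sketch")

        total = 0
        result = []
        for value in self.values:
            total += value
            result.append(total)
        # Return the same container type rather than a plain list.
        return ToyArray(result)
```

Returning the container's own type is the point of the suggestion: the caller gets back the same kind of array it passed in, regardless of which entry point was used.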

            Additional keyword arguments passed to the accumulation function.
            Currently, there is no supported kwarg.

        Returns
        -------
        array

        Raises
        ------
        TypeError : subclass does not define accumulations
        """
        raise TypeError(
            "cannot perform {name} with type {dtype}".format(
                name=name, dtype=self.dtype
            )
        )

    def _reduce(self, name, skipna=True, **kwargs):
        """
        Return a scalar result of performing the reduction operation.
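
The `_accumulate(name, skipna=True, **kwargs)` interface in the diff above can be illustrated with a small pure-Python sketch. `ToyMaskedArray` is a hypothetical stand-in that uses `None` for missing values; it is not the pandas implementation, and the missing-value semantics shown here are an assumption made for illustration: with `skipna=True` missing positions stay missing while accumulation continues, and with `skipna=False` everything from the first missing value onward is missing.

```python
import operator


class ToyMaskedArray:
    """Hypothetical masked container; None marks a missing value."""

    # Map each supported accumulation name to a binary operation.
    _OPS = {
        "cumsum": operator.add,
        "cumprod": operator.mul,
        "cummin": min,
        "cummax": max,
    }

    def __init__(self, values):
        self.values = list(values)

    def _accumulate(self, name, skipna=True, **kwargs):
        if name not in self._OPS:
            # Mirrors the error raised in the diff for unsupported accumulations.
            raise TypeError(
                f"cannot perform {name} with type {type(self).__name__}"
            )
        op = self._OPS[name]

        result = []
        running = None    # accumulated value so far; None until the first valid element
        poisoned = False  # set once a missing value is seen with skipna=False
        for value in self.values:
            if value is None:
                if not skipna:
                    poisoned = True
                result.append(None)
            elif poisoned:
                result.append(None)
            else:
                running = value if running is None else op(running, value)
                result.append(running)
        return ToyMaskedArray(result)
```

For example, `ToyMaskedArray([1, None, 3])._accumulate("cumsum").values` evaluates to `[1, None, 4]`, and the same call with `skipna=False` gives `[1, None, None]`.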