ENH: Add cumulative methods to ea #48111

Merged
merged 129 commits into from
Dec 13, 2022
Changes from 120 commits
2897723
Merge pull request #1 from pandas-dev/master
datajanko Sep 13, 2019
a54e1b4
Merge branch 'master' of https://github.com/pandas-dev/pandas
datajanko Sep 14, 2019
10abc0f
Merge branch 'master' of https://github.com/pandas-dev/pandas
datajanko Sep 15, 2019
c2d7592
define accumulation interface for ExtensionArrays
datajanko Sep 18, 2019
2c149c0
reformulate doc string
datajanko Sep 19, 2019
79cea11
creates baseExtension tests for accumulate
datajanko Sep 19, 2019
12a5ca3
adds fixtures for numeric_accumulations
datajanko Oct 4, 2019
dc959f4
fixes typos
datajanko Nov 13, 2019
bcfb8a8
adds accumulate tests for integer arrays
datajanko Dec 10, 2019
9a8f4ec
fixes typo
datajanko Dec 12, 2019
5d837d9
first implementation of cumsum
datajanko Jan 9, 2020
9e9f0c3
Merge pull request #2 from pandas-dev/master
datajanko Feb 12, 2020
a1a1cb2
Merge pull request #3 from pandas-dev/master
datajanko Mar 15, 2020
6d967ad
merges master
datajanko Mar 15, 2020
73363bf
stashed merge conflict
datajanko Mar 15, 2020
0d9a3d5
fixes formatting
datajanko Mar 15, 2020
84a7d81
first green test for integer extension arrays and cumsum
datajanko Mar 23, 2020
ce6869d
first passing tests for cummin and cummax
datajanko Apr 2, 2020
3b5d1d8
utilizes na_accum_func
datajanko Apr 5, 2020
0337cb0
removes delegation leftover
datajanko Apr 5, 2020
f0722f5
creates running tests
datajanko Apr 9, 2020
99baa1b
Merge branch 'master' into 28385-add-cumulative-methods-to-EA
datajanko Apr 9, 2020
fa35b14
removes ABCExtensionArray Type hint
datajanko Apr 9, 2020
43fca7c
Merge pull request #4 from pandas-dev/master
datajanko Apr 10, 2020
7bd6378
Merge branch 'master' into 28385-add-cumulative-methods-to-EA
datajanko Apr 10, 2020
185510b
removes clutter from generic.py
datajanko Apr 10, 2020
2ef9ebb
removes clutter in _accumulate
datajanko Apr 10, 2020
7d898bd
adds typehints for ExtensionArray and IntegerArray
datajanko Apr 10, 2020
09b42be
delegates the accumulate calls to extension arrays
datajanko Apr 10, 2020
af0dd24
removes diff in nanops
datajanko Apr 10, 2020
bc9a36a
removes unwanted pattern
datajanko Apr 10, 2020
38454a3
makes output types for sum and prod explicit
datajanko Apr 12, 2020
5ecfa51
makes the base accumulate test more general by not comparing types
datajanko Apr 13, 2020
8d62594
implements accumulation for boolean arrays
datajanko Apr 13, 2020
5f3b624
uses f-string in base.py
datajanko Apr 26, 2020
06d1286
uses blockmanager also for extension arrays
datajanko May 2, 2020
7efcb5f
Merge branch 'master' of https://github.com/pandas-dev/pandas
datajanko May 3, 2020
ae5f969
merges master
datajanko May 3, 2020
f7e3f4f
fixes flake8 issues
datajanko May 3, 2020
aa99927
Merge branch 'master' of https://github.com/pandas-dev/pandas
datajanko Jun 15, 2020
9cab6d9
merges master
datajanko Jun 15, 2020
b3ae864
removes uncommented code
datajanko Jun 17, 2020
52e6486
adds todo for runtime warning
datajanko Jun 17, 2020
99fb664
reuses integer array to accumulate for booleans
datajanko Jun 22, 2020
d339250
removes runtimewarning catching
datajanko Jun 22, 2020
be6f974
removes TODOs
datajanko Jun 23, 2020
a902f4e
adds accumulate to autosummary
datajanko Jun 23, 2020
64afb5b
excludes datetime from propagating to _accumulate
datajanko Jun 24, 2020
1e5d77b
uses pandas.testing instead of pandas.util.testing in accumulate
datajanko Jun 29, 2020
c95b490
replaces assert_almost_equal with assert_series_equal
datajanko Jun 30, 2020
dc669de
dtypes to lowercase
datajanko Jun 30, 2020
08475a4
lowercase of uint and int64 dtype in _accumulate
datajanko Jun 30, 2020
67fa99a
uses hint of @simonjayhawkins concerning assert series equals
datajanko Jul 21, 2020
a36632b
Merge branch 'master' into 28385-add-cumulative-methods-to-EA
datajanko Jul 21, 2020
b3d3c81
adds whatsnew entry
datajanko Jul 25, 2020
f8f6367
Merge branch 'master' of https://github.com/pandas-dev/pandas
datajanko Jul 25, 2020
ad6773d
Merge branch 'master' into 28385-add-cumulative-methods-to-EA
datajanko Jul 25, 2020
663c301
Merge branch 'master' of https://github.com/pandas-dev/pandas into ma…
datajanko Aug 10, 2020
e17f3a0
Merge branch 'master' into 28385-add-cumulative-methods-to-EA
datajanko Aug 10, 2020
8cb66f9
moves changes to 1.2.0
datajanko Aug 10, 2020
4f953cf
Merge branch 'master' of https://github.com/pandas-dev/pandas into ma…
datajanko Aug 11, 2020
b33a5df
Merge branch 'master' into 28385-add-cumulative-methods-to-EA
datajanko Aug 11, 2020
18ec178
Merge branch 'master' of https://github.com/pandas-dev/pandas into ma…
datajanko Sep 18, 2020
305bdc7
Merge branch 'master' of https://github.com/pandas-dev/pandas into ma…
datajanko Sep 22, 2020
56bfb23
merges master
datajanko Sep 22, 2020
63db854
merges master
datajanko Oct 31, 2020
9c91c55
Merge branch 'master' of https://github.com/pandas-dev/pandas into ma…
datajanko Oct 31, 2020
6ba3ca9
uses na_accum_func
datajanko Nov 5, 2020
f2a49b3
Merge branch 'master' of https://github.com/pandas-dev/pandas
datajanko Jan 11, 2021
386fa39
merges master
datajanko Jan 12, 2021
55de384
delegate to EAs _accumulate function in block mgr
datajanko Jan 16, 2021
6a5b7f8
moves implementation from nanops to masked_accumulations
datajanko Jan 19, 2021
9c63c64
fixes typing annotations in base and masked
datajanko Jan 21, 2021
84dd141
Merge branch 'master' into 28385-add-cumulative-methods-to-EA
datajanko Jan 22, 2021
2f23499
fixes merge error
datajanko Jan 22, 2021
a5b30e6
fills na values without nanops
datajanko Jan 22, 2021
d22c8a0
fixes incorrect call to cumsum and changes to cumprod
datajanko Jan 25, 2021
a5866c7
add _accumulate to boolean
datajanko Jan 25, 2021
8255457
makes tests a lot easier - cumprod tests still fail
datajanko Jan 25, 2021
483b608
adds BaseNumericAccumulation for floating masked array
datajanko Jan 26, 2021
150fd3b
tests no numeric accumulations according to _accumulate interface
datajanko Jan 26, 2021
80e2dc6
uses NotImplementedError in base accumulate function
datajanko Jan 28, 2021
dceab99
ensures the fill values are data independent
datajanko Feb 16, 2021
1c14f18
adds accumulation for datetimelikes
datajanko Feb 16, 2021
e20501a
Merge branch 'master' of https://github.com/pandas-dev/pandas
datajanko Feb 16, 2021
53147c4
fixes merge conflicts
datajanko Feb 16, 2021
597e978
actually ads datetimelike accumulation algos
datajanko Feb 16, 2021
5ebe8ea
fixes absolute imports
datajanko Feb 16, 2021
32367c0
changes error to catch to adhere to changed implementation
datajanko Feb 20, 2021
1fc7a39
Merge remote-tracking branch 'upstream/main' into 28385-add-cumulativ…
phofl Aug 16, 2022
628611c
Remove blank line in old whatsnew
phofl Aug 16, 2022
d884845
Remove merge error
phofl Aug 16, 2022
054ad94
Fix additional merge errors
phofl Aug 16, 2022
64219d9
Refactor datetimelike accum funcs
phofl Aug 16, 2022
a8645db
Remove unnecessary import
phofl Aug 16, 2022
597dd84
Refactor tests
phofl Aug 16, 2022
13b2633
Skip test
phofl Aug 16, 2022
2acc7a8
Fix mypy
phofl Aug 17, 2022
a410a88
Fix dtype creation
phofl Aug 17, 2022
a066588
Fix cumprod tests
phofl Aug 17, 2022
580267b
Fix docstring
phofl Aug 18, 2022
54aa8a8
Adress review
phofl Aug 18, 2022
fecdc42
Merge remote-tracking branch 'upstream/main' into 28385-add-cumulativ…
phofl Aug 20, 2022
a8eb63a
Merge remote-tracking branch 'upstream/main' into 28385-add-cumulativ…
phofl Aug 30, 2022
19f36cf
Merge remote-tracking branch 'upstream/main' into 28385-add-cumulativ…
phofl Sep 8, 2022
67c6327
Merge remote-tracking branch 'upstream/main' into 28385-add-cumulativ…
phofl Sep 18, 2022
fb4fdc2
Merge remote-tracking branch 'upstream/main' into 28385-add-cumulativ…
phofl Nov 16, 2022
9c81a1d
Adress review
phofl Nov 16, 2022
6e2b453
Update pandas/core/arrays/base.py
phofl Nov 22, 2022
cdca590
Update pandas/tests/extension/test_integer.py
phofl Nov 22, 2022
5c9a940
Merge remote-tracking branch 'upstream/main' into 28385-add-cumulativ…
phofl Nov 22, 2022
6765fe1
Add comment
phofl Nov 22, 2022
d3be9f3
Clarify comment
phofl Nov 23, 2022
b4eb0fd
Fix pre commit
phofl Nov 23, 2022
611b85e
Add whatsnew
phofl Nov 23, 2022
a6a974a
Move to top of file
phofl Nov 29, 2022
e7364bd
Change error
phofl Nov 29, 2022
4ff6e4d
Change _data
phofl Nov 29, 2022
57abcc3
Remove
phofl Nov 29, 2022
1b3771e
Merge remote-tracking branch 'upstream/main' into 28385-add-cumulativ…
phofl Nov 29, 2022
1267961
Add todo
phofl Nov 30, 2022
c770872
Fix typo
phofl Nov 30, 2022
cb7277b
Adjust var
phofl Nov 30, 2022
797e724
Special case
phofl Nov 30, 2022
4f8b06a
Merge remote-tracking branch 'upstream/main' into 28385-add-cumulativ…
phofl Dec 10, 2022
ab3cf7e
Fix tests
phofl Dec 10, 2022
e1d2a4e
Combine classes
phofl Dec 12, 2022
53eac54
Merge branch 'main' into 28385-add-cumulative-methods-to-EA
phofl Dec 12, 2022
e7dbd5f
Fix mypy
phofl Dec 12, 2022
1 change: 1 addition & 0 deletions doc/source/reference/extensions.rst
@@ -32,6 +32,7 @@ objects.
.. autosummary::
   :toctree: api/

   api.extensions.ExtensionArray._accumulate
   api.extensions.ExtensionArray._concat_same_type
   api.extensions.ExtensionArray._formatter
   api.extensions.ExtensionArray._from_factorized
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.0.0.rst
@@ -58,6 +58,7 @@ Other enhancements
- Added new argument ``use_nullable_dtypes`` to :func:`read_csv` and :func:`read_excel` to enable automatic conversion to nullable dtypes (:issue:`36712`)
- Added ``index`` parameter to :meth:`DataFrame.to_dict` (:issue:`46398`)
- Added metadata propagation for binary operators on :class:`DataFrame` (:issue:`28283`)
- Added ``cumsum``, ``cumprod``, ``cummin`` and ``cummax`` to the ``ExtensionArray`` interface via ``_accumulate`` (:issue:`28385`)
- :class:`.CategoricalConversionWarning`, :class:`.InvalidComparison`, :class:`.InvalidVersion`, :class:`.LossySetitemError`, and :class:`.NoBufferPresent` are now exposed in ``pandas.errors`` (:issue:`27656`)
- Fix ``test`` optional_extra by adding missing test package ``pytest-asyncio`` (:issue:`48361`)
- :func:`DataFrame.astype` exception message thrown improved to include column name when type conversion is not possible. (:issue:`47571`)
11 changes: 11 additions & 0 deletions pandas/conftest.py
@@ -1123,6 +1123,17 @@ def all_logical_operators(request):
    return request.param


_all_numeric_accumulations = ["cumsum", "cumprod", "cummin", "cummax"]


@pytest.fixture(params=_all_numeric_accumulations)
def all_numeric_accumulations(request):
    """
    Fixture for numeric accumulation names
    """
    return request.param


# ----------------------------------------------------------------
# Data sets/files
# ----------------------------------------------------------------
92 changes: 92 additions & 0 deletions pandas/core/array_algos/masked_accumulations.py
@@ -0,0 +1,92 @@
"""
masked_accumulations.py is for accumulation algorithms using a mask-based approach
for missing values.
"""

from __future__ import annotations

from typing import Callable

import numpy as np

from pandas._typing import npt

from pandas.core.dtypes.common import (
    is_bool_dtype,
    is_float_dtype,
    is_integer_dtype,
)


def _cum_func(
    func: Callable,
    values: np.ndarray,
    mask: npt.NDArray[np.bool_],
    *,
    skipna: bool = True,
):
    """
    Accumulations for 1D masked array.

    We will modify values in place to replace NAs with the appropriate fill value.

    Parameters
    ----------
    func : np.cumsum, np.cumprod, np.maximum.accumulate, np.minimum.accumulate
    values : np.ndarray
        Numpy array with the values (can be of any dtype that supports the
        operation).
    mask : np.ndarray
        Boolean numpy array (True values indicate missing values).
    skipna : bool, default True
        Whether to skip NA.
    """
    dtype_info: np.iinfo | np.finfo
    if is_float_dtype(values):
        dtype_info = np.finfo(values.dtype.type)
    elif is_integer_dtype(values):
        dtype_info = np.iinfo(values.dtype.type)
    elif is_bool_dtype(values):
        # Max value of bool is 1, but since we are setting into a boolean
        # array, 255 is fine as well. Min value has to be 0 when setting
        # into the boolean array.
        dtype_info = np.iinfo(np.uint8)
    else:
        raise NotImplementedError(
            f"No masked accumulation defined for dtype {values.dtype.type}"
        )
    try:
        fill_value = {
            np.cumprod: 1,
            np.maximum.accumulate: dtype_info.min,
            np.cumsum: 0,
            np.minimum.accumulate: dtype_info.max,
        }[func]
    except KeyError:
        raise NotImplementedError(
            f"No accumulation for {func} implemented on BaseMaskedArray"
        )

    values[mask] = fill_value

    if not skipna:
        mask = np.maximum.accumulate(mask)

    values = func(values)
    return values, mask


def cumsum(values: np.ndarray, mask: npt.NDArray[np.bool_], *, skipna: bool = True):
    return _cum_func(np.cumsum, values, mask, skipna=skipna)


def cumprod(values: np.ndarray, mask: npt.NDArray[np.bool_], *, skipna: bool = True):
    return _cum_func(np.cumprod, values, mask, skipna=skipna)


def cummin(values: np.ndarray, mask: npt.NDArray[np.bool_], *, skipna: bool = True):
    return _cum_func(np.minimum.accumulate, values, mask, skipna=skipna)


def cummax(values: np.ndarray, mask: npt.NDArray[np.bool_], *, skipna: bool = True):
    return _cum_func(np.maximum.accumulate, values, mask, skipna=skipna)
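
The fill-value approach in `_cum_func` can be illustrated with a minimal NumPy-only sketch of the `cumsum` case. The helper name `masked_cumsum` is hypothetical and not part of this PR, and unlike `_cum_func` it copies `values` instead of modifying them in place.

```python
import numpy as np


def masked_cumsum(values, mask, skipna=True):
    """Cumulative sum over a (values, mask) pair; True in mask marks NA.

    Hypothetical helper for illustration only, not a pandas function.
    """
    values = values.copy()  # avoid mutating the caller's array
    values[mask] = 0        # 0 is the identity element for addition
    if not skipna:
        # once an NA has been seen, every later position is NA as well
        mask = np.maximum.accumulate(mask)
    return np.cumsum(values), mask


values = np.array([1, 2, 3, 4], dtype="int64")
mask = np.array([False, True, False, False])  # second element is NA

out_skip, mask_skip = masked_cumsum(values, mask, skipna=True)
out_keep, mask_keep = masked_cumsum(values, mask, skipna=False)

print(out_skip, mask_skip)
print(out_keep, mask_keep)
```

With `skipna=True` the mask is returned unchanged, so only the originally missing slot stays NA. With `skipna=False` the mask itself is accumulated, so every position from the first NA onward is reported as NA.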
36 changes: 35 additions & 1 deletion pandas/core/arrays/base.py
@@ -133,6 +133,7 @@ class ExtensionArray:
     tolist
     unique
     view
+    _accumulate
     _concat_same_type
     _formatter
     _from_factorized
@@ -182,8 +183,9 @@ class ExtensionArray:
     as they only compose abstract methods. Still, a more efficient
     implementation may be available, and these methods can be overridden.

-    One can implement methods to handle array reductions.
+    One can implement methods to handle array accumulations or reductions.

+    * _accumulate
     * _reduce

     One can implement methods to handle parsing from strings that will be used
@@ -1368,6 +1370,38 @@ def _concat_same_type(
     def _can_hold_na(self) -> bool:
         return self.dtype._can_hold_na

+    def _accumulate(
+        self, name: str, *, skipna: bool = True, **kwargs
+    ) -> ExtensionArray:
+        """
+        Return an ExtensionArray performing an accumulation operation.
+
+        The underlying data type might change.
+
+        Parameters
+        ----------
+        name : str
+            Name of the function, supported values are:
+            - cummin
+            - cummax
+            - cumsum
+            - cumprod
+        skipna : bool, default True
+            If True, skip NA values.
+        **kwargs
+            Additional keyword arguments passed to the accumulation function.
+            Currently, there is no supported kwarg.
+
+        Returns
+        -------
+        array
+
+        Raises
+        ------
+        NotImplementedError : subclass does not define accumulations
+        """
+        raise NotImplementedError(f"cannot perform {name} with type {self.dtype}")
+
     def _reduce(self, name: str, *, skipna: bool = True, **kwargs):
         """
         Return a scalar result of performing the reduction operation.
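
The `_accumulate` contract above (the base class raises `NotImplementedError`, subclasses opt in by name) can be sketched with plain Python classes. `ToyArray` and `ToyNumericArray` are hypothetical stand-ins for illustration and are not pandas classes; they carry no missing values, so `skipna` is accepted but unused.

```python
import operator
from itertools import accumulate


class ToyArray:
    """Hypothetical stand-in for an ExtensionArray; not a pandas class."""

    def __init__(self, data):
        self.data = list(data)

    def _accumulate(self, name, *, skipna=True, **kwargs):
        # Default behaviour: this array type defines no accumulations.
        raise NotImplementedError(
            f"cannot perform {name} with type {type(self).__name__}"
        )


class ToyNumericArray(ToyArray):
    # Map each supported accumulation name to a binary operation.
    _OPS = {
        "cumsum": operator.add,
        "cumprod": operator.mul,
        "cummin": min,
        "cummax": max,
    }

    def _accumulate(self, name, *, skipna=True, **kwargs):
        try:
            op = self._OPS[name]
        except KeyError:
            # Unknown names fall back to the base-class error.
            return super()._accumulate(name, skipna=skipna, **kwargs)
        # The result has the same length as the input, like any accumulation.
        return type(self)(accumulate(self.data, op))


arr = ToyNumericArray([3, 1, 2])
print(arr._accumulate("cumsum").data)
print(arr._accumulate("cummin").data)
```

The point of the design is that the result is an array of the same length rather than a scalar, which is what separates `_accumulate` from `_reduce`.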
17 changes: 17 additions & 0 deletions pandas/core/arrays/boolean.py
@@ -26,6 +26,7 @@
from pandas.core.dtypes.missing import isna

from pandas.core import ops
from pandas.core.array_algos import masked_accumulations
from pandas.core.arrays.masked import (
    BaseMaskedArray,
    BaseMaskedDtype,
@@ -378,3 +379,19 @@ def _logical_method(self, other, op):

        # i.e. BooleanArray
        return self._maybe_mask_result(result, mask)

    def _accumulate(
        self, name: str, *, skipna: bool = True, **kwargs
    ) -> BaseMaskedArray:
        data = self._data
        mask = self._mask
        if name in ("cummin", "cummax"):
            op = getattr(masked_accumulations, name)
            data, mask = op(data, mask, skipna=skipna, **kwargs)
            return type(self)(data, mask, copy=False)
        else:
            from pandas.core.arrays import IntegerArray

            return IntegerArray(data.astype(int), mask)._accumulate(
                name, skipna=skipna, **kwargs
            )
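
The boolean branch relies on two details: the `uint8` limits (0 and 255) act as fill values when written into a boolean array, and `cumsum`/`cumprod` are computed on an integer copy. A rough NumPy-only sketch of both, using made-up sample data and variable names that are not part of the PR:

```python
import numpy as np

# Hypothetical sample: the second element is NA, its stored value is arbitrary.
data = np.array([False, True, True])
mask = np.array([False, True, False])

info = np.iinfo(np.uint8)

# cummin: fill NA slots with the largest value; 255 is stored as True.
for_cummin = data.copy()
for_cummin[mask] = info.max
cummin = np.minimum.accumulate(for_cummin)

# cummax: fill NA slots with the smallest value; 0 is stored as False.
for_cummax = data.copy()
for_cummax[mask] = info.min
cummax = np.maximum.accumulate(for_cummax)

# cumsum: cast to integers first, then fill NA slots with the identity 0.
ints = data.astype(int)
ints[mask] = 0
cumsum = np.cumsum(ints)

print(cummin, cummax, cumsum)
```

The `cummax` result shows why the fill matters: without it, the arbitrary `True` stored under the NA slot would leak into every later position.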
16 changes: 16 additions & 0 deletions pandas/core/arrays/datetimelike.py
@@ -1359,6 +1359,22 @@ def _addsub_object_array(self, other: np.ndarray, op):
        result = result.reshape(self.shape)
        return result

    def _accumulate(self, name: str, *, skipna: bool = True, **kwargs):

        data = self._ndarray.copy()

        if name in {"cummin", "cummax"}:
            func = np.minimum.accumulate if name == "cummin" else np.maximum.accumulate

Member:
Do the numpy functions not work directly?

Member Author:
This has a different behavior than going through nanops. The initial PR included this, but I ripped it out to keep things a bit more focused. I plan to tackle it afterwards, but we have to decide what we actually want here first. Hence this PR only does masked ops.

Member:
"This has a different behavior than going through nanops"

Can you expand on this?

Member Author:
skipna is the problem: the behavior differs with regard to floats. I remembered this incorrectly.

See #28509 (comment)

I wanted to investigate this as a follow-up, when we can check it in isolation.

            result = cast(np.ndarray, nanops.na_accum_func(data, func, skipna=skipna))

Member:
This looks like it might choke on PeriodDtype?

Member Author:
Seems to work. Could you elaborate on what you suspect?

    import pandas as pd

    ser = pd.Series([pd.Period('2012-1-1', freq='D'), pd.Period('2013-1-1', freq='D')])
    ser.cummin()

Member:
I think this goes wrong when there's a NaT present.

Member Author:
This would restore the previous behavior (but the previous behavior was wrong as well...). Are you OK with addressing this in a follow-up?

Member:
"This would restore the previous behavior"

Previous as of when? IIRC this was last changed multiple years ago.

"but the previous behavior was wrong as well"

So does this PR get the behavior right for dt64/td64? If so, the solution for PeriodDtype is similar to median/min/max: view as dt64, do the op, then view back.

Member Author (@phofl, Dec 7, 2022):
Sorry, my explanation was a bit confusing.

The initial PR that this work is based on tried the following:

  • add accumulate to the EA interface
  • implement mask-based accumulators
  • implement new accumulation logic for datetime and related arrays

This got confusing, since it was pretty big and the new datetime logic changed the behavior. So I decided to reduce the scope, tackle only the first two points, and defer the third to a follow-up.

To achieve this, I just send the datetime logic through nanops again (this is what I meant by previous behavior: previous as in before _accumulate was added to the EA interface). This avoids any change in behavior for datetimes and should keep the review more focused.

Member:
Let's try this one more time; thanks for your patience in explaining this to me.

IIUC this PR should not change any behavior for dt64/dt64tz/td64 dtypes, correct?

Member Author (@phofl, Dec 12, 2022):
Yes, correct. I want to do those changes as follow-ups.

Member:
Cool, I'll trust you to handle these.

Member Author:
Thanks, yep, I want to get this into 2.0 as well.

            # error: Unexpected keyword argument "freq" for
            # "_simple_new" of "NDArrayBacked"  [call-arg]
            return type(self)._simple_new(
                result, freq=self.freq, dtype=self.dtype  # type: ignore[call-arg]
            )

        raise TypeError(f"Accumulation {name} not supported for {type(self)}")

    @unpack_zerodim_and_defer("__add__")
    def __add__(self, other):
        other_dtype = getattr(other, "dtype", None)
16 changes: 15 additions & 1 deletion pandas/core/arrays/masked.py
@@ -74,7 +74,10 @@
     isin,
     take,
 )
-from pandas.core.array_algos import masked_reductions
+from pandas.core.array_algos import (
+    masked_accumulations,
+    masked_reductions,
+)
 from pandas.core.array_algos.quantile import quantile_with_mask
 from pandas.core.arraylike import OpsMixin
 from pandas.core.arrays import ExtensionArray
@@ -1335,3 +1338,14 @@ def all(self, *, skipna: bool = True, **kwargs):
                 return result
             else:
                 return self.dtype.na_value
+
+    def _accumulate(
+        self, name: str, *, skipna: bool = True, **kwargs
+    ) -> BaseMaskedArray:
+        data = self._data
+        mask = self._mask
+
+        op = getattr(masked_accumulations, name)
+        data, mask = op(data, mask, skipna=skipna, **kwargs)
+
+        return type(self)(data, mask, copy=False)
16 changes: 16 additions & 0 deletions pandas/core/arrays/timedeltas.py
@@ -410,6 +410,22 @@ def std(
            return self._box_func(result)
        return self._from_backing_data(result)

    # ----------------------------------------------------------------
    # Accumulations

    def _accumulate(self, name: str, *, skipna: bool = True, **kwargs):

        data = self._data.copy()

        if name in {"cumsum", "cumprod"}:
            func = np.cumsum if name == "cumsum" else np.cumprod
            result = cast(np.ndarray, nanops.na_accum_func(data, func, skipna=skipna))

            return type(self)._simple_new(result, freq=None, dtype=self.dtype)

        else:
            return super()._accumulate(name, skipna=skipna, **kwargs)

    # ----------------------------------------------------------------
    # Rendering Methods

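
From the user's side, the timedelta branch is reached through the ordinary `Series` methods. A short usage sketch, assuming pandas 2.0 or later is installed; the sample values are made up for illustration and contain no `NaT`:

```python
import pandas as pd

# Hypothetical sample data: three timedeltas given in days.
td = pd.Series(pd.to_timedelta([3, 1, 2], unit="D"))

running_total = td.cumsum()  # handled by the cumsum/cumprod branch
running_min = td.cummin()    # falls through to the datetimelike base class

print(running_total.tolist())
print(running_min.tolist())
```

Both results stay timedelta-typed; only the route through `_accumulate` differs between the two calls.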
6 changes: 5 additions & 1 deletion pandas/core/generic.py
@@ -10859,7 +10859,11 @@ def _accum_func(
         def block_accum_func(blk_values):
             values = blk_values.T if hasattr(blk_values, "T") else blk_values

-            result = nanops.na_accum_func(values, func, skipna=skipna)
+            result: np.ndarray | ExtensionArray
+            if isinstance(values, ExtensionArray):
+                result = values._accumulate(name, skipna=skipna, **kwargs)
+            else:
+                result = nanops.na_accum_func(values, func, skipna=skipna)

             result = result.T if hasattr(result, "T") else result
             return result
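
The effect of this dispatch is visible through the public API: a `Series` backed by a masked array keeps its nullable dtype after an accumulation. A small usage sketch, assuming pandas 2.0 or later; the sample values are made up for illustration:

```python
import pandas as pd

# Hypothetical sample data with one missing value.
s = pd.Series([1, None, 3, 4], dtype="Int64")

skip = s.cumsum()              # the NA stays NA, later values keep accumulating
keep = s.cumsum(skipna=False)  # everything from the first NA onward is NA

print(skip.tolist())
print(keep.tolist())
```

Because the block's values are an `ExtensionArray`, the call goes to `_accumulate` rather than `nanops.na_accum_func`, so the result is still `Int64` and not a float column with `NaN`.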
4 changes: 4 additions & 0 deletions pandas/tests/extension/base/__init__.py
@@ -41,6 +41,10 @@ class TestMyDtype(BaseDtypeTests):
``assert_series_equal`` on your base test class.

"""
from pandas.tests.extension.base.accumulate import (  # noqa
    BaseNoAccumulateTests,
    BaseNumericAccumulateTests,
)
from pandas.tests.extension.base.casting import BaseCastingTests  # noqa
from pandas.tests.extension.base.constructors import BaseConstructorsTests  # noqa
from pandas.tests.extension.base.dim2 import (  # noqa
43 changes: 43 additions & 0 deletions pandas/tests/extension/base/accumulate.py
@@ -0,0 +1,43 @@
import pytest

import pandas as pd
from pandas.tests.extension.base.base import BaseExtensionTests


class BaseAccumulateTests(BaseExtensionTests):
    """
    Accumulation specific tests. Generally these only
    make sense for numeric/boolean operations.
    """

    def check_accumulate(self, s, op_name, skipna):
        result = getattr(s, op_name)(skipna=skipna)

        if result.dtype == pd.Float32Dtype() and op_name == "cumprod" and skipna:
            pytest.skip(
                f"Float32 precision lead to large differences with op {op_name} "
                f"and skipna={skipna}"
            )

        expected = getattr(s.astype("float64"), op_name)(skipna=skipna)
        self.assert_series_equal(result, expected, check_dtype=False)


class BaseNoAccumulateTests(BaseAccumulateTests):

Member:
I find this pattern really weird (xref #44742). Is there a way to do this with just one class?

Member Author (@phofl, Dec 12, 2022):
Combined the classes. The tests that should not get executed now have to be overridden.

Member:
Thanks.

    """we don't define any accumulations"""

    @pytest.mark.parametrize("skipna", [True, False])
    def test_accumulate_series_numeric(self, data, all_numeric_accumulations, skipna):
        op_name = all_numeric_accumulations
        s = pd.Series(data)

        with pytest.raises(NotImplementedError):
            getattr(s, op_name)(skipna=skipna)


class BaseNumericAccumulateTests(BaseAccumulateTests):
    @pytest.mark.parametrize("skipna", [True, False])
    def test_accumulate_series(self, data, all_numeric_accumulations, skipna):
        op_name = all_numeric_accumulations
        s = pd.Series(data)
        self.check_accumulate(s, op_name, skipna)
11 changes: 11 additions & 0 deletions pandas/tests/extension/test_boolean.py
@@ -16,6 +16,8 @@
import numpy as np
import pytest

from pandas.core.dtypes.common import is_bool_dtype

import pandas as pd
import pandas._testing as tm
from pandas.core.arrays.boolean import BooleanDtype
@@ -393,6 +395,15 @@ class TestUnaryOps(base.BaseUnaryOpsTests):
    pass


class TestNumericAccumulation(base.BaseNumericAccumulateTests):
    def check_accumulate(self, s, op_name, skipna):
        result = getattr(s, op_name)(skipna=skipna)
        expected = getattr(pd.Series(s.astype("float64")), op_name)(skipna=skipna)
        tm.assert_series_equal(result, expected, check_dtype=False)
        if op_name in ("cummin", "cummax"):
            assert is_bool_dtype(result)


class TestParsing(base.BaseParsingTests):
    pass

4 changes: 4 additions & 0 deletions pandas/tests/extension/test_categorical.py
@@ -156,6 +156,10 @@ class TestReduce(base.BaseNoReduceTests):
    pass


class TestAccumulate(base.BaseNoAccumulateTests):
    pass


class TestMethods(base.BaseMethodsTests):
    @pytest.mark.xfail(reason="Unobserved categories included")
    def test_value_counts(self, all_data, dropna):
4 changes: 4 additions & 0 deletions pandas/tests/extension/test_floating.py
@@ -217,3 +217,7 @@ class TestParsing(base.BaseParsingTests):
@pytest.mark.filterwarnings("ignore:overflow encountered in reduce:RuntimeWarning")
class Test2DCompat(base.Dim2CompatTests):
    pass


class TestNumericAccumulation(base.BaseNumericAccumulateTests):
    pass