[WIP, ENH] Adds cumulative methods to ea #28509
Closed
datajanko wants to merge 89 commits into pandas-dev:master from datajanko:28385-add-cumulative-methods-to-EA
89 commits
2897723 Merge pull request #1 from pandas-dev/master (datajanko)
a54e1b4 Merge branch 'master' of https://github.com/pandas-dev/pandas (datajanko)
10abc0f Merge branch 'master' of https://github.com/pandas-dev/pandas (datajanko)
c2d7592 define accumulation interface for ExtensionArrays (datajanko)
2c149c0 reformulate doc string (datajanko)
79cea11 creates baseExtension tests for accumulate (datajanko)
12a5ca3 adds fixtures for numeric_accumulations (datajanko)
dc959f4 fixes typos (datajanko)
bcfb8a8 adds accumulate tests for integer arrays (datajanko)
9a8f4ec fixes typo (datajanko)
5d837d9 first implementation of cumsum (datajanko)
9e9f0c3 Merge pull request #2 from pandas-dev/master (datajanko)
a1a1cb2 Merge pull request #3 from pandas-dev/master (datajanko)
6d967ad merges master (datajanko)
73363bf stashed merge conflict (datajanko)
0d9a3d5 fixes formatting (datajanko)
84a7d81 first green test for integer extension arrays and cumsum (datajanko)
ce6869d first passing tests for cummin and cummax (datajanko)
3b5d1d8 utilizes na_accum_func (datajanko)
0337cb0 removes delegation leftover (datajanko)
f0722f5 creates running tests (datajanko)
99baa1b Merge branch 'master' into 28385-add-cumulative-methods-to-EA (datajanko)
fa35b14 removes ABCExtensionArray Type hint (datajanko)
43fca7c Merge pull request #4 from pandas-dev/master (datajanko)
7bd6378 Merge branch 'master' into 28385-add-cumulative-methods-to-EA (datajanko)
185510b removes clutter from generic.py (datajanko)
2ef9ebb removes clutter in _accumulate (datajanko)
7d898bd adds typehints for ExtensionArray and IntegerArray (datajanko)
09b42be delegates the accumulate calls to extension arrays (datajanko)
af0dd24 removes diff in nanops (datajanko)
bc9a36a removes unwanted pattern (datajanko)
38454a3 makes output types for sum and prod explicit (datajanko)
5ecfa51 makes the base accumulate test more general by not comparing types (datajanko)
8d62594 implements accumulation for boolean arrays (datajanko)
5f3b624 uses f-string in base.py (datajanko)
06d1286 uses blockmanager also for extension arrays (datajanko)
7efcb5f Merge branch 'master' of https://github.com/pandas-dev/pandas (datajanko)
ae5f969 merges master (datajanko)
f7e3f4f fixes flake8 issues (datajanko)
aa99927 Merge branch 'master' of https://github.com/pandas-dev/pandas (datajanko)
9cab6d9 merges master (datajanko)
b3ae864 removes uncommented code (datajanko)
52e6486 adds todo for runtime warning (datajanko)
99fb664 reuses integer array to accumulate for booleans (datajanko)
d339250 removes runtimewarning catching (datajanko)
be6f974 removes TODOs (datajanko)
a902f4e adds accumulate to autosummary (datajanko)
64afb5b excludes datetime from propagating to _accumulate (datajanko)
1e5d77b uses pandas.testing instead of pandas.util.testing in accumulate (datajanko)
c95b490 replaces assert_almost_equal with assert_series_equal (datajanko)
dc669de dtypes to lowercase (datajanko)
08475a4 lowercase of uint and int64 dtype in _accumulate (datajanko)
67fa99a uses hint of @simonjayhawkins concerning assert series equals (datajanko)
a36632b Merge branch 'master' into 28385-add-cumulative-methods-to-EA (datajanko)
b3d3c81 adds whatsnew entry (datajanko)
f8f6367 Merge branch 'master' of https://github.com/pandas-dev/pandas (datajanko)
ad6773d Merge branch 'master' into 28385-add-cumulative-methods-to-EA (datajanko)
663c301 Merge branch 'master' of https://github.com/pandas-dev/pandas into ma… (datajanko)
e17f3a0 Merge branch 'master' into 28385-add-cumulative-methods-to-EA (datajanko)
8cb66f9 moves changes to 1.2.0 (datajanko)
4f953cf Merge branch 'master' of https://github.com/pandas-dev/pandas into ma… (datajanko)
b33a5df Merge branch 'master' into 28385-add-cumulative-methods-to-EA (datajanko)
18ec178 Merge branch 'master' of https://github.com/pandas-dev/pandas into ma… (datajanko)
305bdc7 Merge branch 'master' of https://github.com/pandas-dev/pandas into ma… (datajanko)
56bfb23 merges master (datajanko)
63db854 merges master (datajanko)
9c91c55 Merge branch 'master' of https://github.com/pandas-dev/pandas into ma… (datajanko)
6ba3ca9 uses na_accum_func (datajanko)
f2a49b3 Merge branch 'master' of https://github.com/pandas-dev/pandas (datajanko)
386fa39 merges master (datajanko)
55de384 delegate to EAs _accumulate function in block mgr (datajanko)
6a5b7f8 moves implementation from nanops to masked_accumulations (datajanko)
9c63c64 fixes typing annotations in base and masked (datajanko)
84dd141 Merge branch 'master' into 28385-add-cumulative-methods-to-EA (datajanko)
2f23499 fixes merge error (datajanko)
a5b30e6 fills na values without nanops (datajanko)
d22c8a0 fixes incorrect call to cumsum and changes to cumprod (datajanko)
a5866c7 add _accumulate to boolean (datajanko)
8255457 makes tests a lot easier - cumprod tests still fail (datajanko)
483b608 adds BaseNumericAccumulation for floating masked array (datajanko)
150fd3b tests no numeric accumulations according to _accumulate interface (datajanko)
80e2dc6 uses NotImplementedError in base accumulate function (datajanko)
dceab99 ensures the fill values are data independent (datajanko)
1c14f18 adds accumulation for datetimelikes (datajanko)
e20501a Merge branch 'master' of https://github.com/pandas-dev/pandas (datajanko)
53147c4 fixes merge conflicts (datajanko)
597e978 actually ads datetimelike accumulation algos (datajanko)
5ebe8ea fixes absolute imports (datajanko)
32367c0 changes error to catch to adhere to changed implementation (datajanko)
@@ -0,0 +1,69 @@
from typing import Callable

import numpy as np

from pandas._libs import iNaT

from pandas.core.dtypes.missing import isna

"""
datetimelike_accumulations.py is for accumulations of datetimelike extension arrays
"""


def _cum_func(
    func: Callable,
    values: np.ndarray,
    *,
    skipna: bool = True,
):
    """
    Accumulations for 1D datetimelike arrays.

    Parameters
    ----------
    func : np.cumsum, np.cumprod, np.maximum.accumulate, np.minimum.accumulate
    values : np.ndarray
        Numpy array with the values (can be of any dtype that supports the
        operation).
    skipna : bool, default True
        Whether to skip NA.
    """
    try:
        fill_value = {
            np.cumprod: 1,
            np.maximum.accumulate: np.iinfo(np.int64).min,
            np.cumsum: 0,
            np.minimum.accumulate: np.iinfo(np.int64).max,
        }[func]
    except KeyError:
        raise ValueError(f"No accumulation for {func} implemented on datetimelike arrays")

    mask = isna(values)
    y = values.view("i8")  # the view shares memory with `values`
    y[mask] = fill_value  # so this also overwrites the caller's array

    if not skipna:
        # This is different compared to the recent implementation for datetimelikes
        # but is the same as the implementation for masked arrays
        mask = np.maximum.accumulate(mask)

    result = func(y)
    result[mask] = iNaT
    return result


def cumsum(values: np.ndarray, *, skipna: bool = True):
    return _cum_func(np.cumsum, values, skipna=skipna)


def cumprod(values: np.ndarray, *, skipna: bool = True):
    return _cum_func(np.cumprod, values, skipna=skipna)


def cummin(values: np.ndarray, *, skipna: bool = True):
    return _cum_func(np.minimum.accumulate, values, skipna=skipna)


def cummax(values: np.ndarray, *, skipna: bool = True):
    return _cum_func(np.maximum.accumulate, values, skipna=skipna)
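The fill, accumulate, re-mask pattern of the datetimelike `_cum_func` can be tried without pandas internals. The sketch below is a NumPy-only approximation, not the PR's code: `cummax_datetimelike` and `NAT_I8` are hypothetical names, `np.isnat` stands in for `isna`, the integer NaT sentinel is assumed to match pandas' `iNaT`, and the input is copied first so the caller's array is left alone.

```python
import numpy as np

# Integer value NumPy uses for NaT in an int64 view (assumed equal to pandas' iNaT).
NAT_I8 = np.iinfo(np.int64).min


def cummax_datetimelike(values: np.ndarray, *, skipna: bool = True) -> np.ndarray:
    """Hypothetical stand-alone running maximum for datetime64 arrays."""
    mask = np.isnat(values)
    # Copy before viewing as int64 so the caller's array is not modified.
    y = values.copy().view("i8")
    # Smallest int64 is the neutral element for a running maximum.
    y[mask] = np.iinfo(np.int64).min
    if not skipna:
        # Once a missing value is seen, every later position is missing too.
        mask = np.maximum.accumulate(mask)
    result = np.maximum.accumulate(y)
    result[mask] = NAT_I8
    return result.view(values.dtype)


dates = np.array(
    ["2020-01-02", "NaT", "2020-01-01", "2020-01-05"], dtype="datetime64[D]"
)
skipped = cummax_datetimelike(dates)
propagated = cummax_datetimelike(dates, skipna=False)
print(skipped)
print(propagated)
```

With `skipna=True` only the originally missing position stays NaT; with `skipna=False` everything from the first NaT onward becomes NaT, which matches the masked-array behaviour the code comment refers to.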
@@ -0,0 +1,78 @@
from typing import Callable

import numpy as np

from pandas.core.dtypes.common import (
    is_float_dtype,
    is_integer_dtype,
)

"""
masked_accumulations.py is for accumulation algorithms using a mask-based approach
for missing values.
"""


def _cum_func(
    func: Callable,
    values: np.ndarray,
    mask: np.ndarray,
    *,
    skipna: bool = True,
):
    """
    Accumulations for 1D masked array.

    Parameters
    ----------
    func : np.cumsum, np.cumprod, np.maximum.accumulate, np.minimum.accumulate
    values : np.ndarray
        Numpy array with the values (can be of any dtype that supports the
        operation).
    mask : np.ndarray
        Boolean numpy array (True values indicate missing values).
    skipna : bool, default True
        Whether to skip NA.
    """
    dtype_info = None
    if is_float_dtype(values):
        dtype_info = np.finfo(values.dtype.type)
    elif is_integer_dtype(values):
        dtype_info = np.iinfo(values.dtype.type)
    else:
        raise NotImplementedError(
            f"No masked accumulation defined for dtype {values.dtype.type}"
        )
    try:
        fill_value = {
            np.cumprod: 1,
            np.maximum.accumulate: dtype_info.min,
            np.cumsum: 0,
            np.minimum.accumulate: dtype_info.max,
        }[func]
    except KeyError:
        raise ValueError(f"No accumulation for {func} implemented on BaseMaskedArray")

    values[mask] = fill_value  # in place: overwrites the caller's array

    if not skipna:
        mask = np.maximum.accumulate(mask)

    values = func(values)
    return values, mask


def cumsum(values: np.ndarray, mask: np.ndarray, *, skipna: bool = True):
    return _cum_func(np.cumsum, values, mask, skipna=skipna)


def cumprod(values: np.ndarray, mask: np.ndarray, *, skipna: bool = True):
    return _cum_func(np.cumprod, values, mask, skipna=skipna)


def cummin(values: np.ndarray, mask: np.ndarray, *, skipna: bool = True):
    return _cum_func(np.minimum.accumulate, values, mask, skipna=skipna)


def cummax(values: np.ndarray, mask: np.ndarray, *, skipna: bool = True):
    return _cum_func(np.maximum.accumulate, values, mask, skipna=skipna)
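A rough way to see what the masked variant returns for each `skipna` setting, using NumPy only. `masked_accumulate` is a hypothetical restatement rather than the PR's function: it uses `np.issubdtype` instead of the pandas dtype helpers and fills a copy instead of writing into the input.

```python
import numpy as np


def masked_accumulate(func, values, mask, *, skipna=True):
    """Hypothetical NumPy-only restatement of the masked accumulation."""
    if np.issubdtype(values.dtype, np.floating):
        info = np.finfo(values.dtype)
    elif np.issubdtype(values.dtype, np.integer):
        info = np.iinfo(values.dtype)
    else:
        raise NotImplementedError(f"no masked accumulation for dtype {values.dtype}")

    # Neutral element per operation, chosen from the dtype and not from the data.
    fill_values = {
        np.cumsum: 0,
        np.cumprod: 1,
        np.maximum.accumulate: info.min,
        np.minimum.accumulate: info.max,
    }
    filled = values.copy()  # leave the caller's array untouched
    filled[mask] = fill_values[func]

    if not skipna:
        # A missing value makes every later result missing as well.
        mask = np.maximum.accumulate(mask)
    return func(filled), mask


values = np.array([1, 2, 3, 4], dtype=np.int64)
mask = np.array([False, True, False, False])

skipped, skipped_mask = masked_accumulate(np.cumsum, values, mask)
propagated, propagated_mask = masked_accumulate(np.cumsum, values, mask, skipna=False)
print(skipped, skipped_mask)
print(propagated, propagated_mask)
```

The values under a `True` mask entry are placeholders; only the positions where the returned mask is `False` carry meaningful results.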
Maybe for NumPy compatibility (axis, dtype, out).
What are your expectations in adding this? Do you purely want to have it (for this ticket) as an accessor, which we'll ignore? If not, I'd guess the impact of axis is None, but dtype might be interesting. I.e., for cumsum and integer dtypes, we could also provide the target output type, so not defaulting to (U)Int64. But I don't know if this should be part of this issue.
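For context on the dtype point: plain NumPy already widens small integer inputs in `cumsum` unless a `dtype` is passed, and a narrow target dtype can overflow. A small sketch with made-up values, NumPy only, showing both cases:

```python
import numpy as np

small = np.array([100, 100], dtype=np.int8)

# Default: NumPy accumulates small integers in the platform integer type.
widened = np.cumsum(small)

# Forcing the narrow dtype keeps int8, but the running sum wraps around.
narrow = np.cumsum(small, dtype=np.int8)

print(widened.dtype, widened.tolist())
print(narrow.dtype, narrow.tolist())
```

The widened dtype depends on the platform (typically int64), which is one argument for a fixed (U)Int64 default with an optional explicit target type.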
The idea would be for something like
np.cumsum(pd.array([1, 2]))
to return an IntegerArray. I'm not sure what all is required for that to work.
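A possible sketch of why the `axis`, `dtype` and `out` keywords matter here. As far as I understand NumPy's dispatch, `np.cumsum` on an object that is not an ndarray looks up a `cumsum` method on it and calls it with those three keyword arguments, so a method that accepts them gets to return its own type. `ToyMaskedInts` below is a hypothetical stand-in, not a pandas class, and the `skipna` keyword is an assumption carried over from the diff.

```python
import numpy as np


class ToyMaskedInts:
    """Hypothetical stand-in for a masked integer array; not a pandas class."""

    def __init__(self, values, mask):
        self.values = np.asarray(values, dtype=np.int64)
        self.mask = np.asarray(mask, dtype=bool)

    def cumsum(self, axis=None, dtype=None, out=None, *, skipna=True):
        # axis, dtype and out are accepted so that np.cumsum can forward them.
        if out is not None:
            raise NotImplementedError("out is not supported in this sketch")
        filled = self.values.copy()
        filled[self.mask] = 0  # neutral element for a sum
        mask = self.mask if skipna else np.maximum.accumulate(self.mask)
        return ToyMaskedInts(np.cumsum(filled, dtype=dtype), mask)


result = np.cumsum(ToyMaskedInts([1, 2, 3], [False, True, False]))
print(type(result).__name__, result.values.tolist(), result.mask.tolist())
```

Whether the real ExtensionArray interface should take this route or rely on another NumPy protocol is left open in the thread; the sketch only shows that a compatible signature is enough for the call to round-trip through the array's own type.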