-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
PERF: masked ops for reductions (sum) #30982
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
9795f35
7fb1f88
cd26920
28cd331
6298fbd
735a741
6df454f
5eb48d6
3ef1331
d2230fd
19ac821
2776436
68b4dc2
18d5bfa
457efb1
4df858f
f5120db
76c5149
476f768
b2162dc
d4746f5
d9c2cbf
f8705c2
1a43e10
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
""" | ||
masked_reductions.py is for reduction algorithms using a mask-based approach | ||
for missing values. | ||
""" | ||
|
||
import numpy as np | ||
|
||
from pandas._libs import missing as libmissing | ||
from pandas.compat.numpy import _np_version_under1p17 | ||
|
||
from pandas.core.nanops import check_below_min_count | ||
|
||
|
||
def sum( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can we avoid overriding a builtin-in name? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Do we care? In practice this is only used as There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I also don't have a problem with renaming it to eg There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There are no nan's involved here ;), that's exactly the difference with nanops There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ha, sounds good |
||
values: np.ndarray, mask: np.ndarray, skipna: bool = True, min_count: int = 0, | ||
): | ||
""" | ||
Sum for 1D masked array. | ||
|
||
Parameters | ||
---------- | ||
values : np.ndarray | ||
Numpy array with the values (can be of any dtype that support the | ||
operation). | ||
mask : np.ndarray | ||
Boolean numpy array (True values indicate missing values). | ||
skipna : bool, default True | ||
simonjayhawkins marked this conversation as resolved.
Show resolved
Hide resolved
|
||
Whether to skip NA. | ||
min_count : int, default 0 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. required >=0? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's a count, so yes |
||
The required number of valid values to perform the operation. If fewer than | ||
``min_count`` non-NA values are present the result will be NA. | ||
""" | ||
if not skipna: | ||
if mask.any(): | ||
return libmissing.NA | ||
else: | ||
if check_below_min_count(values.shape, None, min_count): | ||
return libmissing.NA | ||
return np.sum(values) | ||
else: | ||
if check_below_min_count(values.shape, mask, min_count): | ||
return libmissing.NA | ||
|
||
if _np_version_under1p17: | ||
return np.sum(values[~mask]) | ||
else: | ||
return np.sum(values, where=~mask) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jorisvandenbossche This seems like a useful pattern for other reductions like min and max (maybe a special case is needed if we have a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, that's indeed the goal of this PR (I just wanted to limit it to a single function for the initial PR). For min/max, I think it will be better to write a separate function in module since they don't use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is
DataFrame.sum
affected?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, since we don't use the EA implementation for reductions in DataFrame reductions, that's my other PR