-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG/TST: Include sem & count in all_numeric_reductions #49759
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG/TST: Include sem & count in all_numeric_reductions #49759
Conversation
numerator = pc.stddev(data, skip_nulls=skipna, **kwargs) | ||
denominator = pc.sqrt_checked( | ||
pc.subtract_checked( | ||
pc.count(self._data, skip_nulls=skipna), kwargs["ddof"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is kwargs['ddof']
not relevant here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I suppose this only applies in the stddev
call.
When comparing to ser.astype("Float64").sem()
the results were off, and removing kwargs["ddof"]
aligned this pyarrow sem to nullable dtype sem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the test changes all look good. this change i dont know enough to form an opinion on. if youre confident its right then LGTM. otherwise lets find an appropriate person to double-check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think based on the nanops.nansem
implementation, I'm fairly sure these align
Lines 1033 to 1042 in 3fffb6d
nanvar(values, axis=axis, skipna=skipna, ddof=ddof, mask=mask) | |
mask = _maybe_get_mask(values, skipna, mask) | |
if not is_float_dtype(values.dtype): | |
values = values.astype("f8") | |
count, _ = _get_counts_nanvar(values.shape, mask, axis, ddof, values.dtype) | |
var = nanvar(values, axis=axis, skipna=skipna, ddof=ddof) | |
return np.sqrt(var) / np.sqrt(count) |
Going to push this one in since the pyarrow sem implementation matches the |
The
sem
bug with Arrow types could be backported to 1.5.x but given the larger testing changes IMO it's fine for 2.0