BUG/TST: Include sem & count in all_numeric_reductions #49759

mroeschke · 2022-11-17T23:24:08Z

The sem bug fix for Arrow types could be backported to 1.5.x, but given the larger testing changes, IMO it's fine for 2.0.


jbrockmendel · 2022-11-22T19:19:56Z

pandas/core/arrays/arrow/array.py

-                numerator = pc.stddev(data, skip_nulls=skipna, **kwargs)
-                denominator = pc.sqrt_checked(
-                    pc.subtract_checked(
-                        pc.count(self._data, skip_nulls=skipna), kwargs["ddof"]


is kwargs['ddof'] not relevant here?

Yeah, I suppose ddof only applies in the stddev call.

When I compared against ser.astype("Float64").sem(), the results were off; removing kwargs["ddof"] from the denominator aligned the pyarrow sem with the nullable-dtype sem.
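To make the ddof point concrete, here is a minimal sketch in plain Python (standard library only; it is not the pyarrow code from the diff). The helper names `sem_old` and `sem_new` are hypothetical: `sem_old` mirrors the removed lines, where ddof is subtracted in the denominator as well as inside the standard deviation, and `sem_new` applies ddof only inside the standard deviation.

```python
import math
import statistics


def _std(values, ddof=1):
    """Standard deviation with `ddof` delta degrees of freedom."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - ddof))


def sem_old(values, ddof=1):
    """Hypothetical analogue of the removed code: ddof is applied twice."""
    return _std(values, ddof) / math.sqrt(len(values) - ddof)


def sem_new(values, ddof=1):
    """Hypothetical analogue of the fix: ddof only affects the stddev."""
    return _std(values, ddof) / math.sqrt(len(values))


data = [1.0, 2.0, 3.0, 4.0]
# Textbook standard error of the mean: sample stddev over sqrt(n).
reference = statistics.stdev(data) / math.sqrt(len(data))
```

On this input `sem_new(data)` agrees with `reference`, while `sem_old(data)` is larger because its denominator is sqrt(3) instead of sqrt(4).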

The test changes all look good. This change I don't know enough about to form an opinion on. If you're confident it's right, then LGTM. Otherwise, let's find an appropriate person to double-check.

Based on the nanops.nansem implementation, I'm fairly sure these align:

pandas/pandas/core/nanops.py

Lines 1033 to 1042 in 3fffb6d

    nanvar(values, axis=axis, skipna=skipna, ddof=ddof, mask=mask)

    mask = _maybe_get_mask(values, skipna, mask)
    if not is_float_dtype(values.dtype):
        values = values.astype("f8")

    count, _ = _get_counts_nanvar(values.shape, mask, axis, ddof, values.dtype)
    var = nanvar(values, axis=axis, skipna=skipna, ddof=ddof)

    return np.sqrt(var) / np.sqrt(count)
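The steps in the quoted snippet (count the valid values, compute the variance with ddof, divide the square roots) can be sketched without NumPy. This is an illustrative pure-Python analogue, not the pandas implementation: the function name `sem_skipna` is hypothetical, and returning NaN for too few values or for NaNs with skipna=False is an assumption of this sketch.

```python
import math


def sem_skipna(values, ddof=1, skipna=True):
    """Hypothetical analogue of the quoted steps for a flat list of floats."""
    valid = [v for v in values if not math.isnan(v)]
    if not skipna and len(valid) != len(values):
        # Assumption: a NaN in the input propagates when skipna is False.
        return math.nan
    count = len(valid)  # number of non-missing values
    if count - ddof <= 0:
        # Assumption: too few values for this ddof yields NaN.
        return math.nan
    mean = sum(valid) / count
    # ddof enters only here, in the variance.
    var = sum((v - mean) ** 2 for v in valid) / (count - ddof)
    # The denominator uses the plain count, as in the quoted return line.
    return math.sqrt(var) / math.sqrt(count)


result = sem_skipna([1.0, math.nan, 2.0, 3.0, 4.0])
```

Here `result` is computed from the four non-missing values only, so it equals the standard deviation of [1, 2, 3, 4] (with ddof=1) divided by sqrt(4).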

mroeschke · 2022-11-28T22:55:44Z

Going to push this one in since the pyarrow sem implementation matches the nanops.nansem one. If both are wrong, at least we now have a test asserting that the two should be aligned.

mroeschke added 3 commits November 11, 2022 14:03

CLN: Fixture reduction

aa2115a

Merge remote-tracking branch 'upstream/main' into tst/cln/fixture_reduction

601710b

BUG/TST: Include sem & count in all_numeric_reductions

d013fa8

mroeschke added the Testing, Numeric Operations, and Arrow labels Nov 17, 2022

mroeschke added 3 commits November 17, 2022 18:22

Add xfails

080cb19

Merge remote-tracking branch 'upstream/main' into tst/cln/fixture_reduction

61b80cc

Make more generic, and fix whatsnew

157c35c

mroeschke added this to the 2.0 milestone Nov 18, 2022

mroeschke added 4 commits November 18, 2022 15:45

Merge remote-tracking branch 'upstream/main' into tst/cln/fixture_reduction

da449b2

Merge remote-tracking branch 'upstream/main' into tst/cln/fixture_reduction

aa7b5d3

Fix commment typo

623ad0e

Merge remote-tracking branch 'upstream/main' into tst/cln/fixture_reduction

5532054

jbrockmendel reviewed Nov 22, 2022


mroeschke merged commit 8b227f3 into pandas-dev:main Nov 28, 2022

mroeschke deleted the tst/cln/fixture_reduction branch November 28, 2022 22:55


mroeschke commented Nov 17, 2022

jbrockmendel Nov 22, 2022

mroeschke Nov 22, 2022

jbrockmendel Nov 23, 2022

mroeschke Nov 23, 2022

mroeschke commented Nov 28, 2022
