ENH: Improve numerical stability for window functions skew and kurt #37557

phofl · 2020-11-01T13:51:37Z

closes PERF: Poor numerical stability of rolling_kurt and rolling_skew #6929
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

Used the same procedure as in DataFrame.skew()
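For reference, here is a pure-Python sketch of the two-pass idea: center on the mean first, then accumulate powers of the deviations. It assumes the bias-corrected estimators that `Series.skew()` and `Series.kurt()` document (normalized by N-1). The function names are hypothetical, and this is not the pandas nanops code itself.

```python
import math


def sample_skew(values):
    """Bias-corrected sample skewness from mean-centered (two-pass) sums."""
    n = len(values)
    if n < 3:
        return math.nan
    mean = sum(values) / n
    # Centering first keeps the deviations small, so cubing them is not
    # dominated by rounding error when the data carry a large offset.
    dev = [v - mean for v in values]
    m2 = sum(d * d for d in dev)
    m3 = sum(d ** 3 for d in dev)
    if m2 == 0:
        return math.nan  # this sketch treats constant input as undefined
    return n * math.sqrt(n - 1) / (n - 2) * m3 / m2 ** 1.5


def sample_kurt(values):
    """Bias-corrected excess kurtosis from mean-centered (two-pass) sums."""
    n = len(values)
    if n < 4:
        return math.nan
    mean = sum(values) / n
    dev = [v - mean for v in values]
    m2 = sum(d * d for d in dev)
    m4 = sum(d ** 4 for d in dev)
    if m2 == 0:
        return math.nan
    return ((n + 1) * n * (n - 1) / ((n - 2) * (n - 3)) * m4 / m2 ** 2
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


print(sample_skew([0, 0, 1]))     # about 1.732 (sqrt(3))
print(sample_kurt([0, 0, 0, 1]))  # about 4.0
```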

phofl · 2020-11-01T16:59:21Z

Edit: I have to run the asvs again because of code changes.

pep8speaks · 2020-11-01T20:10:34Z

Hello @phofl! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-11-09 11:16:48 UTC

phofl · 2020-11-01T20:12:22Z

This got somewhat ugly now because of the Kahan summation for all parts.
I cannot subtract the mean for a wide range of values, because this would lead to imprecision for the small values. Running the asvs again now.
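A minimal sketch of Kahan-compensated accumulation with both an add and a remove step, as a rolling window needs. The class `KahanSum` is hypothetical and written in plain Python rather than the Cython in aggregations.pyx; it shows the standard Kahan update, not the exact code of this PR.

```python
class KahanSum:
    """Hypothetical running sum with Kahan compensation (add and remove)."""

    def __init__(self):
        self.total = 0.0
        self.compensation = 0.0  # running estimate of lost low-order bits

    def add(self, val):
        y = val - self.compensation
        t = self.total + y
        # (t - total) is the part of y that actually made it into t;
        # subtracting y leaves the (negated) rounding error for next time.
        self.compensation = (t - self.total) - y
        self.total = t

    def remove(self, val):
        # Dropping an observation from the window is adding its negation.
        self.add(-val)


acc = KahanSum()
naive = 0.0
for v in (1e16, 1.0, 1.0):
    acc.add(v)
    naive += v
acc.remove(1e16)
naive -= 1e16
print(acc.total, naive)  # Kahan keeps the two 1.0s; the naive sum loses them
```

With a large value entering and later leaving the window, the naive sum silently drops the small contributions, while the compensation term carries them until they can be absorbed.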

phofl · 2020-11-01T20:36:28Z

       before           after         ratio
     [6092dc7d]       [c58be86a]
     <6929~3^2>       <6929>    
+     1.96±0.03ms       3.11±0.3ms     1.59  rolling.VariableWindowMethods.time_rolling('Series', '1h', 'int', 'skew')
+     1.92±0.06ms       3.03±0.3ms     1.58  rolling.VariableWindowMethods.time_rolling('Series', '1d', 'int', 'skew')
+     1.92±0.02ms       2.98±0.2ms     1.55  rolling.VariableWindowMethods.time_rolling('Series', '50s', 'int', 'skew')
+     1.84±0.05ms      2.79±0.06ms     1.52  rolling.ForwardWindowMethods.time_rolling('Series', 1000, 'float', 'kurt')
+     2.33±0.07ms       3.49±0.2ms     1.50  rolling.VariableWindowMethods.time_rolling('DataFrame', '50s', 'int', 'skew')
+     2.37±0.06ms       3.55±0.3ms     1.50  rolling.VariableWindowMethods.time_rolling('DataFrame', '1h', 'int', 'skew')
+     1.86±0.03ms       2.76±0.1ms     1.48  rolling.ForwardWindowMethods.time_rolling('Series', 10, 'float', 'kurt')
+     1.84±0.06ms      2.69±0.04ms     1.46  rolling.VariableWindowMethods.time_rolling('Series', '1h', 'float', 'skew')
+     1.82±0.04ms      2.65±0.04ms     1.46  rolling.VariableWindowMethods.time_rolling('Series', '1d', 'float', 'skew')
+     2.32±0.06ms       3.35±0.3ms     1.45  rolling.VariableWindowMethods.time_rolling('DataFrame', '1d', 'int', 'skew')
+     1.36±0.02ms      1.96±0.08ms     1.44  rolling.ForwardWindowMethods.time_rolling('Series', 1000, 'int', 'kurt')
+     1.84±0.02ms      2.62±0.05ms     1.42  rolling.Methods.time_rolling('Series', 10, 'int', 'skew')
+     2.15±0.01ms      3.05±0.05ms     1.42  rolling.VariableWindowMethods.time_rolling('DataFrame', '50s', 'float', 'skew')
+     2.13±0.01ms       3.02±0.1ms     1.42  rolling.ForwardWindowMethods.time_rolling('DataFrame', 1000, 'float', 'kurt')
+     1.41±0.02ms      1.99±0.07ms     1.42  rolling.ForwardWindowMethods.time_rolling('Series', 10, 'int', 'kurt')
+     2.35±0.02ms      3.34±0.04ms     1.42  rolling.VariableWindowMethods.time_rolling('DataFrame', '50s', 'float', 'kurt')
+     1.89±0.02ms      2.66±0.09ms     1.41  rolling.VariableWindowMethods.time_rolling('Series', '50s', 'float', 'skew')
+     1.53±0.04ms      2.15±0.05ms     1.40  rolling.ExpandingMethods.time_expanding('Series', 'int', 'skew')
+     1.67±0.04ms      2.33±0.05ms     1.40  rolling.ForwardWindowMethods.time_rolling('DataFrame', 1000, 'int', 'kurt')
+     2.00±0.01ms      2.80±0.02ms     1.40  rolling.VariableWindowMethods.time_rolling('Series', '1d', 'float', 'kurt')
+     1.41±0.03ms      1.96±0.09ms     1.39  rolling.ExpandingMethods.time_expanding('Series', 'float', 'skew')
+     2.02±0.07ms      2.80±0.04ms     1.38  rolling.Methods.time_rolling('Series', 10, 'int', 'kurt')
+     1.81±0.01ms      2.49±0.02ms     1.37  rolling.Methods.time_rolling('Series', 10, 'float', 'skew')
+     2.11±0.06ms      2.90±0.05ms     1.37  rolling.Methods.time_rolling('DataFrame', 1000, 'float', 'skew')
+     2.10±0.06ms      2.87±0.05ms     1.37  rolling.Methods.time_rolling('DataFrame', 10, 'int', 'skew')
+     2.29±0.04ms      3.12±0.03ms     1.36  rolling.VariableWindowMethods.time_rolling('DataFrame', '1h', 'float', 'skew')
+     2.28±0.05ms      3.10±0.01ms     1.36  rolling.Methods.time_rolling('DataFrame', 10, 'float', 'kurt')
+     2.06±0.05ms      2.81±0.03ms     1.36  rolling.VariableWindowMethods.time_rolling('Series', '1d', 'int', 'kurt')
+     2.12±0.01ms      2.88±0.04ms     1.36  rolling.VariableWindowMethods.time_rolling('Series', '50s', 'float', 'kurt')
+     2.42±0.02ms      3.29±0.02ms     1.36  rolling.VariableWindowMethods.time_rolling('DataFrame', '1h', 'float', 'kurt')
+     2.07±0.05ms      2.78±0.02ms     1.34  rolling.Methods.time_rolling('Series', 1000, 'int', 'kurt')
+     2.27±0.02ms      3.04±0.02ms     1.34  rolling.VariableWindowMethods.time_rolling('DataFrame', '1d', 'float', 'skew')
+     1.57±0.01ms      2.11±0.03ms     1.34  rolling.ExpandingMethods.time_expanding('Series', 'int', 'kurt')
+     2.03±0.02ms      2.71±0.05ms     1.34  rolling.Methods.time_rolling('Series', 10, 'float', 'kurt')
+     2.39±0.06ms      3.18±0.07ms     1.34  rolling.VariableWindowMethods.time_rolling('DataFrame', '1d', 'float', 'kurt')
+     2.13±0.02ms       2.84±0.1ms     1.33  rolling.VariableWindowMethods.time_rolling('Series', '50s', 'int', 'kurt')
+     2.16±0.03ms      2.87±0.05ms     1.33  rolling.Methods.time_rolling('DataFrame', 10, 'float', 'skew')
+     1.74±0.02ms      2.30±0.01ms     1.33  rolling.ExpandingMethods.time_expanding('DataFrame', 'float', 'skew')
+     1.92±0.03ms      2.54±0.02ms     1.33  rolling.Methods.time_rolling('Series', 1000, 'int', 'skew')
+     2.27±0.03ms      2.99±0.09ms     1.32  rolling.Methods.time_rolling('DataFrame', 10, 'int', 'kurt')
+     2.23±0.03ms      2.94±0.06ms     1.32  rolling.VariableWindowMethods.time_rolling('Series', '1h', 'int', 'kurt')
+     2.33±0.01ms      3.08±0.06ms     1.32  rolling.Methods.time_rolling('DataFrame', 1000, 'float', 'kurt')
+     1.88±0.03ms      2.47±0.03ms     1.31  rolling.ExpandingMethods.time_expanding('DataFrame', 'int', 'skew')
+     2.13±0.06ms      2.79±0.04ms     1.31  rolling.ForwardWindowMethods.time_rolling('DataFrame', 10, 'float', 'kurt')
+     1.74±0.04ms      2.27±0.09ms     1.30  rolling.ForwardWindowMethods.time_rolling('DataFrame', 10, 'int', 'kurt')
+     2.30±0.02ms      2.98±0.02ms     1.30  rolling.Methods.time_rolling('DataFrame', 1000, 'int', 'skew')
+     2.45±0.02ms      3.15±0.04ms     1.28  rolling.Methods.time_rolling('DataFrame', 1000, 'int', 'kurt')
+     1.94±0.02ms      2.47±0.03ms     1.28  rolling.ExpandingMethods.time_expanding('DataFrame', 'int', 'kurt')
+     1.54±0.01ms       1.91±0.2ms     1.24  rolling.Quantile.time_quantile('Series', 1000, 'int', 1, 'lower')
+     4.03±0.03ms       5.00±0.4ms     1.24  rolling.ExpandingMethods.time_expanding_groupby('DataFrame', 'int', 'std')
+     1.49±0.01ms       1.84±0.2ms     1.23  rolling.Quantile.time_quantile('Series', 1000, 'int', 1, 'higher')
+     1.53±0.01ms       1.82±0.2ms     1.19  rolling.Quantile.time_quantile('Series', 1000, 'int', 1, 'nearest')
+     1.52±0.01ms       1.81±0.1ms     1.19  rolling.Quantile.time_quantile('Series', 1000, 'int', 1, 'midpoint')
+     1.50±0.04ms      1.74±0.02ms     1.16  rolling.Quantile.time_quantile('Series', 10, 'int', 1, 'midpoint')
+     1.44±0.01ms      1.64±0.02ms     1.14  rolling.Quantile.time_quantile('Series', 1000, 'int', 0, 'lower')
+     1.61±0.03ms      1.83±0.04ms     1.14  rolling.ForwardWindowMethods.time_rolling('DataFrame', 1000, 'int', 'max')
+     2.93±0.01ms      3.33±0.04ms     1.14  rolling.Quantile.time_quantile('DataFrame', 10, 'float', 1, 'higher')
+     1.82±0.02ms      2.06±0.01ms     1.13  rolling.Quantile.time_quantile('DataFrame', 10, 'int', 1, 'nearest')
+     1.78±0.02ms      2.02±0.01ms     1.13  rolling.Quantile.time_quantile('DataFrame', 1000, 'int', 0, 'nearest')
+     1.45±0.04ms      1.64±0.02ms     1.13  rolling.Quantile.time_quantile('Series', 1000, 'int', 0, 'linear')
+     1.60±0.03ms      1.81±0.01ms     1.13  rolling.ForwardWindowMethods.time_rolling('DataFrame', 1000, 'int', 'min')
+     2.91±0.02ms       3.29±0.2ms     1.13  rolling.Quantile.time_quantile('DataFrame', 10, 'float', 1, 'linear')
+     1.31±0.02ms      1.47±0.06ms     1.12  rolling.ForwardWindowMethods.time_rolling('Series', 10, 'int', 'min')
+     1.82±0.01ms      2.04±0.02ms     1.12  rolling.Quantile.time_quantile('DataFrame', 1000, 'int', 0, 'linear')
+     1.46±0.02ms      1.64±0.02ms     1.12  rolling.Quantile.time_quantile('Series', 1000, 'int', 0, 'nearest')
+     1.55±0.03ms      1.72±0.02ms     1.11  rolling.Quantile.time_quantile('Series', 10, 'int', 1, 'nearest')
+     1.52±0.02ms      1.69±0.01ms     1.11  rolling.Quantile.time_quantile('Series', 10, 'int', 1, 'higher')
+     1.52±0.04ms      1.69±0.03ms     1.11  rolling.Quantile.time_quantile('Series', 10, 'int', 1, 'linear')
+     1.84±0.02ms      2.04±0.02ms     1.11  rolling.Quantile.time_quantile('DataFrame', 10, 'int', 1, 'lower')
+     1.83±0.02ms      2.03±0.02ms     1.11  rolling.Quantile.time_quantile('DataFrame', 10, 'int', 1, 'higher')
+     1.51±0.03ms      1.67±0.03ms     1.11  rolling.Quantile.time_quantile('Series', 1000, 'int', 0, 'higher')
+     2.67±0.03ms      2.95±0.06ms     1.11  rolling.ForwardWindowMethods.time_rolling('DataFrame', 1000, 'float', 'max')
+     2.92±0.01ms      3.23±0.06ms     1.11  rolling.Quantile.time_quantile('DataFrame', 10, 'float', 1, 'midpoint')
-     1.78±0.03ms      1.61±0.03ms     0.91  rolling.VariableWindowMethods.time_rolling('Series', '1h', 'int', 'mean')
-     1.66±0.01ms      1.51±0.02ms     0.91  rolling.VariableWindowMethods.time_rolling('Series', '1h', 'float', 'mean')
-       984±100μs          887±6μs     0.90  rolling.EWMMethods.time_ewm_times('Series', 10, 'float', 'mean')
-     1.63±0.02ms      1.44±0.02ms     0.88  rolling.Methods.time_rolling('Series', 10, 'int', 'mean')
-     1.56±0.02ms      1.36±0.01ms     0.87  rolling.Methods.time_rolling('Series', 1000, 'float', 'mean')
-     1.60±0.05ms      1.37±0.03ms     0.86  rolling.Methods.time_rolling('Series', 10, 'float', 'mean')
-      1.77±0.1ms      1.49±0.02ms     0.84  rolling.Methods.time_rolling('Series', 1000, 'int', 'mean')
-      1.04±0.1ms          871±3μs     0.84  rolling.EWMMethods.time_ewm_times('Series', 10, 'float', 'std')
-      1.18±0.1ms         956±10μs     0.81  rolling.EWMMethods.time_ewm_times('Series', 10, 'int', 'std')

Would have expected a bigger hit.

jreback · 2020-11-02T13:52:16Z

pandas/_libs/window/aggregations.pyx


+    with nogil:


you can leave the original nogil right? e.g. no reason to do it twice?

The subtraction of the mean does not work with nogil; that is why I did it there again.

umm why? this is array - scalar right?

Thought so too, but got an error. Will try again

Hmmm, getting the following error without gil:

    Error compiling Cython file:
    ------------------------------------------------------------
    ...
                    nobs_mean += 1
                    sum_val += val
            mean_val = sum_val / nobs_mean
            # Other cases would lead to imprecision for smallest values
            if min_val - mean_val > -1e5:
                values = values - round(mean_val)
                                       ^
    ------------------------------------------------------------

    pandas/_libs/window/aggregations.pyx:545:36: Converting to Python object not allowed without gil
    Traceback (most recent call last):
      File "setup.py", line 790, in <module>
        setup_package()
      File "setup.py", line 760, in setup_package
        ext_modules=maybe_cythonize(extensions, compiler_directives=directives),
      File "setup.py", line 539, in maybe_cythonize
        return cythonize(extensions, *args, **kwargs)
      File "/home/developer/anaconda3/envs/omi_reports/pandas-dev/lib/python3.8/site-packages/Cython/Build/Dependencies.py", line 1102, in cythonize
        cythonize_one(*args)
      File "/home/developer/anaconda3/envs/omi_reports/pandas-dev/lib/python3.8/site-packages/Cython/Build/Dependencies.py", line 1225, in cythonize_one
        raise CompileError(None, pyx_file)

Hm no, mean_val is a float. This does not work without round either. Maybe the broadcasting is the problem here?

yeah maybe have to use np.broadcast_to

Found a solution for the round issue: I used a loop for the subtraction, because np.broadcast_to did not work either. Regardless, I think the loop is the most intuitive solution.

from libc.math cimport round?

That works, thanks very much.

jreback · 2020-11-02T13:53:34Z

pandas/_libs/window/aggregations.pyx

+            if notnan(val):
+                nobs_mean += 1
+                sum_val += val
+    mean_val = sum_val / nobs_mean


umm, why are you adding this?

The issue mentioned above describes a use case where the series is shifted by 5000. In that case, if we do not subtract the mean, the decimal places become so small relative to the magnitude of the values that they are effectively ignored, which leads to the numerical imprecision described in the issue.

Kahan summation only helps when the range of values across the windows is really big. I could post a more precise and explicit example if that helps.
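To make the shift problem concrete, the sketch below compares a one-pass skew built from raw power sums (sum of x, x**2, x**3) with a mean-centered version. Both function names are hypothetical, the data are made up, and the raw-sum formula is an assumed textbook form of the one-pass approach, not code copied from pandas. Skewness is shift-invariant, so adding 5000 should not change the result; the raw-sum version still drifts because it subtracts numbers around 1e11 to recover a third central moment around 1e-2.

```python
import math


def skew_from_raw_sums(values):
    """One-pass skew from raw power sums, with no centering."""
    n = len(values)
    a = sum(values) / n
    # Variance as a difference of two large, nearly equal numbers.
    b = sum(v * v for v in values) / n - a * a
    # Third central moment: same cancellation, but with cubes (worse).
    c = sum(v ** 3 for v in values) / n - a ** 3 - 3 * a * b
    return math.sqrt(n * (n - 1)) * c / ((n - 2) * math.sqrt(b) ** 3)


def skew_centered(values):
    """Same estimator, but computed from deviations around the mean."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    m2 = sum(d * d for d in dev)
    m3 = sum(d ** 3 for d in dev)
    return n * math.sqrt(n - 1) / (n - 2) * m3 / m2 ** 1.5


data = [0.1, 0.3, 0.2, 0.9, 0.4, 0.5]
shifted = [v + 5000 for v in data]
print(skew_from_raw_sums(data), skew_from_raw_sums(shifted))
print(skew_centered(data), skew_centered(shifted))
```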

I think I understand the issue, but I am unclear why you are not always subtracting the mean rather than doing it conditionally.

Aaah sorry. Misunderstood you.

For a series like [1e12, 0.023, 0.04565, 0.343545, 0.343434], if we subtract the mean of the complete series, we lose precision for the second window starting at 0.023, because after the subtraction the absolute values of those small numbers become really big too.
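A sketch of the conditional shift, assuming the condition visible in the compile-error output quoted earlier in this thread (`if min_val - mean_val > -1e5`); the merged code may differ. `maybe_shift_by_mean` and its `threshold` parameter are hypothetical names, and the logic is written in Python rather than Cython.

```python
import math


def maybe_shift_by_mean(values, threshold=-1e5):
    """Subtract round(mean) only when that cannot swamp the smallest values."""
    observed = [v for v in values if not math.isnan(v)]
    if not observed:
        return list(values)
    mean_val = sum(observed) / len(observed)
    min_val = min(observed)
    if min_val - mean_val > threshold:
        # Python's round() rounds halves to even; C's round() rounds them
        # away from zero. Either way the shift is an integer.
        shift = round(mean_val)
        return [v - shift for v in values]
    return list(values)


# Offset data: the shift removes the common 5000 so the decimals survive.
print(maybe_shift_by_mean([5000.1, 5000.2, 5000.4]))

# Mixed magnitudes: shifting would wreck the small values, so nothing happens.
mixed = [1e12, 0.023, 0.04565, 0.343545, 0.343434]
print(maybe_shift_by_mean(mixed) == mixed)  # True

# What an unconditional shift would do to 0.023 (round trip is inexact):
shift = round(sum(mixed) / len(mixed))
print((0.023 - shift) + shift)
```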


jreback · 2020-11-09T02:50:56Z

looks good, can you merge master and ping on green (also maybe want to re-run the perf tests to give a good read)

phofl · 2020-11-09T11:54:51Z

asvs:

       before           after         ratio
     [82cd86c1]       [50ea50a7]
     <25165^2>        <6929>    
+      2.39±0.1ms         4.20±1ms     1.76  rolling.VariableWindowMethods.time_rolling('DataFrame', '50s', 'int', 'kurt')
+     2.16±0.09ms       3.76±0.8ms     1.74  rolling.VariableWindowMethods.time_rolling('DataFrame', '50s', 'float', 'skew')
+      2.22±0.1ms       3.77±0.9ms     1.70  rolling.VariableWindowMethods.time_rolling('DataFrame', '50s', 'int', 'skew')
+     2.36±0.09ms       4.00±0.6ms     1.69  rolling.VariableWindowMethods.time_rolling('DataFrame', '50s', 'float', 'kurt')
+     2.22±0.04ms       3.76±0.9ms     1.69  rolling.VariableWindowMethods.time_rolling('DataFrame', '1d', 'int', 'skew')
+     1.23±0.02ms      2.06±0.03ms     1.67  rolling.ForwardWindowMethods.time_rolling('Series', 10, 'int', 'kurt')
+     2.41±0.05ms       4.03±0.7ms     1.67  rolling.VariableWindowMethods.time_rolling('DataFrame', '1d', 'int', 'kurt')
+      1.37±0.2ms       2.29±0.2ms     1.67  rolling.ForwardWindowMethods.time_rolling('Series', 1000, 'int', 'kurt')
+     2.23±0.07ms       3.66±0.6ms     1.64  rolling.Methods.time_rolling('DataFrame', 10, 'int', 'kurt')
+     2.39±0.02ms       3.88±0.6ms     1.62  rolling.VariableWindowMethods.time_rolling('DataFrame', '1h', 'float', 'kurt')
+     2.20±0.01ms       3.56±0.2ms     1.62  rolling.Methods.time_rolling('DataFrame', 10, 'float', 'kurt')
+     1.91±0.02ms      3.04±0.04ms     1.59  rolling.VariableWindowMethods.time_rolling('Series', '50s', 'float', 'kurt')
+     1.75±0.03ms      2.77±0.01ms     1.58  rolling.ForwardWindowMethods.time_rolling('Series', 1000, 'float', 'kurt')
+     1.86±0.03ms      2.94±0.03ms     1.58  rolling.Methods.time_rolling('Series', 1000, 'float', 'kurt')
+     1.55±0.04ms      2.43±0.03ms     1.57  rolling.ForwardWindowMethods.time_rolling('DataFrame', 10, 'int', 'kurt')
+     2.18±0.02ms       3.42±0.5ms     1.57  rolling.VariableWindowMethods.time_rolling('DataFrame', '1h', 'float', 'skew')
+     2.00±0.02ms      3.12±0.03ms     1.56  rolling.VariableWindowMethods.time_rolling('Series', '1h', 'int', 'kurt')
+     1.88±0.02ms      2.92±0.03ms     1.56  rolling.Methods.time_rolling('Series', 10, 'float', 'kurt')
+     1.54±0.02ms      2.40±0.02ms     1.56  rolling.ForwardWindowMethods.time_rolling('DataFrame', 1000, 'int', 'kurt')
+     1.96±0.01ms      3.04±0.08ms     1.55  rolling.VariableWindowMethods.time_rolling('Series', '1h', 'float', 'kurt')
+     2.11±0.03ms      3.25±0.04ms     1.54  rolling.ForwardWindowMethods.time_rolling('DataFrame', 10, 'float', 'kurt')
+      1.82±0.2ms      2.79±0.03ms     1.53  rolling.ForwardWindowMethods.time_rolling('Series', 10, 'float', 'kurt')
+        2.00±0ms      3.06±0.06ms     1.53  rolling.VariableWindowMethods.time_rolling('Series', '50s', 'int', 'kurt')
+     1.42±0.03ms      2.16±0.02ms     1.52  rolling.ExpandingMethods.time_expanding('Series', 'int', 'kurt')
+     1.91±0.02ms      2.90±0.02ms     1.52  rolling.VariableWindowMethods.time_rolling('Series', '1d', 'float', 'kurt')
+     1.99±0.01ms      3.01±0.02ms     1.52  rolling.VariableWindowMethods.time_rolling('Series', '1d', 'int', 'kurt')
+     2.07±0.03ms      3.12±0.04ms     1.51  rolling.ForwardWindowMethods.time_rolling('DataFrame', 1000, 'float', 'kurt')
+     1.71±0.02ms      2.57±0.09ms     1.50  rolling.VariableWindowMethods.time_rolling('Series', '1h', 'float', 'skew')
+     1.98±0.03ms      2.97±0.03ms     1.50  rolling.Methods.time_rolling('Series', 1000, 'int', 'kurt')
+     2.27±0.03ms      3.38±0.03ms     1.49  rolling.Methods.time_rolling('DataFrame', 1000, 'int', 'kurt')
+     1.65±0.02ms      2.45±0.02ms     1.48  rolling.Methods.time_rolling('Series', 1000, 'float', 'skew')
+     1.71±0.02ms      2.51±0.02ms     1.47  rolling.VariableWindowMethods.time_rolling('Series', '50s', 'float', 'skew')
+     2.29±0.03ms      3.36±0.06ms     1.47  rolling.VariableWindowMethods.time_rolling('DataFrame', '1d', 'float', 'kurt')
+     1.40±0.02ms      2.05±0.03ms     1.47  rolling.ExpandingMethods.time_expanding('Series', 'float', 'kurt')
+     2.42±0.03ms      3.53±0.05ms     1.46  rolling.VariableWindowMethods.time_rolling('DataFrame', '1h', 'int', 'kurt')
+     2.19±0.01ms      3.19±0.04ms     1.46  rolling.Methods.time_rolling('DataFrame', 1000, 'float', 'kurt')
+        1.80±0ms      2.60±0.07ms     1.44  rolling.ExpandingMethods.time_expanding('DataFrame', 'int', 'kurt')
+     1.71±0.03ms      2.47±0.04ms     1.44  rolling.Methods.time_rolling('Series', 10, 'int', 'skew')
+     1.97±0.02ms      2.84±0.04ms     1.44  rolling.Methods.time_rolling('DataFrame', 10, 'float', 'skew')
+     1.73±0.02ms      2.48±0.02ms     1.43  rolling.VariableWindowMethods.time_rolling('Series', '1d', 'float', 'skew')
+     1.81±0.01ms      2.59±0.03ms     1.43  rolling.VariableWindowMethods.time_rolling('Series', '1h', 'int', 'skew')
+     1.65±0.03ms      2.35±0.03ms     1.43  rolling.Methods.time_rolling('Series', 10, 'float', 'skew')
+     1.81±0.03ms      2.56±0.05ms     1.42  rolling.VariableWindowMethods.time_rolling('Series', '1d', 'int', 'skew')
+     2.56±0.02ms         3.63±1ms     1.41  rolling.Quantile.time_quantile('Series', 1000, 'float', 1, 'lower')
+     2.04±0.01ms      2.88±0.02ms     1.41  rolling.Methods.time_rolling('DataFrame', 1000, 'int', 'skew')
+     1.72±0.01ms      2.42±0.04ms     1.41  rolling.ExpandingMethods.time_expanding('DataFrame', 'float', 'kurt')
+     2.11±0.03ms      2.96±0.05ms     1.40  rolling.VariableWindowMethods.time_rolling('DataFrame', '1d', 'float', 'skew')
+     1.75±0.01ms      2.45±0.05ms     1.40  rolling.Methods.time_rolling('Series', 1000, 'int', 'skew')
+     1.35±0.01ms      1.87±0.03ms     1.39  rolling.ExpandingMethods.time_expanding('Series', 'int', 'skew')
+     1.84±0.03ms      2.53±0.03ms     1.38  rolling.VariableWindowMethods.time_rolling('Series', '50s', 'int', 'skew')
+     2.61±0.03ms       3.58±0.9ms     1.37  rolling.Quantile.time_quantile('Series', 1000, 'float', 1, 'higher')
+     2.00±0.03ms      2.73±0.06ms     1.37  rolling.Methods.time_rolling('DataFrame', 1000, 'float', 'skew')
+     2.04±0.06ms       2.76±0.1ms     1.35  rolling.Methods.time_rolling('DataFrame', 10, 'int', 'skew')
+     2.24±0.02ms      3.02±0.05ms     1.35  rolling.VariableWindowMethods.time_rolling('DataFrame', '1h', 'int', 'skew')
+     1.32±0.02ms      1.77±0.01ms     1.35  rolling.ExpandingMethods.time_expanding('Series', 'float', 'skew')
+     1.61±0.01ms      2.17±0.02ms     1.35  rolling.ExpandingMethods.time_expanding('DataFrame', 'float', 'skew')
+     1.52±0.02ms       2.05±0.3ms     1.35  rolling.Quantile.time_quantile('Series', 1000, 'int', 1, 'higher')
+     1.74±0.02ms      2.29±0.04ms     1.32  rolling.ExpandingMethods.time_expanding('DataFrame', 'int', 'skew')
+     3.13±0.05ms       4.06±0.8ms     1.30  rolling.VariableWindowMethods.time_rolling('DataFrame', '1d', 'int', 'max')
+     2.62±0.04ms       3.27±0.3ms     1.25  rolling.VariableWindowMethods.time_rolling('Series', '1d', 'float', 'max')
+     1.53±0.02ms      1.90±0.08ms     1.24  rolling.Quantile.time_quantile('Series', 10, 'int', 1, 'midpoint')
+     1.53±0.02ms      1.84±0.06ms     1.21  rolling.Quantile.time_quantile('Series', 10, 'int', 1, 'higher')
+     1.54±0.02ms      1.81±0.03ms     1.18  rolling.Quantile.time_quantile('Series', 10, 'int', 1, 'linear')
+     3.06±0.03ms       3.58±0.4ms     1.17  rolling.VariableWindowMethods.time_rolling('DataFrame', '1h', 'float', 'max')
+     1.48±0.02ms      1.72±0.01ms     1.16  rolling.Quantile.time_quantile('Series', 1000, 'int', 0, 'lower')
+     1.33±0.01ms      1.54±0.01ms     1.16  rolling.ForwardWindowMethods.time_rolling('Series', 10, 'int', 'max')
+     1.48±0.02ms         1.71±0ms     1.16  rolling.Quantile.time_quantile('Series', 1000, 'int', 0, 'higher')
+     2.48±0.03ms      2.87±0.07ms     1.16  rolling.ExpandingMethods.time_expanding('DataFrame', 'float', 'max')
+     1.84±0.02ms      2.12±0.02ms     1.15  rolling.Quantile.time_quantile('DataFrame', 10, 'int', 1, 'midpoint')
+     1.55±0.02ms      1.78±0.02ms     1.15  rolling.Quantile.time_quantile('Series', 1000, 'int', 1, 'lower')
+     1.50±0.02ms      1.72±0.04ms     1.14  rolling.Quantile.time_quantile('Series', 1000, 'int', 0, 'linear')
+     1.85±0.02ms      2.10±0.04ms     1.14  rolling.Quantile.time_quantile('DataFrame', 10, 'int', 1, 'nearest')
+     1.50±0.03ms      1.71±0.04ms     1.14  rolling.Quantile.time_quantile('Series', 1000, 'int', 0, 'nearest')
+     1.86±0.05ms      2.11±0.02ms     1.14  rolling.Quantile.time_quantile('DataFrame', 10, 'int', 1, 'higher')
+     1.55±0.02ms      1.76±0.01ms     1.13  rolling.Quantile.time_quantile('Series', 10, 'int', 1, 'nearest')
+     1.52±0.02ms      1.73±0.01ms     1.13  rolling.Quantile.time_quantile('Series', 1000, 'int', 0, 'midpoint')
+     1.56±0.02ms      1.76±0.02ms     1.13  rolling.Quantile.time_quantile('Series', 1000, 'int', 1, 'nearest')
+     1.34±0.02ms      1.52±0.02ms     1.13  rolling.ForwardWindowMethods.time_rolling('Series', 10, 'int', 'min')
+     1.83±0.02ms      2.06±0.04ms     1.13  rolling.Quantile.time_quantile('DataFrame', 1000, 'int', 0, 'nearest')
+     1.83±0.01ms      2.07±0.02ms     1.13  rolling.Quantile.time_quantile('DataFrame', 1000, 'int', 0, 'linear')
+     1.85±0.03ms      2.08±0.05ms     1.13  rolling.Quantile.time_quantile('DataFrame', 10, 'int', 1, 'lower')
+     1.86±0.02ms      2.10±0.02ms     1.13  rolling.Quantile.time_quantile('DataFrame', 1000, 'int', 1, 'nearest')
+     1.90±0.02ms      2.14±0.07ms     1.13  rolling.Quantile.time_quantile('DataFrame', 10, 'int', 0, 'midpoint')
+     1.83±0.01ms      2.06±0.04ms     1.13  rolling.Quantile.time_quantile('DataFrame', 10, 'int', 0, 'linear')
+     2.67±0.02ms       2.98±0.1ms     1.12  rolling.VariableWindowMethods.time_rolling('Series', '1d', 'float', 'min')
+     1.87±0.06ms      2.09±0.03ms     1.12  rolling.Quantile.time_quantile('DataFrame', 1000, 'int', 1, 'linear')
+     1.87±0.01ms      2.07±0.02ms     1.11  rolling.Quantile.time_quantile('DataFrame', 10, 'int', 1, 'linear')
+     2.12±0.02ms      2.36±0.01ms     1.11  rolling.ExpandingMethods.time_expanding('Series', 'float', 'max')
+     2.43±0.07ms      2.70±0.01ms     1.11  rolling.ForwardWindowMethods.time_rolling('Series', 10, 'float', 'max')
+     1.74±0.02ms      1.93±0.01ms     1.11  rolling.ForwardWindowMethods.time_rolling('DataFrame', 1000, 'int', 'max')
+     1.83±0.02ms      2.03±0.03ms     1.11  rolling.Quantile.time_quantile('DataFrame', 1000, 'int', 0, 'lower')
+     1.57±0.03ms      1.74±0.01ms     1.11  rolling.Quantile.time_quantile('Series', 1000, 'int', 1, 'linear')
+     1.86±0.03ms      2.05±0.02ms     1.10  rolling.Quantile.time_quantile('DataFrame', 10, 'int', 0, 'nearest')
+     2.62±0.02ms      2.89±0.07ms     1.10  rolling.Methods.time_rolling('Series', 10, 'float', 'max')
+     2.75±0.03ms      3.03±0.01ms     1.10  rolling.ForwardWindowMethods.time_rolling('DataFrame', 1000, 'float', 'max')

phofl · 2020-11-09T16:52:41Z

cc @jreback pipelines are green

jreback · 2020-11-09T21:20:14Z

thanks @phofl keep em coming!

kaixiongg · 2024-09-03T06:59:52Z

Why not consider implementing it using the Welford method?
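For comparison with the question above, a sketch of what a Welford-style approach could look like: the commonly cited higher-order extension of Welford's single-pass update (as listed in the "Algorithms for calculating variance" article), which tracks central moments directly instead of raw sums. `OnlineMoments` is hypothetical, handles additions only (a rolling window would also need a removal step, not shown), and is not what this PR implemented.

```python
import math


class OnlineMoments:
    """Hypothetical single-pass tracker of central moment sums M2, M3, M4."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0
        self.m3 = 0.0
        self.m4 = 0.0

    def add(self, x):
        n1 = self.n
        self.n += 1
        n = self.n
        delta = x - self.mean
        delta_n = delta / n
        delta_n2 = delta_n * delta_n
        term1 = delta * delta_n * n1
        self.mean += delta_n
        # Order matters: M4 uses the old M2 and M3; M3 uses the old M2.
        self.m4 += (term1 * delta_n2 * (n * n - 3 * n + 3)
                    + 6 * delta_n2 * self.m2 - 4 * delta_n * self.m3)
        self.m3 += term1 * delta_n * (n - 2) - 3 * delta_n * self.m2
        self.m2 += term1

    def skew(self):
        n, m2 = self.n, self.m2
        if n < 3 or m2 == 0:
            return math.nan
        return n * math.sqrt(n - 1) / (n - 2) * self.m3 / m2 ** 1.5

    def kurt(self):
        n, m2 = self.n, self.m2
        if n < 4 or m2 == 0:
            return math.nan
        return ((n + 1) * n * (n - 1) / ((n - 2) * (n - 3)) * self.m4 / m2 ** 2
                - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


acc = OnlineMoments()
for v in (0.0, 0.0, 0.0, 1.0):
    acc.add(v)
print(acc.skew(), acc.kurt())  # about 2.0 and 4.0
```

Because every update works with `x - mean`, a constant offset such as 5000 never enters the moment sums, which is the property the mean subtraction in this PR is after.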

phofl added 2 commits November 1, 2020 14:49

Improve numerical stability for window functions skew and kurt

d4b1a9b

Fix typing

c634539

phofl added the Window rolling, ewma, expanding label Nov 1, 2020

phofl added 5 commits November 1, 2020 15:21

Fix pattern

bf88a63

Improve numerical stability for mean calculation

e57f97c

Change dtype assignment

a71f645

Improve performance

a1874d8

Merge branch 'master' of https://github.com/pandas-dev/pandas into 6929

288ac22

phofl added 2 commits November 1, 2020 21:09

Implement kahan summation

eb35925

Change whatsnew

480ba26

RUn black

c58be86

phofl added 3 commits November 1, 2020 21:29

Fix pattern

fa460b1

Fix failing test

ce0a9a2

Remove unnecessary division

e643a45

jreback requested changes Nov 2, 2020


phofl added 7 commits November 3, 2020 00:54

Merge branch 'master' of https://github.com/pandas-dev/pandas into 6929

5998b7b

Conflicts: doc/source/whatsnew/v1.2.0.rst

Merge branch 'master' of https://github.com/pandas-dev/pandas into 6929

7109729

Conflicts: doc/source/whatsnew/v1.2.0.rst, pandas/tests/window/test_rolling.py

Hold no gil permanently

bfedac0

Merge branch 'master' of https://github.com/pandas-dev/pandas into 6929

b03d5c1

Conflicts: doc/source/whatsnew/v1.2.0.rst

Use round from math

852f3ca

Fix pattern

4688648

Merge branch 'master' of https://github.com/pandas-dev/pandas into 6929

50ea50a

Conflicts: pandas/tests/window/test_expanding.py

jreback added this to the 1.2 milestone Nov 9, 2020

jreback approved these changes Nov 9, 2020


jreback merged commit 8dd2d95 into pandas-dev:master Nov 9, 2020

phofl deleted the 6929 branch November 9, 2020 21:44

mzeitlin11 mentioned this pull request Jan 2, 2021

BUG: pandas 1.2.0 df.rolling().aggregate('skew') modified original data #38908

Closed
