ENH: Improve numerical stability for window functions skew and kurt #37557

Merged
jreback merged 20 commits into pandas-dev:master on Nov 9, 2020

Conversation

phofl
Member

@phofl phofl commented Nov 1, 2020

Used the same procedure as in DataFrame.skew()

@phofl phofl added the Window rolling, ewma, expanding label Nov 1, 2020
@phofl
Member Author

phofl commented Nov 1, 2020

Edit: I have to run the asvs again because of code changes.

@pep8speaks

pep8speaks commented Nov 1, 2020

Hello @phofl! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-11-09 11:16:48 UTC

@phofl
Member Author

phofl commented Nov 1, 2020

The code got somewhat ugly now that Kahan summation is applied to all parts.
We cannot subtract the mean when the values span a wide range, because that would lead to imprecision for the small values. Running the asvs again now.
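
For readers unfamiliar with the technique named above, here is a minimal sketch of Kahan (compensated) summation in plain Python. It is an illustration only, not the Cython code from this PR; the function names `kahan_sum` and `naive_sum` are made up for the example. The naive loop is written out by hand because the built-in `sum` may itself use compensated summation in recent Python versions.

```python
def naive_sum(values):
    """Plain left-to-right accumulation, for comparison."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Kahan summation: carry the rounding error of each add forward."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = v - compensation            # apply the correction to the new term
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was lost
        total = t
    return total


# One large value followed by many tiny ones that a plain add drops.
data = [1.0] + [1e-16] * 100_000
print(naive_sum(data))  # the tiny terms are rounded away on every add
print(kahan_sum(data))  # the tiny terms are accumulated via the compensation
```

The extra subtractions per element are one plausible source of the slowdown visible in the benchmarks, although the thread does not break the cost down.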

@phofl
Member Author

phofl commented Nov 1, 2020

       before           after         ratio
     [6092dc7d]       [c58be86a]
     <6929~3^2>       <6929>    
+     1.96±0.03ms       3.11±0.3ms     1.59  rolling.VariableWindowMethods.time_rolling('Series', '1h', 'int', 'skew')
+     1.92±0.06ms       3.03±0.3ms     1.58  rolling.VariableWindowMethods.time_rolling('Series', '1d', 'int', 'skew')
+     1.92±0.02ms       2.98±0.2ms     1.55  rolling.VariableWindowMethods.time_rolling('Series', '50s', 'int', 'skew')
+     1.84±0.05ms      2.79±0.06ms     1.52  rolling.ForwardWindowMethods.time_rolling('Series', 1000, 'float', 'kurt')
+     2.33±0.07ms       3.49±0.2ms     1.50  rolling.VariableWindowMethods.time_rolling('DataFrame', '50s', 'int', 'skew')
+     2.37±0.06ms       3.55±0.3ms     1.50  rolling.VariableWindowMethods.time_rolling('DataFrame', '1h', 'int', 'skew')
+     1.86±0.03ms       2.76±0.1ms     1.48  rolling.ForwardWindowMethods.time_rolling('Series', 10, 'float', 'kurt')
+     1.84±0.06ms      2.69±0.04ms     1.46  rolling.VariableWindowMethods.time_rolling('Series', '1h', 'float', 'skew')
+     1.82±0.04ms      2.65±0.04ms     1.46  rolling.VariableWindowMethods.time_rolling('Series', '1d', 'float', 'skew')
+     2.32±0.06ms       3.35±0.3ms     1.45  rolling.VariableWindowMethods.time_rolling('DataFrame', '1d', 'int', 'skew')
+     1.36±0.02ms      1.96±0.08ms     1.44  rolling.ForwardWindowMethods.time_rolling('Series', 1000, 'int', 'kurt')
+     1.84±0.02ms      2.62±0.05ms     1.42  rolling.Methods.time_rolling('Series', 10, 'int', 'skew')
+     2.15±0.01ms      3.05±0.05ms     1.42  rolling.VariableWindowMethods.time_rolling('DataFrame', '50s', 'float', 'skew')
+     2.13±0.01ms       3.02±0.1ms     1.42  rolling.ForwardWindowMethods.time_rolling('DataFrame', 1000, 'float', 'kurt')
+     1.41±0.02ms      1.99±0.07ms     1.42  rolling.ForwardWindowMethods.time_rolling('Series', 10, 'int', 'kurt')
+     2.35±0.02ms      3.34±0.04ms     1.42  rolling.VariableWindowMethods.time_rolling('DataFrame', '50s', 'float', 'kurt')
+     1.89±0.02ms      2.66±0.09ms     1.41  rolling.VariableWindowMethods.time_rolling('Series', '50s', 'float', 'skew')
+     1.53±0.04ms      2.15±0.05ms     1.40  rolling.ExpandingMethods.time_expanding('Series', 'int', 'skew')
+     1.67±0.04ms      2.33±0.05ms     1.40  rolling.ForwardWindowMethods.time_rolling('DataFrame', 1000, 'int', 'kurt')
+     2.00±0.01ms      2.80±0.02ms     1.40  rolling.VariableWindowMethods.time_rolling('Series', '1d', 'float', 'kurt')
+     1.41±0.03ms      1.96±0.09ms     1.39  rolling.ExpandingMethods.time_expanding('Series', 'float', 'skew')
+     2.02±0.07ms      2.80±0.04ms     1.38  rolling.Methods.time_rolling('Series', 10, 'int', 'kurt')
+     1.81±0.01ms      2.49±0.02ms     1.37  rolling.Methods.time_rolling('Series', 10, 'float', 'skew')
+     2.11±0.06ms      2.90±0.05ms     1.37  rolling.Methods.time_rolling('DataFrame', 1000, 'float', 'skew')
+     2.10±0.06ms      2.87±0.05ms     1.37  rolling.Methods.time_rolling('DataFrame', 10, 'int', 'skew')
+     2.29±0.04ms      3.12±0.03ms     1.36  rolling.VariableWindowMethods.time_rolling('DataFrame', '1h', 'float', 'skew')
+     2.28±0.05ms      3.10±0.01ms     1.36  rolling.Methods.time_rolling('DataFrame', 10, 'float', 'kurt')
+     2.06±0.05ms      2.81±0.03ms     1.36  rolling.VariableWindowMethods.time_rolling('Series', '1d', 'int', 'kurt')
+     2.12±0.01ms      2.88±0.04ms     1.36  rolling.VariableWindowMethods.time_rolling('Series', '50s', 'float', 'kurt')
+     2.42±0.02ms      3.29±0.02ms     1.36  rolling.VariableWindowMethods.time_rolling('DataFrame', '1h', 'float', 'kurt')
+     2.07±0.05ms      2.78±0.02ms     1.34  rolling.Methods.time_rolling('Series', 1000, 'int', 'kurt')
+     2.27±0.02ms      3.04±0.02ms     1.34  rolling.VariableWindowMethods.time_rolling('DataFrame', '1d', 'float', 'skew')
+     1.57±0.01ms      2.11±0.03ms     1.34  rolling.ExpandingMethods.time_expanding('Series', 'int', 'kurt')
+     2.03±0.02ms      2.71±0.05ms     1.34  rolling.Methods.time_rolling('Series', 10, 'float', 'kurt')
+     2.39±0.06ms      3.18±0.07ms     1.34  rolling.VariableWindowMethods.time_rolling('DataFrame', '1d', 'float', 'kurt')
+     2.13±0.02ms       2.84±0.1ms     1.33  rolling.VariableWindowMethods.time_rolling('Series', '50s', 'int', 'kurt')
+     2.16±0.03ms      2.87±0.05ms     1.33  rolling.Methods.time_rolling('DataFrame', 10, 'float', 'skew')
+     1.74±0.02ms      2.30±0.01ms     1.33  rolling.ExpandingMethods.time_expanding('DataFrame', 'float', 'skew')
+     1.92±0.03ms      2.54±0.02ms     1.33  rolling.Methods.time_rolling('Series', 1000, 'int', 'skew')
+     2.27±0.03ms      2.99±0.09ms     1.32  rolling.Methods.time_rolling('DataFrame', 10, 'int', 'kurt')
+     2.23±0.03ms      2.94±0.06ms     1.32  rolling.VariableWindowMethods.time_rolling('Series', '1h', 'int', 'kurt')
+     2.33±0.01ms      3.08±0.06ms     1.32  rolling.Methods.time_rolling('DataFrame', 1000, 'float', 'kurt')
+     1.88±0.03ms      2.47±0.03ms     1.31  rolling.ExpandingMethods.time_expanding('DataFrame', 'int', 'skew')
+     2.13±0.06ms      2.79±0.04ms     1.31  rolling.ForwardWindowMethods.time_rolling('DataFrame', 10, 'float', 'kurt')
+     1.74±0.04ms      2.27±0.09ms     1.30  rolling.ForwardWindowMethods.time_rolling('DataFrame', 10, 'int', 'kurt')
+     2.30±0.02ms      2.98±0.02ms     1.30  rolling.Methods.time_rolling('DataFrame', 1000, 'int', 'skew')
+     2.45±0.02ms      3.15±0.04ms     1.28  rolling.Methods.time_rolling('DataFrame', 1000, 'int', 'kurt')
+     1.94±0.02ms      2.47±0.03ms     1.28  rolling.ExpandingMethods.time_expanding('DataFrame', 'int', 'kurt')
+     1.54±0.01ms       1.91±0.2ms     1.24  rolling.Quantile.time_quantile('Series', 1000, 'int', 1, 'lower')
+     4.03±0.03ms       5.00±0.4ms     1.24  rolling.ExpandingMethods.time_expanding_groupby('DataFrame', 'int', 'std')
+     1.49±0.01ms       1.84±0.2ms     1.23  rolling.Quantile.time_quantile('Series', 1000, 'int', 1, 'higher')
+     1.53±0.01ms       1.82±0.2ms     1.19  rolling.Quantile.time_quantile('Series', 1000, 'int', 1, 'nearest')
+     1.52±0.01ms       1.81±0.1ms     1.19  rolling.Quantile.time_quantile('Series', 1000, 'int', 1, 'midpoint')
+     1.50±0.04ms      1.74±0.02ms     1.16  rolling.Quantile.time_quantile('Series', 10, 'int', 1, 'midpoint')
+     1.44±0.01ms      1.64±0.02ms     1.14  rolling.Quantile.time_quantile('Series', 1000, 'int', 0, 'lower')
+     1.61±0.03ms      1.83±0.04ms     1.14  rolling.ForwardWindowMethods.time_rolling('DataFrame', 1000, 'int', 'max')
+     2.93±0.01ms      3.33±0.04ms     1.14  rolling.Quantile.time_quantile('DataFrame', 10, 'float', 1, 'higher')
+     1.82±0.02ms      2.06±0.01ms     1.13  rolling.Quantile.time_quantile('DataFrame', 10, 'int', 1, 'nearest')
+     1.78±0.02ms      2.02±0.01ms     1.13  rolling.Quantile.time_quantile('DataFrame', 1000, 'int', 0, 'nearest')
+     1.45±0.04ms      1.64±0.02ms     1.13  rolling.Quantile.time_quantile('Series', 1000, 'int', 0, 'linear')
+     1.60±0.03ms      1.81±0.01ms     1.13  rolling.ForwardWindowMethods.time_rolling('DataFrame', 1000, 'int', 'min')
+     2.91±0.02ms       3.29±0.2ms     1.13  rolling.Quantile.time_quantile('DataFrame', 10, 'float', 1, 'linear')
+     1.31±0.02ms      1.47±0.06ms     1.12  rolling.ForwardWindowMethods.time_rolling('Series', 10, 'int', 'min')
+     1.82±0.01ms      2.04±0.02ms     1.12  rolling.Quantile.time_quantile('DataFrame', 1000, 'int', 0, 'linear')
+     1.46±0.02ms      1.64±0.02ms     1.12  rolling.Quantile.time_quantile('Series', 1000, 'int', 0, 'nearest')
+     1.55±0.03ms      1.72±0.02ms     1.11  rolling.Quantile.time_quantile('Series', 10, 'int', 1, 'nearest')
+     1.52±0.02ms      1.69±0.01ms     1.11  rolling.Quantile.time_quantile('Series', 10, 'int', 1, 'higher')
+     1.52±0.04ms      1.69±0.03ms     1.11  rolling.Quantile.time_quantile('Series', 10, 'int', 1, 'linear')
+     1.84±0.02ms      2.04±0.02ms     1.11  rolling.Quantile.time_quantile('DataFrame', 10, 'int', 1, 'lower')
+     1.83±0.02ms      2.03±0.02ms     1.11  rolling.Quantile.time_quantile('DataFrame', 10, 'int', 1, 'higher')
+     1.51±0.03ms      1.67±0.03ms     1.11  rolling.Quantile.time_quantile('Series', 1000, 'int', 0, 'higher')
+     2.67±0.03ms      2.95±0.06ms     1.11  rolling.ForwardWindowMethods.time_rolling('DataFrame', 1000, 'float', 'max')
+     2.92±0.01ms      3.23±0.06ms     1.11  rolling.Quantile.time_quantile('DataFrame', 10, 'float', 1, 'midpoint')
-     1.78±0.03ms      1.61±0.03ms     0.91  rolling.VariableWindowMethods.time_rolling('Series', '1h', 'int', 'mean')
-     1.66±0.01ms      1.51±0.02ms     0.91  rolling.VariableWindowMethods.time_rolling('Series', '1h', 'float', 'mean')
-       984±100μs          887±6μs     0.90  rolling.EWMMethods.time_ewm_times('Series', 10, 'float', 'mean')
-     1.63±0.02ms      1.44±0.02ms     0.88  rolling.Methods.time_rolling('Series', 10, 'int', 'mean')
-     1.56±0.02ms      1.36±0.01ms     0.87  rolling.Methods.time_rolling('Series', 1000, 'float', 'mean')
-     1.60±0.05ms      1.37±0.03ms     0.86  rolling.Methods.time_rolling('Series', 10, 'float', 'mean')
-      1.77±0.1ms      1.49±0.02ms     0.84  rolling.Methods.time_rolling('Series', 1000, 'int', 'mean')
-      1.04±0.1ms          871±3μs     0.84  rolling.EWMMethods.time_ewm_times('Series', 10, 'float', 'std')
-      1.18±0.1ms         956±10μs     0.81  rolling.EWMMethods.time_ewm_times('Series', 10, 'int', 'std')

Would have expected a bigger hit.


with nogil:
Contributor

you can leave the original nogil right? i.e. no reason to do it twice?

Member Author

The subtraction of the mean does not work with nogil; that is the reason I did it there again.

Contributor

umm why? this is array - scalar right?

Member Author

Thought so too, but got an error. Will try again

Member Author

Hmmm, getting the following error without gil:

Error compiling Cython file:
------------------------------------------------------------
...
                nobs_mean += 1
                sum_val += val
        mean_val = sum_val / nobs_mean
        # Other cases would lead to imprecision for smallest values
        if min_val - mean_val > -1e5:
            values = values - round(mean_val)
                                   ^
------------------------------------------------------------

pandas/_libs/window/aggregations.pyx:545:36: Converting to Python object not allowed without gil
Traceback (most recent call last):
  File "setup.py", line 790, in <module>
    setup_package()
  File "setup.py", line 760, in setup_package
    ext_modules=maybe_cythonize(extensions, compiler_directives=directives),
  File "setup.py", line 539, in maybe_cythonize
    return cythonize(extensions, *args, **kwargs)
  File "/home/developer/anaconda3/envs/omi_reports/pandas-dev/lib/python3.8/site-packages/Cython/Build/Dependencies.py", line 1102, in cythonize
    cythonize_one(*args)
  File "/home/developer/anaconda3/envs/omi_reports/pandas-dev/lib/python3.8/site-packages/Cython/Build/Dependencies.py", line 1225, in cythonize_one
    raise CompileError(None, pyx_file)

Member Author

Hm no, mean_val is a float. This does not work without round either. Maybe the broadcasting is the problem here?

Contributor

yeah maybe have to use np.broadcast_to

Member Author

Found a solution for the round issue. I used a loop for the subtraction, because np.broadcast_to did not work either. Regardless, I think the loop is the most intuitive solution.

Member

from libc.math cimport round?

Member Author

That works, thanks very much.

            if notnan(val):
                nobs_mean += 1
                sum_val += val
        mean_val = sum_val / nobs_mean
Contributor

umm, why are you adding this?

Member Author

The linked issue describes a use case where the series is shifted by 5000. In this case, if we do not subtract the mean, the decimal places are so small relative to the values that they are lost, which leads to the numerical imprecision described in the issue.

Kahan summation only solves issues where the range of the values between the windows is really big. I could post a more precise and explicit example if that helps.
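
The shifted-series case described above can be reproduced with a small pure-Python sketch. It assumes a population skewness computed from raw power sums; the helper names and the test data are invented for the illustration, and this is not the formula or code used in `aggregations.pyx`. Only the shift of 5000 comes from the discussion.

```python
import math


def skew_from_power_sums(values):
    """Population skewness from the raw sums of x, x^2 and x^3."""
    n = len(values)
    s1 = s2 = s3 = 0.0
    for x in values:
        s1 += x
        s2 += x * x
        s3 += x * x * x
    mean = s1 / n
    m2 = s2 / n - mean * mean                             # E[x^2] - mu^2
    m3 = s3 / n - 3.0 * mean * s2 / n + 2.0 * mean ** 3   # E[x^3] - 3*mu*E[x^2] + 2*mu^3
    return m3 / m2 ** 1.5


def skew_two_pass(values):
    """Reference: subtract the exact mean first, then take moments."""
    n = len(values)
    mean = math.fsum(values) / n
    m2 = math.fsum((x - mean) ** 2 for x in values) / n
    m3 = math.fsum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


base = [(i / 50.0) ** 3 for i in range(50)]   # small values in [0, 1)
shifted = [x + 5000.0 for x in base]          # the same data shifted by 5000

offset = round(math.fsum(shifted) / len(shifted))  # rounded mean
centered = [x - offset for x in shifted]

reference = skew_two_pass(shifted)
naive = skew_from_power_sums(shifted)    # large x^3 terms swamp the decimals
stable = skew_from_power_sums(centered)  # same formula after centering

print(abs(naive - reference), abs(stable - reference))
```

On the shifted data the cubes are of order 1e11, so the fractional part that carries the skewness information sits in the last few bits of each term. After subtracting the rounded mean the same formula works on values of order 1.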

Contributor

i think i understand the issue, but am unclear why you are not always just subtracting the mean rather than conditionally.

Member Author

Aaah sorry, I misunderstood you.

With something like [1e12, 0.023, 0.04565, 0.343545, 0.343434], if we subtract the mean of the complete series, we lose precision for the second window starting at 0.023, because the absolute values there become really big too.
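
The conditional centering discussed in this thread can be sketched in plain Python as follows. The guard `min_val - mean_val > -1e5` and the use of the rounded mean are taken from the snippet quoted in the compiler error earlier in the thread; the function name `center_for_moments` and the `threshold` parameter are hypothetical, and the merged Cython code may differ in detail. Note that Python's built-in `round` rounds halves to even, whereas the C `round` from `libc.math` rounds halves away from zero.

```python
import math


def center_for_moments(values, threshold=1e5):
    """Subtract the rounded mean, but only when no value sits far below it.

    Hypothetical helper for illustration; NaN entries are skipped when
    computing the mean and minimum and are passed through unchanged.
    """
    nobs = 0
    total = 0.0
    min_val = math.inf
    for v in values:
        if not math.isnan(v):
            nobs += 1
            total += v
            min_val = min(min_val, v)
    if nobs == 0:
        return list(values)
    mean_val = total / nobs
    # Guard from the thread: skip centering when the data span a wide range,
    # since the shift would then swamp the small values.
    if min_val - mean_val > -threshold:
        offset = round(mean_val)
        return [v - offset for v in values]  # element-wise loop, no broadcasting
    return list(values)


print(center_for_moments([5000.1, 5000.2, 5000.3]))
print(center_for_moments([1e12, 0.023, 0.04565, 0.343545, 0.343434]))
```

In the second call the mean is about 2e11. Adjacent doubles near 2e11 are roughly 3e-5 apart, so subtracting that mean from 0.023 would discard most of its digits, which is why the values are left unshifted.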

@jreback jreback added this to the 1.2 milestone Nov 9, 2020
@jreback
Contributor

jreback commented Nov 9, 2020

looks good, can you merge master and ping on green (also maybe want to re-run the perf tests to give a good read)

@phofl
Member Author

phofl commented Nov 9, 2020

asvs:

       before           after         ratio
     [82cd86c1]       [50ea50a7]
     <25165^2>        <6929>    
+      2.39±0.1ms         4.20±1ms     1.76  rolling.VariableWindowMethods.time_rolling('DataFrame', '50s', 'int', 'kurt')
+     2.16±0.09ms       3.76±0.8ms     1.74  rolling.VariableWindowMethods.time_rolling('DataFrame', '50s', 'float', 'skew')
+      2.22±0.1ms       3.77±0.9ms     1.70  rolling.VariableWindowMethods.time_rolling('DataFrame', '50s', 'int', 'skew')
+     2.36±0.09ms       4.00±0.6ms     1.69  rolling.VariableWindowMethods.time_rolling('DataFrame', '50s', 'float', 'kurt')
+     2.22±0.04ms       3.76±0.9ms     1.69  rolling.VariableWindowMethods.time_rolling('DataFrame', '1d', 'int', 'skew')
+     1.23±0.02ms      2.06±0.03ms     1.67  rolling.ForwardWindowMethods.time_rolling('Series', 10, 'int', 'kurt')
+     2.41±0.05ms       4.03±0.7ms     1.67  rolling.VariableWindowMethods.time_rolling('DataFrame', '1d', 'int', 'kurt')
+      1.37±0.2ms       2.29±0.2ms     1.67  rolling.ForwardWindowMethods.time_rolling('Series', 1000, 'int', 'kurt')
+     2.23±0.07ms       3.66±0.6ms     1.64  rolling.Methods.time_rolling('DataFrame', 10, 'int', 'kurt')
+     2.39±0.02ms       3.88±0.6ms     1.62  rolling.VariableWindowMethods.time_rolling('DataFrame', '1h', 'float', 'kurt')
+     2.20±0.01ms       3.56±0.2ms     1.62  rolling.Methods.time_rolling('DataFrame', 10, 'float', 'kurt')
+     1.91±0.02ms      3.04±0.04ms     1.59  rolling.VariableWindowMethods.time_rolling('Series', '50s', 'float', 'kurt')
+     1.75±0.03ms      2.77±0.01ms     1.58  rolling.ForwardWindowMethods.time_rolling('Series', 1000, 'float', 'kurt')
+     1.86±0.03ms      2.94±0.03ms     1.58  rolling.Methods.time_rolling('Series', 1000, 'float', 'kurt')
+     1.55±0.04ms      2.43±0.03ms     1.57  rolling.ForwardWindowMethods.time_rolling('DataFrame', 10, 'int', 'kurt')
+     2.18±0.02ms       3.42±0.5ms     1.57  rolling.VariableWindowMethods.time_rolling('DataFrame', '1h', 'float', 'skew')
+     2.00±0.02ms      3.12±0.03ms     1.56  rolling.VariableWindowMethods.time_rolling('Series', '1h', 'int', 'kurt')
+     1.88±0.02ms      2.92±0.03ms     1.56  rolling.Methods.time_rolling('Series', 10, 'float', 'kurt')
+     1.54±0.02ms      2.40±0.02ms     1.56  rolling.ForwardWindowMethods.time_rolling('DataFrame', 1000, 'int', 'kurt')
+     1.96±0.01ms      3.04±0.08ms     1.55  rolling.VariableWindowMethods.time_rolling('Series', '1h', 'float', 'kurt')
+     2.11±0.03ms      3.25±0.04ms     1.54  rolling.ForwardWindowMethods.time_rolling('DataFrame', 10, 'float', 'kurt')
+      1.82±0.2ms      2.79±0.03ms     1.53  rolling.ForwardWindowMethods.time_rolling('Series', 10, 'float', 'kurt')
+        2.00±0ms      3.06±0.06ms     1.53  rolling.VariableWindowMethods.time_rolling('Series', '50s', 'int', 'kurt')
+     1.42±0.03ms      2.16±0.02ms     1.52  rolling.ExpandingMethods.time_expanding('Series', 'int', 'kurt')
+     1.91±0.02ms      2.90±0.02ms     1.52  rolling.VariableWindowMethods.time_rolling('Series', '1d', 'float', 'kurt')
+     1.99±0.01ms      3.01±0.02ms     1.52  rolling.VariableWindowMethods.time_rolling('Series', '1d', 'int', 'kurt')
+     2.07±0.03ms      3.12±0.04ms     1.51  rolling.ForwardWindowMethods.time_rolling('DataFrame', 1000, 'float', 'kurt')
+     1.71±0.02ms      2.57±0.09ms     1.50  rolling.VariableWindowMethods.time_rolling('Series', '1h', 'float', 'skew')
+     1.98±0.03ms      2.97±0.03ms     1.50  rolling.Methods.time_rolling('Series', 1000, 'int', 'kurt')
+     2.27±0.03ms      3.38±0.03ms     1.49  rolling.Methods.time_rolling('DataFrame', 1000, 'int', 'kurt')
+     1.65±0.02ms      2.45±0.02ms     1.48  rolling.Methods.time_rolling('Series', 1000, 'float', 'skew')
+     1.71±0.02ms      2.51±0.02ms     1.47  rolling.VariableWindowMethods.time_rolling('Series', '50s', 'float', 'skew')
+     2.29±0.03ms      3.36±0.06ms     1.47  rolling.VariableWindowMethods.time_rolling('DataFrame', '1d', 'float', 'kurt')
+     1.40±0.02ms      2.05±0.03ms     1.47  rolling.ExpandingMethods.time_expanding('Series', 'float', 'kurt')
+     2.42±0.03ms      3.53±0.05ms     1.46  rolling.VariableWindowMethods.time_rolling('DataFrame', '1h', 'int', 'kurt')
+     2.19±0.01ms      3.19±0.04ms     1.46  rolling.Methods.time_rolling('DataFrame', 1000, 'float', 'kurt')
+        1.80±0ms      2.60±0.07ms     1.44  rolling.ExpandingMethods.time_expanding('DataFrame', 'int', 'kurt')
+     1.71±0.03ms      2.47±0.04ms     1.44  rolling.Methods.time_rolling('Series', 10, 'int', 'skew')
+     1.97±0.02ms      2.84±0.04ms     1.44  rolling.Methods.time_rolling('DataFrame', 10, 'float', 'skew')
+     1.73±0.02ms      2.48±0.02ms     1.43  rolling.VariableWindowMethods.time_rolling('Series', '1d', 'float', 'skew')
+     1.81±0.01ms      2.59±0.03ms     1.43  rolling.VariableWindowMethods.time_rolling('Series', '1h', 'int', 'skew')
+     1.65±0.03ms      2.35±0.03ms     1.43  rolling.Methods.time_rolling('Series', 10, 'float', 'skew')
+     1.81±0.03ms      2.56±0.05ms     1.42  rolling.VariableWindowMethods.time_rolling('Series', '1d', 'int', 'skew')
+     2.56±0.02ms         3.63±1ms     1.41  rolling.Quantile.time_quantile('Series', 1000, 'float', 1, 'lower')
+     2.04±0.01ms      2.88±0.02ms     1.41  rolling.Methods.time_rolling('DataFrame', 1000, 'int', 'skew')
+     1.72±0.01ms      2.42±0.04ms     1.41  rolling.ExpandingMethods.time_expanding('DataFrame', 'float', 'kurt')
+     2.11±0.03ms      2.96±0.05ms     1.40  rolling.VariableWindowMethods.time_rolling('DataFrame', '1d', 'float', 'skew')
+     1.75±0.01ms      2.45±0.05ms     1.40  rolling.Methods.time_rolling('Series', 1000, 'int', 'skew')
+     1.35±0.01ms      1.87±0.03ms     1.39  rolling.ExpandingMethods.time_expanding('Series', 'int', 'skew')
+     1.84±0.03ms      2.53±0.03ms     1.38  rolling.VariableWindowMethods.time_rolling('Series', '50s', 'int', 'skew')
+     2.61±0.03ms       3.58±0.9ms     1.37  rolling.Quantile.time_quantile('Series', 1000, 'float', 1, 'higher')
+     2.00±0.03ms      2.73±0.06ms     1.37  rolling.Methods.time_rolling('DataFrame', 1000, 'float', 'skew')
+     2.04±0.06ms       2.76±0.1ms     1.35  rolling.Methods.time_rolling('DataFrame', 10, 'int', 'skew')
+     2.24±0.02ms      3.02±0.05ms     1.35  rolling.VariableWindowMethods.time_rolling('DataFrame', '1h', 'int', 'skew')
+     1.32±0.02ms      1.77±0.01ms     1.35  rolling.ExpandingMethods.time_expanding('Series', 'float', 'skew')
+     1.61±0.01ms      2.17±0.02ms     1.35  rolling.ExpandingMethods.time_expanding('DataFrame', 'float', 'skew')
+     1.52±0.02ms       2.05±0.3ms     1.35  rolling.Quantile.time_quantile('Series', 1000, 'int', 1, 'higher')
+     1.74±0.02ms      2.29±0.04ms     1.32  rolling.ExpandingMethods.time_expanding('DataFrame', 'int', 'skew')
+     3.13±0.05ms       4.06±0.8ms     1.30  rolling.VariableWindowMethods.time_rolling('DataFrame', '1d', 'int', 'max')
+     2.62±0.04ms       3.27±0.3ms     1.25  rolling.VariableWindowMethods.time_rolling('Series', '1d', 'float', 'max')
+     1.53±0.02ms      1.90±0.08ms     1.24  rolling.Quantile.time_quantile('Series', 10, 'int', 1, 'midpoint')
+     1.53±0.02ms      1.84±0.06ms     1.21  rolling.Quantile.time_quantile('Series', 10, 'int', 1, 'higher')
+     1.54±0.02ms      1.81±0.03ms     1.18  rolling.Quantile.time_quantile('Series', 10, 'int', 1, 'linear')
+     3.06±0.03ms       3.58±0.4ms     1.17  rolling.VariableWindowMethods.time_rolling('DataFrame', '1h', 'float', 'max')
+     1.48±0.02ms      1.72±0.01ms     1.16  rolling.Quantile.time_quantile('Series', 1000, 'int', 0, 'lower')
+     1.33±0.01ms      1.54±0.01ms     1.16  rolling.ForwardWindowMethods.time_rolling('Series', 10, 'int', 'max')
+     1.48±0.02ms         1.71±0ms     1.16  rolling.Quantile.time_quantile('Series', 1000, 'int', 0, 'higher')
+     2.48±0.03ms      2.87±0.07ms     1.16  rolling.ExpandingMethods.time_expanding('DataFrame', 'float', 'max')
+     1.84±0.02ms      2.12±0.02ms     1.15  rolling.Quantile.time_quantile('DataFrame', 10, 'int', 1, 'midpoint')
+     1.55±0.02ms      1.78±0.02ms     1.15  rolling.Quantile.time_quantile('Series', 1000, 'int', 1, 'lower')
+     1.50±0.02ms      1.72±0.04ms     1.14  rolling.Quantile.time_quantile('Series', 1000, 'int', 0, 'linear')
+     1.85±0.02ms      2.10±0.04ms     1.14  rolling.Quantile.time_quantile('DataFrame', 10, 'int', 1, 'nearest')
+     1.50±0.03ms      1.71±0.04ms     1.14  rolling.Quantile.time_quantile('Series', 1000, 'int', 0, 'nearest')
+     1.86±0.05ms      2.11±0.02ms     1.14  rolling.Quantile.time_quantile('DataFrame', 10, 'int', 1, 'higher')
+     1.55±0.02ms      1.76±0.01ms     1.13  rolling.Quantile.time_quantile('Series', 10, 'int', 1, 'nearest')
+     1.52±0.02ms      1.73±0.01ms     1.13  rolling.Quantile.time_quantile('Series', 1000, 'int', 0, 'midpoint')
+     1.56±0.02ms      1.76±0.02ms     1.13  rolling.Quantile.time_quantile('Series', 1000, 'int', 1, 'nearest')
+     1.34±0.02ms      1.52±0.02ms     1.13  rolling.ForwardWindowMethods.time_rolling('Series', 10, 'int', 'min')
+     1.83±0.02ms      2.06±0.04ms     1.13  rolling.Quantile.time_quantile('DataFrame', 1000, 'int', 0, 'nearest')
+     1.83±0.01ms      2.07±0.02ms     1.13  rolling.Quantile.time_quantile('DataFrame', 1000, 'int', 0, 'linear')
+     1.85±0.03ms      2.08±0.05ms     1.13  rolling.Quantile.time_quantile('DataFrame', 10, 'int', 1, 'lower')
+     1.86±0.02ms      2.10±0.02ms     1.13  rolling.Quantile.time_quantile('DataFrame', 1000, 'int', 1, 'nearest')
+     1.90±0.02ms      2.14±0.07ms     1.13  rolling.Quantile.time_quantile('DataFrame', 10, 'int', 0, 'midpoint')
+     1.83±0.01ms      2.06±0.04ms     1.13  rolling.Quantile.time_quantile('DataFrame', 10, 'int', 0, 'linear')
+     2.67±0.02ms       2.98±0.1ms     1.12  rolling.VariableWindowMethods.time_rolling('Series', '1d', 'float', 'min')
+     1.87±0.06ms      2.09±0.03ms     1.12  rolling.Quantile.time_quantile('DataFrame', 1000, 'int', 1, 'linear')
+     1.87±0.01ms      2.07±0.02ms     1.11  rolling.Quantile.time_quantile('DataFrame', 10, 'int', 1, 'linear')
+     2.12±0.02ms      2.36±0.01ms     1.11  rolling.ExpandingMethods.time_expanding('Series', 'float', 'max')
+     2.43±0.07ms      2.70±0.01ms     1.11  rolling.ForwardWindowMethods.time_rolling('Series', 10, 'float', 'max')
+     1.74±0.02ms      1.93±0.01ms     1.11  rolling.ForwardWindowMethods.time_rolling('DataFrame', 1000, 'int', 'max')
+     1.83±0.02ms      2.03±0.03ms     1.11  rolling.Quantile.time_quantile('DataFrame', 1000, 'int', 0, 'lower')
+     1.57±0.03ms      1.74±0.01ms     1.11  rolling.Quantile.time_quantile('Series', 1000, 'int', 1, 'linear')
+     1.86±0.03ms      2.05±0.02ms     1.10  rolling.Quantile.time_quantile('DataFrame', 10, 'int', 0, 'nearest')
+     2.62±0.02ms      2.89±0.07ms     1.10  rolling.Methods.time_rolling('Series', 10, 'float', 'max')
+     2.75±0.03ms      3.03±0.01ms     1.10  rolling.ForwardWindowMethods.time_rolling('DataFrame', 1000, 'float', 'max')

@phofl
Member Author

phofl commented Nov 9, 2020

cc @jreback pipelines are green

@jreback jreback merged commit 8dd2d95 into pandas-dev:master Nov 9, 2020
@jreback
Contributor

jreback commented Nov 9, 2020

thanks @phofl keep em coming!

@kaixiongg

kaixiongg commented Sep 3, 2024

Why not consider implementing it using the Welford method?
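
For context on the question above, here is a rough sketch of a Welford-style one-pass update extended to the third central moment. This PR did not take this approach (it combines Kahan summation with conditional mean subtraction), and the class name `OnlineSkew` is hypothetical. The sketch only adds values, as in an expanding window; a rolling window would also need a matching removal step, which is not shown.

```python
import math


class OnlineSkew:
    """One-pass running mean, M2 = sum((x-mean)^2) and M3 = sum((x-mean)^3)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0
        self.m3 = 0.0

    def add(self, x):
        n_prev = self.n
        self.n += 1
        delta = x - self.mean
        delta_n = delta / self.n
        term = delta * delta_n * n_prev
        self.mean += delta_n
        # M3 must be updated before M2 because it uses the old M2.
        self.m3 += term * delta_n * (self.n - 2) - 3.0 * delta_n * self.m2
        self.m2 += term

    def skew(self):
        """Population skewness (M3/n) / (M2/n)^1.5; needs non-zero variance."""
        return math.sqrt(self.n) * self.m3 / self.m2 ** 1.5


acc = OnlineSkew()
for value in [5001.0, 5002.0, 5004.0]:
    acc.add(value)
print(acc.mean, acc.m2, acc.m3, acc.skew())
```

Because the updates work on differences from the running mean, a constant shift of the data does not enlarge the accumulated terms.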

Labels
Window rolling, ewma, expanding
Development

Successfully merging this pull request may close these issues.

PERF: Poor numerical stability of rolling_kurt and rolling_skew
5 participants