ENH: Implement weighted rolling var and std #27682

Merged Nov 8, 2019 (26 commits)
04e1f70
ENH: Implement weighted rolling var and std
ihsansecer Jul 31, 2019
f62b06e
Add whatsnew
ihsansecer Jul 31, 2019
ea0b2db
Remove whitespace
ihsansecer Jul 31, 2019
338754e
Pass updated kwargs to scipy's get_window
ihsansecer Aug 1, 2019
ff1aa8f
Make type changes
ihsansecer Aug 1, 2019
e58548e
Add test for var
ihsansecer Aug 1, 2019
1273cf9
Add docs
ihsansecer Aug 1, 2019
f13f0aa
Add more typing
ihsansecer Aug 1, 2019
d25561c
Update docs
ihsansecer Aug 2, 2019
2a4944a
Merge remote-tracking branch 'upstream/master' into weighted-roll-var
ihsansecer Aug 2, 2019
7840b29
Add docs
ihsansecer Aug 2, 2019
6922e4d
Simplify extracting window args
ihsansecer Aug 7, 2019
7510daf
Merge remote-tracking branch 'upstream/master' into weighted-roll-var
ihsansecer Aug 7, 2019
8347185
Merge remote-tracking branch 'upstream/master' into weighted-roll-var
ihsansecer Aug 8, 2019
8c6d4ae
Merge remote-tracking branch 'upstream/master' into weighted-roll-var
ihsansecer Aug 9, 2019
4b3e5eb
Merge remote-tracking branch 'upstream/master' into weighted-roll-var
ihsansecer Aug 14, 2019
1221ccd
Merge branch 'master' into weighted-roll-var
ihsansecer Aug 16, 2019
b260ff7
Merge remote-tracking branch 'upstream/master' into weighted-roll-var
ihsansecer Sep 8, 2019
6f0f1a5
Add version to docstring, add reference
ihsansecer Sep 9, 2019
b54dea0
Fix multiline issue
ihsansecer Sep 10, 2019
4bb954f
Fix multiline issue
ihsansecer Sep 10, 2019
565097e
Merge remote-tracking branch 'upstream/master' into weighted-roll-var
ihsansecer Oct 7, 2019
3fa6028
Remove incoming change line
ihsansecer Oct 8, 2019
221a522
Parameterize window test
ihsansecer Oct 8, 2019
b40dc0a
Remove blank line
ihsansecer Oct 8, 2019
c06f2b6
Merge branch 'master' into weighted-roll-var
ihsansecer Nov 8, 2019
2 changes: 2 additions & 0 deletions doc/source/reference/window.rst
@@ -34,6 +34,8 @@ Standard moving window functions
   Rolling.quantile
   Window.mean
   Window.sum
   Window.var
   Window.std

.. _api.functions_expanding:

1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.0.0.rst
@@ -110,6 +110,7 @@ Other enhancements
(depending on the presence of missing data) or object dtype column. (:issue:`28368`)
- :meth:`DataFrame.to_json` now accepts an ``indent`` integer argument to enable pretty printing of JSON output (:issue:`12004`)
- :meth:`read_stata` can read Stata 119 dta files. (:issue:`28250`)
- Implemented :meth:`pandas.core.window.Window.var` and :meth:`pandas.core.window.Window.std` functions (:issue:`26597`)
- Added ``encoding`` argument to :meth:`DataFrame.to_string` for non-ascii text (:issue:`28766`)
- Added ``encoding`` argument to :func:`DataFrame.to_html` for non-ascii text (:issue:`28663`)
- :meth:`Styler.background_gradient` now accepts ``vmin`` and ``vmax`` arguments (:issue:`12145`)
220 changes: 220 additions & 0 deletions pandas/_libs/window.pyx
@@ -1752,6 +1752,226 @@ cdef ndarray[float64_t] _roll_weighted_sum_mean(float64_t[:] values,
    return np.asarray(output)


# ----------------------------------------------------------------------
# Rolling var for weighted window


cdef inline float64_t calc_weighted_var(float64_t t,
                                        float64_t sum_w,
                                        Py_ssize_t win_n,
                                        unsigned int ddof,
                                        float64_t nobs,
                                        int64_t minp) nogil:
    """
    Calculate weighted variance for a window using West's method.

    Paper: https://dl.acm.org/citation.cfm?id=359153

    Parameters
    ----------
    t: float64_t
        sum of weighted squared differences
    sum_w: float64_t
        sum of weights
    win_n: Py_ssize_t
        window size
    ddof: unsigned int
        delta degrees of freedom
    nobs: float64_t
        number of observations
    minp: int64_t
        minimum number of observations

    Returns
    -------
    result : float64_t
        weighted variance of the window
    """

    cdef:
        float64_t result

    # Variance is unchanged if no observation is added or removed
    if (nobs >= minp) and (nobs > ddof):

        # pathological case
        if nobs == 1:
            result = 0
        else:
            result = t * win_n / ((win_n - ddof) * sum_w)
            if result < 0:

Member:
Would this only be negative if a weight was negative, which we should validate much earlier?

Contributor Author:
I guess it might be if one weight is negative and another is positive, but not necessarily.

                result = 0
    else:
        result = NaN

    return result
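
For readers who want to check the scaling step by hand, here is a minimal pure-Python sketch of the same arithmetic. It is an illustration only, not part of the PR: the function reuses the name `calc_weighted_var` for readability, and the sample inputs (three equally weighted values 1, 2, 3, so `t = 2.0` and `sum_w = 3.0`) are assumptions chosen for easy mental arithmetic.

```python
import math


def calc_weighted_var(t, sum_w, win_n, ddof, nobs, minp):
    """Pure-Python mirror of the scaling step in the Cython helper above."""
    if nobs >= minp and nobs > ddof:
        if nobs == 1:
            # A single observation has no spread.
            return 0.0
        result = t * win_n / ((win_n - ddof) * sum_w)
        # Clamp small negative values to zero, as the Cython code does.
        return max(result, 0.0)
    return math.nan


# Values 1, 2, 3 with unit weights: the squared deviations sum to 2.0.
sample_var = calc_weighted_var(2.0, 3.0, win_n=3, ddof=1, nobs=3, minp=1)
population_var = calc_weighted_var(2.0, 3.0, win_n=3, ddof=0, nobs=3, minp=1)
too_few = calc_weighted_var(0.0, 1.0, win_n=3, ddof=1, nobs=1, minp=1)
print(sample_var, population_var, too_few)
```

With `ddof=0` the expression reduces to `t / sum_w`; a non-zero `ddof` rescales that by `win_n / (win_n - ddof)`.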


cdef inline void add_weighted_var(float64_t val,
                                  float64_t w,
                                  float64_t *t,
                                  float64_t *sum_w,
                                  float64_t *mean,
                                  float64_t *nobs) nogil:
    """
    Update weighted mean, sum of weights and sum of weighted squared
    differences to include value and weight pair in weighted variance
    calculation using West's method.

    Paper: https://dl.acm.org/citation.cfm?id=359153

    Parameters
    ----------
    val: float64_t
        window values
    w: float64_t
        window weights
    t: float64_t
        sum of weighted squared differences
    sum_w: float64_t
        sum of weights
    mean: float64_t
        weighted mean
    nobs: float64_t
        number of observations
    """

    cdef:
        float64_t temp, q, r

    if isnan(val):
        return

    nobs[0] = nobs[0] + 1

    q = val - mean[0]
    temp = sum_w[0] + w
    r = q * w / temp

    mean[0] = mean[0] + r
    t[0] = t[0] + r * sum_w[0] * q
    sum_w[0] = temp
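
The update above can be checked against a direct two-pass computation. The following is a sketch in plain Python, assuming strictly positive weights (a zero running weight total would raise `ZeroDivisionError` here). `WestState` and `add_weighted` are hypothetical names introduced for illustration and do not appear in the PR.

```python
from dataclasses import dataclass


@dataclass
class WestState:
    """Hypothetical container for the four running quantities."""
    t: float = 0.0      # sum of weighted squared differences
    sum_w: float = 0.0  # sum of weights
    mean: float = 0.0   # weighted mean
    nobs: int = 0       # number of non-NaN observations


def add_weighted(state, val, w):
    """Fold one (value, weight) pair into the running state."""
    if val != val:  # NaN is skipped, as in the Cython helper
        return
    state.nobs += 1
    q = val - state.mean
    temp = state.sum_w + w
    r = q * w / temp
    state.mean += r
    state.t += r * state.sum_w * q
    state.sum_w = temp


state = WestState()
pairs = [(1.0, 0.5), (2.0, 1.0), (4.0, 2.0)]
for val, w in pairs:
    add_weighted(state, val, w)

# Direct two-pass computation for comparison.
total_w = sum(w for _, w in pairs)
direct_mean = sum(v * w for v, w in pairs) / total_w
direct_t = sum(w * (v - direct_mean) ** 2 for v, w in pairs)
print(state.mean, direct_mean)
print(state.t, direct_t)
```

The incremental and direct results should agree up to floating-point rounding, which is the property the rolling loop relies on.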


cdef inline void remove_weighted_var(float64_t val,
                                     float64_t w,
                                     float64_t *t,
                                     float64_t *sum_w,
                                     float64_t *mean,
                                     float64_t *nobs) nogil:
    """
    Update weighted mean, sum of weights and sum of weighted squared
    differences to remove value and weight pair from weighted variance
    calculation using West's method.

    Paper: https://dl.acm.org/citation.cfm?id=359153

    Parameters
    ----------
    val: float64_t
        window values
    w: float64_t
        window weights
    t: float64_t
        sum of weighted squared differences
    sum_w: float64_t
        sum of weights
    mean: float64_t
        weighted mean
    nobs: float64_t
        number of observations
    """

    cdef:
        float64_t temp, q, r

    if notnan(val):
        nobs[0] = nobs[0] - 1

        if nobs[0]:
            q = val - mean[0]
            temp = sum_w[0] - w
            r = q * w / temp

            mean[0] = mean[0] - r
            t[0] = t[0] - r * sum_w[0] * q
            sum_w[0] = temp

        else:
            t[0] = 0
            sum_w[0] = 0
            mean[0] = 0
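
The removal step is meant to undo an earlier addition. Below is a small sketch of that round trip, written as pure functions over a `(t, sum_w, mean, nobs)` tuple. The names `add_pair` and `remove_pair` are hypothetical, NaN handling is left out for brevity, and positive weights are assumed.

```python
def add_pair(state, val, w):
    """Return the state after including (val, w)."""
    t, sum_w, mean, nobs = state
    q = val - mean
    temp = sum_w + w
    r = q * w / temp
    return (t + r * sum_w * q, temp, mean + r, nobs + 1)


def remove_pair(state, val, w):
    """Return the state after excluding a previously added (val, w)."""
    t, sum_w, mean, nobs = state
    if nobs - 1 == 0:
        # Window is empty again: reset, as the Cython helper does.
        return (0.0, 0.0, 0.0, 0)
    q = val - mean
    temp = sum_w - w
    r = q * w / temp
    return (t - r * sum_w * q, temp, mean - r, nobs - 1)


empty = (0.0, 0.0, 0.0, 0)
pairs = [(1.0, 0.5), (2.0, 1.0), (4.0, 2.0)]

# Add all three pairs, then drop the oldest one.
rolled = empty
for val, w in pairs:
    rolled = add_pair(rolled, val, w)
rolled = remove_pair(rolled, *pairs[0])

# Build the same two-element window from scratch.
fresh = empty
for val, w in pairs[1:]:
    fresh = add_pair(fresh, val, w)

print(rolled)
print(fresh)
```

Both tuples should match up to rounding, so the window can slide without recomputing from scratch.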


def roll_weighted_var(float64_t[:] values, float64_t[:] weights,
                      int64_t minp, unsigned int ddof):
    """
    Calculates weighted rolling variance using West's online algorithm.

    Paper: https://dl.acm.org/citation.cfm?id=359153

    Parameters
    ----------
    values: float64_t[:]
        values to roll window over
    weights: float64_t[:]
        array of weights whose length is window size
    minp: int64_t
        minimum number of observations to calculate
        variance of a window
    ddof: unsigned int
        the divisor used in variance calculations
        is the window size - ddof

    Returns
    -------
    output: float64_t[:]
        weighted variances of windows
    """

    cdef:
        float64_t t = 0, sum_w = 0, mean = 0, nobs = 0
        float64_t val, pre_val, w, pre_w
        Py_ssize_t i, n, win_n
        float64_t[:] output

    n = len(values)
    win_n = len(weights)
    output = np.empty(n, dtype=float)

    with nogil:

        for i in range(win_n):
            add_weighted_var(values[i], weights[i], &t,
                             &sum_w, &mean, &nobs)

            output[i] = calc_weighted_var(t, sum_w, win_n,
                                          ddof, nobs, minp)

        for i in range(win_n, n):
            val = values[i]
            pre_val = values[i - win_n]

            w = weights[i % win_n]
            pre_w = weights[(i - win_n) % win_n]

            if notnan(val):
                if pre_val == pre_val:
                    remove_weighted_var(pre_val, pre_w, &t,
                                        &sum_w, &mean, &nobs)

                add_weighted_var(val, w, &t, &sum_w, &mean, &nobs)

            elif pre_val == pre_val:
                remove_weighted_var(pre_val, pre_w, &t,
                                    &sum_w, &mean, &nobs)

            output[i] = calc_weighted_var(t, sum_w, win_n,
                                          ddof, nobs, minp)

    return output
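
To make the control flow easier to follow, here is a readable pure-Python transliteration of the loop above. It is a sketch for study, not the pandas API. The name `rolling_weighted_var` is hypothetical, the warm-up and steady-state loops are merged into one, and inputs shorter than the window are tolerated, which the Cython version does not guard against. Note that, as in the diff, each value is paired with `weights[i % win_n]` for as long as it stays in the window.

```python
import math
import statistics


def rolling_weighted_var(values, weights, minp, ddof):
    """Readable transliteration of the Cython loop (illustration only)."""
    t = sum_w = mean = 0.0
    nobs = 0
    n, win_n = len(values), len(weights)
    out = [math.nan] * n

    def add(val, w):
        nonlocal t, sum_w, mean, nobs
        if math.isnan(val):
            return
        nobs += 1
        q = val - mean
        temp = sum_w + w
        r = q * w / temp
        mean += r
        t += r * sum_w * q
        sum_w = temp

    def remove(val, w):
        nonlocal t, sum_w, mean, nobs
        if math.isnan(val):
            return
        nobs -= 1
        if nobs:
            q = val - mean
            temp = sum_w - w
            r = q * w / temp
            mean -= r
            t -= r * sum_w * q
            sum_w = temp
        else:
            t = sum_w = mean = 0.0

    def calc():
        if nobs >= minp and nobs > ddof:
            if nobs == 1:
                return 0.0
            return max(t * win_n / ((win_n - ddof) * sum_w), 0.0)
        return math.nan

    for i in range(n):
        if i >= win_n:
            # The departing value leaves with the weight it entered with.
            remove(values[i - win_n], weights[(i - win_n) % win_n])
        add(values[i], weights[i % win_n])
        out[i] = calc()
    return out


values = [1.0, 2.0, 4.0, 7.0, 11.0]
result = rolling_weighted_var(values, [1.0, 1.0, 1.0], minp=3, ddof=1)
# With unit weights the last entry should agree with the plain sample variance.
print(result[-1], statistics.variance(values[-3:]))
```

Unit weights are used in the example because the expected answer is then the ordinary rolling sample variance, which is easy to verify independently.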


# ----------------------------------------------------------------------
# Exponentially weighted moving average

4 changes: 2 additions & 2 deletions pandas/core/window/expanding.py
@@ -181,13 +181,13 @@ def mean(self, *args, **kwargs):
     def median(self, **kwargs):
         return super().median(**kwargs)

-    @Substitution(name="expanding")
+    @Substitution(name="expanding", versionadded="")
     @Appender(_shared_docs["std"])
     def std(self, ddof=1, *args, **kwargs):
         nv.validate_expanding_func("std", args, kwargs)
         return super().std(ddof=ddof, **kwargs)

-    @Substitution(name="expanding")
+    @Substitution(name="expanding", versionadded="")
     @Appender(_shared_docs["var"])
     def var(self, ddof=1, *args, **kwargs):
         nv.validate_expanding_func("var", args, kwargs)