ENH: Implement weighted rolling var and std #27682

Merged: 26 commits, Nov 8, 2019
Changes from 10 commits
04e1f70
ENH: Implement weighted rolling var and std
ihsansecer Jul 31, 2019
f62b06e
Add whatsnew
ihsansecer Jul 31, 2019
ea0b2db
Remove whitespace
ihsansecer Jul 31, 2019
338754e
Pass updated kwargs to scipy's get_window
ihsansecer Aug 1, 2019
ff1aa8f
Make type changes
ihsansecer Aug 1, 2019
e58548e
Add test for var
ihsansecer Aug 1, 2019
1273cf9
Add docs
ihsansecer Aug 1, 2019
f13f0aa
Add more typing
ihsansecer Aug 1, 2019
d25561c
Update docs
ihsansecer Aug 2, 2019
2a4944a
Merge remote-tracking branch 'upstream/master' into weighted-roll-var
ihsansecer Aug 2, 2019
7840b29
Add docs
ihsansecer Aug 2, 2019
6922e4d
Simplify extracting window args
ihsansecer Aug 7, 2019
7510daf
Merge remote-tracking branch 'upstream/master' into weighted-roll-var
ihsansecer Aug 7, 2019
8347185
Merge remote-tracking branch 'upstream/master' into weighted-roll-var
ihsansecer Aug 8, 2019
8c6d4ae
Merge remote-tracking branch 'upstream/master' into weighted-roll-var
ihsansecer Aug 9, 2019
4b3e5eb
Merge remote-tracking branch 'upstream/master' into weighted-roll-var
ihsansecer Aug 14, 2019
1221ccd
Merge branch 'master' into weighted-roll-var
ihsansecer Aug 16, 2019
b260ff7
Merge remote-tracking branch 'upstream/master' into weighted-roll-var
ihsansecer Sep 8, 2019
6f0f1a5
Add version to docstring, add reference
ihsansecer Sep 9, 2019
b54dea0
Fix multiline issue
ihsansecer Sep 10, 2019
4bb954f
Fix multiline issue
ihsansecer Sep 10, 2019
565097e
Merge remote-tracking branch 'upstream/master' into weighted-roll-var
ihsansecer Oct 7, 2019
3fa6028
Remove incoming change line
ihsansecer Oct 8, 2019
221a522
Parameterize window test
ihsansecer Oct 8, 2019
b40dc0a
Remove blank line
ihsansecer Oct 8, 2019
c06f2b6
Merge branch 'master' into weighted-roll-var
ihsansecer Nov 8, 2019
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v0.25.1.rst
@@ -12,7 +12,7 @@ Enhancements
Other enhancements
^^^^^^^^^^^^^^^^^^

-
- Implemented :meth:`pandas.core.window.Window.var` and :meth:`pandas.core.window.Window.std` functions (:issue:`26597`)
-
-

146 changes: 146 additions & 0 deletions pandas/_libs/window.pyx
@@ -1754,6 +1754,152 @@ def _roll_weighted_sum_mean(float64_t[:] values, float64_t[:] weights,
    return output


# ----------------------------------------------------------------------
# Rolling var for weighted window


cdef inline float64_t calc_weighted_var(float64_t t,
                                        float64_t sum_w,
                                        Py_ssize_t win_n,
                                        unsigned int ddof,
                                        float64_t nobs,
                                        int64_t minp) nogil:
    cdef:
        float64_t result

    # Variance is unchanged if no observation is added or removed
    if (nobs >= minp) and (nobs > ddof):

        # pathological case
        if nobs == 1:
            result = 0
        else:
            result = t * win_n / ((win_n - ddof) * sum_w)
            if result < 0:
                result = 0
    else:
        result = NaN

    return result

Review comment on `if result < 0:` (Member):
Would this only be negative if a weight was negative, which we should validate much earlier?

Reply (Contributor Author):
I guess it might be, if one weight is negative and another is positive, but not necessarily.
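
The scaling step and the clamp discussed in the review thread can be checked outside Cython. The following is a minimal pure-Python sketch under stated assumptions, not the pandas implementation: `calc_weighted_var_py` and `weighted_t` are hypothetical names, and `t` and `sum_w` are computed directly from their definitions rather than incrementally.

```python
import math


def calc_weighted_var_py(t, sum_w, win_n, ddof, nobs, minp):
    """Hypothetical pure-Python mirror of calc_weighted_var from the diff."""
    if nobs >= minp and nobs > ddof:
        if nobs == 1:
            return 0.0
        result = t * win_n / ((win_n - ddof) * sum_w)
        # Mixed-sign weights can make t (and so the result) negative; clamp.
        return max(result, 0.0)
    return math.nan


def weighted_t(values, weights):
    """Return (sum of weighted squared deviations, sum of weights)."""
    sum_w = sum(weights)
    mean = sum(w * x for x, w in zip(values, weights)) / sum_w
    t = sum(w * (x - mean) ** 2 for x, w in zip(values, weights))
    return t, sum_w


# Equal weights: the formula reduces to the ordinary sample variance.
t, sum_w = weighted_t([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
equal = calc_weighted_var_py(t, sum_w, win_n=3, ddof=1, nobs=3, minp=1)

# One positive and one negative weight: t is negative, so the result is clamped.
t_neg, sum_w_neg = weighted_t([0.0, 1.0], [1.0, -0.5])
clamped = calc_weighted_var_py(t_neg, sum_w_neg, win_n=2, ddof=1, nobs=2, minp=1)
```

The second case illustrates the scenario raised in the review: a negative `t` arises here only because the weights have mixed signs, which is consistent with the suggestion to validate weights earlier.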


cdef inline void add_weighted_var(float64_t val,
                                  float64_t w,
                                  float64_t *t,
                                  float64_t *sum_w,
                                  float64_t *mean,
                                  float64_t *nobs) nogil:
    """
    Update weighted mean (mean), sum of weights (sum_w) and sum of
    weighted squared differences (t) to include value (val) and
    weight (w) pair in variance calculation using West's method.

    Paper: https://dl.acm.org/citation.cfm?id=359153
    """

    cdef:
        float64_t temp, q, r

    if isnan(val):
        return

    nobs[0] = nobs[0] + 1

    q = val - mean[0]
    temp = sum_w[0] + w
    r = q * w / temp

    mean[0] = mean[0] + r
    t[0] = t[0] + r * sum_w[0] * q
    sum_w[0] = temp
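
To see that the update above tracks the weighted mean and the sum of weighted squared differences, the same arithmetic can be run in plain Python and compared with a two-pass computation. This is a sketch only: `add_weighted` is a hypothetical name, the state is passed as a tuple instead of pointers, and the NaN check and `nobs` counter from the diff are left out.

```python
def add_weighted(val, w, state):
    """Fold one (value, weight) pair into state = (t, sum_w, mean)."""
    t, sum_w, mean = state
    q = val - mean
    temp = sum_w + w
    r = q * w / temp
    return t + r * sum_w * q, temp, mean + r


pairs = [(1.0, 1.0), (2.0, 2.0), (4.0, 1.0)]  # (value, weight)

# Incremental pass, one pair at a time.
state = (0.0, 0.0, 0.0)
for val, w in pairs:
    state = add_weighted(val, w, state)
t_inc, sum_w_inc, mean_inc = state

# Direct two-pass computation for comparison.
sum_w = sum(w for _, w in pairs)
mean = sum(w * x for x, w in pairs) / sum_w
t = sum(w * (x - mean) ** 2 for x, w in pairs)
```

For these three pairs both routes give a weighted mean of 2.25 and `t` of 4.75.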


cdef inline void remove_weighted_var(float64_t val,
                                     float64_t w,
                                     float64_t *t,
                                     float64_t *sum_w,
                                     float64_t *mean,
                                     float64_t *nobs) nogil:
    """
    Update weighted mean (mean), sum of weights (sum_w) and sum
    of weighted squared differences (t) to remove value (val) and
    weight (w) pair from variance calculation using West's method.

    Paper: https://dl.acm.org/citation.cfm?id=359153
    """

    cdef:
        float64_t temp, q, r

    if notnan(val):
        nobs[0] = nobs[0] - 1

        if nobs[0]:
            q = val - mean[0]
            temp = sum_w[0] - w
            r = q * w / temp

            mean[0] = mean[0] - r
            t[0] = t[0] - r * sum_w[0] * q
            sum_w[0] = temp

        else:
            t[0] = 0
            sum_w[0] = 0
            mean[0] = 0
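
The removal step can be checked the same way: starting from the state for three pairs, removing one should give the state computed directly from the remaining two. This is a hedged sketch with hypothetical names (`direct_state`, `remove_weighted`). It omits the NaN check and the reset branch that the diff uses when the last observation leaves the window, so it must not be called to remove the final pair.

```python
def direct_state(pairs):
    """Two-pass reference: return (t, sum_w, mean) for (value, weight) pairs."""
    sum_w = sum(w for _, w in pairs)
    mean = sum(w * x for x, w in pairs) / sum_w
    t = sum(w * (x - mean) ** 2 for x, w in pairs)
    return t, sum_w, mean


def remove_weighted(val, w, state):
    """Undo the contribution of one (value, weight) pair to the state."""
    t, sum_w, mean = state
    q = val - mean
    temp = sum_w - w
    r = q * w / temp
    return t - r * sum_w * q, temp, mean - r


pairs = [(1.0, 1.0), (2.0, 2.0), (4.0, 1.0)]
after_removal = remove_weighted(1.0, 1.0, direct_state(pairs))
expected = direct_state(pairs[1:])
```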


def roll_weighted_var(float64_t[:] values, float64_t[:] weights,
                      int64_t minp, unsigned int ddof):
    """
    Calculates weighted rolling variance using West's online algorithm.

    Paper: https://dl.acm.org/citation.cfm?id=359153
    """

    cdef:
        float64_t t = 0, sum_w = 0, mean = 0, nobs = 0
        float64_t val, pre_val, w, pre_w
        Py_ssize_t i, n, win_n
        float64_t[:] output

    n = len(values)
    win_n = len(weights)
    output = np.empty(n, dtype=float)

    with nogil:

        for i in range(win_n):
            add_weighted_var(values[i], weights[i], &t,
                             &sum_w, &mean, &nobs)

            output[i] = calc_weighted_var(t, sum_w, win_n,
                                          ddof, nobs, minp)

        for i in range(win_n, n):
            val = values[i]
            pre_val = values[i - win_n]

            w = weights[i % win_n]
            pre_w = weights[(i - win_n) % win_n]

            if notnan(val):
                if pre_val == pre_val:
                    remove_weighted_var(pre_val, pre_w, &t,
                                        &sum_w, &mean, &nobs)

                add_weighted_var(val, w, &t, &sum_w, &mean, &nobs)

            elif pre_val == pre_val:
                remove_weighted_var(pre_val, pre_w, &t,
                                    &sum_w, &mean, &nobs)

            output[i] = calc_weighted_var(t, sum_w, win_n,
                                          ddof, nobs, minp)

    return output
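
For experimenting with the control flow without compiling Cython, the whole routine can be ported to plain Python. The sketch below is an illustration under assumptions, not the code that was merged: `roll_weighted_var_py` is a hypothetical name, the `min()` guard in the warm-up loop is not in the diff, and the nested NaN branches are collapsed because the helpers already skip NaN values. It keeps the diff's `i % win_n` weight indexing as is.

```python
import math
import statistics


def roll_weighted_var_py(values, weights, minp, ddof=1):
    """Hypothetical pure-Python port of roll_weighted_var from the diff."""
    n, win_n = len(values), len(weights)
    t = sum_w = mean = 0.0
    nobs = 0
    output = [math.nan] * n

    def add(val, w):
        nonlocal t, sum_w, mean, nobs
        if math.isnan(val):
            return
        nobs += 1
        q = val - mean
        temp = sum_w + w
        r = q * w / temp
        mean += r
        t += r * sum_w * q
        sum_w = temp

    def remove(val, w):
        nonlocal t, sum_w, mean, nobs
        if math.isnan(val):
            return
        nobs -= 1
        if nobs:
            q = val - mean
            temp = sum_w - w
            r = q * w / temp
            mean -= r
            t -= r * sum_w * q
            sum_w = temp
        else:
            # Window is empty again: reset the running state.
            t = sum_w = mean = 0.0

    def calc():
        if nobs >= minp and nobs > ddof:
            if nobs == 1:
                return 0.0
            return max(t * win_n / ((win_n - ddof) * sum_w), 0.0)
        return math.nan

    # Warm-up; min() is a guard added here, the diff iterates range(win_n).
    for i in range(min(win_n, n)):
        add(values[i], weights[i])
        output[i] = calc()

    for i in range(win_n, n):
        # Same weight indexing as the diff: position modulo window length.
        remove(values[i - win_n], weights[(i - win_n) % win_n])
        add(values[i], weights[i % win_n])
        output[i] = calc()

    return output


values = [1.0, 2.0, 4.0, 7.0, 11.0]
result = roll_weighted_var_py(values, [1.0, 1.0, 1.0], minp=3)
# With equal weights each full window should match the sample variance.
expected = [statistics.variance(values[i - 2:i + 1]) for i in range(2, 5)]
```

With equal weights the port agrees with `statistics.variance` on every full window, which is a quick sanity check on the add and remove bookkeeping; it says nothing about behaviour with unequal weights.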


# ----------------------------------------------------------------------
# Exponentially weighted moving average
