BUG: ewmvar/std/cov() de-biasing formula incorrect #7912

Closed
seth-p opened this issue Aug 4, 2014 · 4 comments · Fixed by #7926
Labels
Algos (non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff); Numeric Operations (arithmetic, comparison, and logical operations)

@seth-p
Contributor

seth-p commented Aug 4, 2014

ewmvar() and ewmcov() contain the following code:

    if not bias:
        result *= (1.0 + 2.0 * com) / (2.0 * com)

Where does this scaling formula come from? Does the number of observations not enter into the bias of the (sample) variance estimate? I would imagine that the formula above is for the case of an infinite sample, but for a finite sample I would have thought that the number of observations would matter.

@seth-p
Contributor Author

seth-p commented Aug 4, 2014

Indeed, it looks like the existing scaling factor (1.0 + 2.0 * com) / (2.0 * com) is the special case of the "V1^2 / (V1^2 - V2)" formula of http://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Weighted_sample_variance -- or equivalently, the formula under gsl_stats_wvariance at http://www.gnu.org/software/gsl/manual/html_node/Weighted-Samples.html -- in the limit of an infinite number of data points.
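
For reference, the limit can be worked out directly. This is a sketch that assumes adjust=True, no missing values, com > 0, and the usual relation alpha = 1 / (1 + com); the symbols alpha, n, and w_i are introduced here for illustration only.

```latex
\begin{aligned}
w_i &= (1-\alpha)^i,\quad i = 0,\dots,n-1, \qquad \alpha = \frac{1}{1+\mathrm{com}} \\
V_1 &= \sum_{i=0}^{n-1} w_i = \frac{1-(1-\alpha)^n}{\alpha}
      \;\xrightarrow[n\to\infty]{}\; \frac{1}{\alpha} \\
V_2 &= \sum_{i=0}^{n-1} w_i^2 = \frac{1-(1-\alpha)^{2n}}{1-(1-\alpha)^2}
      \;\xrightarrow[n\to\infty]{}\; \frac{1}{\alpha(2-\alpha)} \\
\frac{V_1^2}{V_1^2 - V_2}
    &\;\xrightarrow[n\to\infty]{}\; \frac{1}{1 - \frac{\alpha}{2-\alpha}}
     = \frac{2-\alpha}{2(1-\alpha)}
     = \frac{1 + 2\,\mathrm{com}}{2\,\mathrm{com}}
\end{aligned}
```

For finite n the terms (1 - alpha)^n and (1 - alpha)^(2n) do not vanish, which is why the number of observations should enter the correction.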

I think the code should be updated to calculate the correct formula for a finite sample. Though I think doing so may require writing a separate algos.ewmvar() analogous to the existing algos.roll_var() (as opposed to simply calculating ewmvar(x) = ewma(x*x) - ewma(x)**2).
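
To make the shortcut ewmvar(x) = ewma(x*x) - ewma(x)**2 concrete, here is a minimal plain-Python sketch that applies the finite-sample "V1^2 / (V1^2 - V2)" factor to it. It only computes the final value of a series and assumes adjust=True and no missing values; `ew_weights`, `ewma_last`, and `ewmvar_last` are hypothetical helper names, not pandas functions.

```python
def ew_weights(com, n):
    """Exponentially decaying weights; the newest observation gets weight 1."""
    alpha = 1.0 / (1.0 + com)
    return [(1.0 - alpha) ** (n - 1 - i) for i in range(n)]


def ewma_last(values, com):
    """Weighted mean of all values, as seen from the last observation."""
    w = ew_weights(com, len(values))
    return sum(wi * xi for wi, xi in zip(w, values)) / sum(w)


def ewmvar_last(values, com, bias=False):
    """Biased variance via ewma(x*x) - ewma(x)**2, optionally de-biased."""
    mean = ewma_last(values, com)
    mean_sq = ewma_last([x * x for x in values], com)
    var = mean_sq - mean * mean
    if not bias:
        w = ew_weights(com, len(values))
        v1 = sum(w)
        v2 = sum(wi * wi for wi in w)
        # Finite-sample correction; needs at least two observations.
        var *= v1 * v1 / (v1 * v1 - v2)
    return var


# With only two observations the corrected value equals the ordinary
# sample variance, (x1 - x2)**2 / 2, regardless of com.
print(ewmvar_last([1.0, 3.0], com=1.0))
```

Subtracting the squared mean can lose precision when the variance is small relative to the mean, which is one reason a dedicated routine might still be preferable.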

@jreback jreback added this to the 0.15.1 milestone Aug 4, 2014
@jreback
Contributor

jreback commented Aug 4, 2014

@seth-p no idea why/when this came to be. If you can figure that out you are 90% there.

@seth-p
Contributor Author

seth-p commented Aug 4, 2014

Yep, numerical testing in Excel shows that the "V1^2 / (V1^2 - V2)" formula approaches (1.0 + 2.0 * com) / (2.0 * com) as the number of samples increases -- assuming adjust=True and there are no missing values.
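
The same check is easy to reproduce outside Excel. The snippet below is a sketch under the same assumptions (adjust=True, no missing values); `debias_factor` is a hypothetical helper name and com = 9.0 is an arbitrary choice.

```python
def debias_factor(com, n):
    """V1^2 / (V1^2 - V2) for n exponentially decaying weights (n >= 2)."""
    alpha = 1.0 / (1.0 + com)
    weights = [(1.0 - alpha) ** i for i in range(n)]
    v1 = sum(weights)
    v2 = sum(w * w for w in weights)
    return v1 * v1 / (v1 * v1 - v2)


com = 9.0
limit = (1.0 + 2.0 * com) / (2.0 * com)  # the factor currently in the code

# Print the finite-sample factor next to its limiting value.
for n in (2, 10, 100, 1000):
    print(n, debias_factor(com, n), limit)
```

For small n the finite-sample factor is noticeably larger than the limiting value, so the existing code under-corrects early in a series.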

I think I can implement the "V1^2 / (V1^2 - V2)" bias correction formula (which will depend on adjust, ignore_na, and the pattern of missing values) without resorting to writing a new algos.ewmvar(). So it shouldn't be complicated.
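
To illustrate how the pattern of missing values changes the factor, here is a rough sketch for the adjust=True case only. It assumes that with ignore_na=False a weight decays with the absolute position of each observation, while with ignore_na=True it decays only across non-missing observations; `debias_factor_with_nans` is a hypothetical helper, not the actual implementation.

```python
import math


def debias_factor_with_nans(values, com, ignore_na=False):
    """V1^2 / (V1^2 - V2) for the last point of a series that may contain NaNs."""
    alpha = 1.0 / (1.0 + com)
    decay = 1.0 - alpha
    weights = []
    rank = 0  # number of non-missing observations seen so far, newest first
    for age, x in enumerate(reversed(values)):
        if math.isnan(x):
            continue  # missing values carry no weight
        exponent = rank if ignore_na else age
        weights.append(decay ** exponent)
        rank += 1
    v1 = sum(weights)
    v2 = sum(w * w for w in weights)
    denom = v1 * v1 - v2
    # Fewer than two observations: the correction is undefined.
    return v1 * v1 / denom if denom > 0 else float("nan")


series = [1.0, float("nan"), 3.0]
print(debias_factor_with_nans(series, com=1.0, ignore_na=False))
print(debias_factor_with_nans(series, com=1.0, ignore_na=True))
```

The two calls give different factors for the same data, which is why a single closed-form constant cannot cover every case.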

@seth-p
Contributor Author

seth-p commented Aug 4, 2014

CC'ing @jaimefrio, in case of interest.

@jreback jreback modified the milestones: 0.15.1, 0.15.0 Aug 5, 2014
@seth-p seth-p changed the title from "BUG: de-biasing formula in ewmvar() and ewmcov()?" to "BUG: ewmvar/std/cov() de-biasing formula incorrect" Aug 21, 2014