DOC: sample variance -> population variance #46482

Closed
1 task done
kwhkim opened this issue Mar 23, 2022 · 5 comments · Fixed by #46711

Comments

@kwhkim
Contributor

kwhkim commented Mar 23, 2022

Pandas version checks

  • I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

https://pandas.pydata.org/docs/dev/user_guide/gotchas.html

Documentation problem

Differences with NumPy
For Series and DataFrame objects, var() normalizes by N-1 to produce unbiased estimates of the sample variance, while NumPy’s numpy.var() normalizes by N, which measures the variance of the sample.
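
The normalization difference described in the quoted passage can be checked directly. Below is a minimal sketch; the data values are made up for illustration, and the only library behavior relied on is the default `ddof` of each function (`ddof=1` for `Series.var()`, `ddof=0` for `numpy.var()`).

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption, not taken from the pandas docs).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
s = pd.Series(data)

# pandas defaults to ddof=1, so the sum of squared deviations is divided by N - 1.
pandas_var = s.var()

# NumPy defaults to ddof=0, so the sum of squared deviations is divided by N.
numpy_var = np.var(data)

# The same quantities computed by hand.
n = len(data)
mean = sum(data) / n
sum_sq_dev = sum((x - mean) ** 2 for x in data)

print(pandas_var, sum_sq_dev / (n - 1))  # both divide by N - 1
print(numpy_var, sum_sq_dev / n)         # both divide by N

# Passing ddof explicitly makes the two libraries agree.
print(np.isclose(s.var(ddof=0), np.var(data)))
print(np.isclose(s.var(), np.var(data, ddof=1)))
```

Passing `ddof` explicitly in both libraries is probably the safest way to avoid the mismatch when mixing pandas and NumPy code.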

"unbiased estimates of the sample variance" needs to be corrected to 'the population variance'

Suggested fix for documentation

Change "unbiased estimates of the sample variance" to "unbiased estimates of the population variance".

@kwhkim added the Docs and Needs Triage labels on Mar 23, 2022
@phofl
Member

phofl commented Apr 8, 2022

Could you elaborate why you would make this change?

@kwhkim
Contributor Author

kwhkim commented Apr 8, 2022

https://en.wikipedia.org/wiki/Bias_of_an_estimator

Note that a sample variance can be either biased or unbiased, depending on the normalization. From the article:

Note that the usual definition of sample variance is

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \overline{X}\right)^2,$$

and this is an unbiased estimator of the population variance.

The passage above is quoted as it appears on Wikipedia. $S^2$ is a random variable, so it is called an estimator. If we have concrete numbers and calculate with the same formula, the result should be called an estimate.
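
The claim that dividing by n-1 gives an unbiased estimator of the population variance can also be checked numerically. The following is only a sketch under assumed settings (normal data, a population variance of 4, samples of size 5, a fixed seed); none of these values come from the pandas docs or from Wikipedia.

```python
import numpy as np

# Assumed simulation settings, chosen only for illustration.
rng = np.random.default_rng(0)
true_var = 4.0      # population variance
n = 5               # size of each sample
trials = 200_000    # number of independent samples

# Each row is one sample of size n drawn from the population.
samples = rng.normal(loc=0.0, scale=np.sqrt(true_var), size=(trials, n))

# One variance estimate per sample, with each normalization.
est_n_minus_1 = samples.var(axis=1, ddof=1)  # divide by n - 1
est_n = samples.var(axis=1, ddof=0)          # divide by n

# Averaging the estimates approximates the expected value of each estimator.
mean_n_minus_1 = est_n_minus_1.mean()
mean_n = est_n.mean()

print("population variance:       ", true_var)
print("mean of n-1 estimates:     ", mean_n_minus_1)
print("mean of n estimates:       ", mean_n)
print("(n-1)/n * population var.: ", (n - 1) / n * true_var)
```

With these settings, the average of the n-1 estimates should land close to the population variance, while the average of the n estimates should land close to (n-1)/n times it. Each individual value in `est_n_minus_1` is an estimate; the rule that produces them is the estimator.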

@phofl
Member

phofl commented Apr 8, 2022

Thanks, would you be interested in submitting a pr?

@phofl added the good first issue label and removed the Needs Triage label on Apr 8, 2022
@kwhkim
Contributor Author

kwhkim commented Apr 9, 2022

sure

@kwhkim
Contributor Author

kwhkim commented Apr 26, 2022

@jreback sorry this pull request is for issue #46440. Would you help me move this PR to that issue? I thought trailing (#...) would be sufficient but it appears not.


3 participants