DOC: sample variance -> population variance #46482

kwhkim · 2022-03-23T02:33:42Z

Pandas version checks

I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

https://pandas.pydata.org/docs/dev/user_guide/gotchas.html

Documentation problem

Differences with NumPy
For Series and DataFrame objects, var() normalizes by N-1 to produce unbiased estimates of the sample variance, while NumPy’s numpy.var() normalizes by N, which measures the variance of the sample.

"unbiased estimates of the sample variance" should be corrected to "unbiased estimates of the population variance".
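For reference, the behavioral difference the quoted passage describes can be checked with a minimal sketch like the one below. The data values are made up for illustration; the only library behavior relied on is the default `ddof` of `Series.var()` (1) and of `numpy.var()` (0).

```python
import numpy as np
import pandas as pd

# Hypothetical data, chosen only so the arithmetic is easy to follow.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
s = pd.Series(data)

# pandas default is ddof=1: the sum of squared deviations is divided by N-1.
pandas_var = s.var()

# NumPy default is ddof=0: the sum of squared deviations is divided by N.
numpy_var = np.var(data)

# Passing ddof=1 to NumPy reproduces the pandas default.
numpy_var_ddof1 = np.var(data, ddof=1)

print(pandas_var, numpy_var, numpy_var_ddof1)
```

The two libraries compute the same sum of squared deviations and differ only in the divisor, which is why the wording about what each result estimates matters.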

Suggested fix for documentation

"unbiased estimates of the sample variance" should be corrected to "unbiased estimates of the population variance".


phofl · 2022-04-08T11:26:54Z

Could you elaborate why you would make this change?

kwhkim · 2022-04-08T12:50:01Z

https://en.wikipedia.org/wiki/Bias_of_an_estimator

Note that a sample variance can be either biased or unbiased.

Note that the usual definition of sample variance is

$S^{2}={\frac {1}{n-1}}\sum _{i=1}^{n}(X_{i}-{\overline {X}}\,)^{2}$

and this is an unbiased estimator of the population variance.

The sentence above is quoted verbatim from Wikipedia. $S^2$ is a random variable, so it is called an estimator. If we plug concrete numbers into the same formula, the result is called an estimate.
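To make the estimator/estimate distinction concrete, here is a rough simulation sketch. The population (normal with variance 4), the sample size of 5, the number of trials, and the seed are all arbitrary assumptions picked for illustration. Each row gives one estimate; averaging many estimates approximates the expected value of the estimator.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, arbitrary choice

population_variance = 4.0        # assumed population: normal with sigma = 2
n = 5                            # size of each sample
trials = 200_000                 # number of independent samples

# Each row is one sample of size n drawn from the population.
samples = rng.normal(loc=0.0, scale=2.0, size=(trials, n))

# One estimate per sample, using each normalization.
estimates_n_minus_1 = np.var(samples, axis=1, ddof=1)  # divides by n-1
estimates_n = np.var(samples, axis=1, ddof=0)          # divides by n

# The average of many estimates approximates the estimator's expected value.
mean_unbiased = estimates_n_minus_1.mean()
mean_biased = estimates_n.mean()

print("mean of n-1 estimates:", mean_unbiased)
print("mean of n estimates:  ", mean_biased)
print("population variance:  ", population_variance)
```

Under these assumptions the average of the N-1 estimates should land near the population variance, while the average of the N estimates should land near (n-1)/n times it. That is the sense in which the N-1 version is an unbiased estimator of the population variance.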

phofl · 2022-04-08T12:56:04Z

Thanks, would you be interested in submitting a PR?

kwhkim · 2022-04-09T00:10:23Z

sure

kwhkim · 2022-04-26T02:23:55Z

@jreback sorry this pull request is for issue #46440. Would you help me move this PR to that issue? I thought trailing (#...) would be sufficient but it appears not.


kwhkim added the Docs and Needs Triage labels Mar 23, 2022

phofl added the good first issue label and removed the Needs Triage label Apr 8, 2022

kwhkim added a commit to kwhkim/pandas that referenced this issue Apr 9, 2022

DOC: sample variance -> population variance (pandas-dev#46482)

2981bc5

kwhkim mentioned this issue Apr 9, 2022

DOC: sample variance -> population variance (#46482) #46711

Merged

4 tasks

jreback added this to the 1.5 milestone Apr 10, 2022

jreback closed this as completed in #46711 Apr 27, 2022

jreback pushed a commit that referenced this issue Apr 27, 2022

DOC: sample variance -> population variance (#46482) (#46711)

f706fc9

yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this issue Jul 13, 2022

DOC: sample variance -> population variance (pandas-dev#46482) (panda…

ad41e6c

…s-dev#46711)
