DOC: make difference between numpy behaviour clearer in DataFrame.std() and Series.std() #35985

Closed
timhunderwood opened this issue Aug 30, 2020 · 1 comment · Fixed by #35986

@timhunderwood
Contributor

timhunderwood commented Aug 30, 2020

Location of the documentation

https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.DataFrame.std.html
https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.Series.std.html

Documentation problem

The ddof kwarg defaults to 1 in pandas but to 0 in numpy. This means that calling std() on a Series and on a numpy array holding the same values gives different results:

>>> import pandas
>>> df = pandas.Series([1, 2, 3])
>>> df.std()
Out[4]: 1.0
>>> df.values.std()
Out[5]: 0.816496580927726

Suggested fix for documentation

I assume the difference in behaviour is intentional, but I would suggest making this difference clearer in the documentation.

We could add the text: "Note that this normalization is different to numpy, which by default normalizes by N (equivalent to ddof=0)."
A similar string could be added in the kwarg description for ddof.
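
As a rough illustration of what such a note would be telling users, the sketch below passes ddof explicitly on both sides so that the two libraries agree. The variable names are only illustrative and are not part of either library.

```python
import numpy as np
import pandas as pd

values = [1, 2, 3]
s = pd.Series(values)
a = np.array(values)

# Defaults differ: pandas divides by N - 1 (ddof=1), numpy divides by N (ddof=0).
pandas_default = s.std()
numpy_default = a.std()

# Passing ddof explicitly removes the discrepancy in either direction.
pandas_population = s.std(ddof=0)  # matches numpy's default
numpy_sample = a.std(ddof=1)       # matches pandas' default

print(pandas_default, numpy_sample)
print(pandas_population, numpy_default)
```

Setting ddof explicitly in both calls also documents, at the call site, whether a sample or a population standard deviation is intended.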

This would make it clear from reading the pandas docs that the behaviour is different to numpy, rather than having to compare the numpy docs side by side after noticing a discrepancy. It would also make users more likely to consider which ddof value they want for their use case.

If you agree, I am happy to make the PR to change this.

timhunderwood added the Docs and Needs Triage labels Aug 30, 2020
@MarcoGorelli
Member

Sure, go ahead (perhaps this would go well in a Notes section)

MarcoGorelli removed the Needs Triage label Aug 30, 2020