DOC: make difference between numpy behaviour clearer in Dataframe.std() and Series.std() #35985

timhunderwood · 2020-08-30T08:01:06Z

Location of the documentation

https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.DataFrame.std.html
https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.Series.std.html

Documentation problem

The ddof kwarg has a different default from numpy: pandas defaults to ddof=1, while numpy defaults to ddof=0. This means that calling std() on a Series and on a numpy.array holding the same values gives different results:

>>> import pandas
>>> s = pandas.Series([1, 2, 3])
>>> s.std()  # pandas default: ddof=1
1.0
>>> s.values.std()  # numpy default: ddof=0
0.816496580927726
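
For anyone comparing the two, here is a minimal sketch of how the defaults relate. It assumes pandas and numpy are imported as pd and np; the variable names are illustrative only and do not come from the pandas docs.

```python
import numpy as np
import pandas as pd

values = [1, 2, 3]
s = pd.Series(values)

# pandas normalizes by N - 1 by default (ddof=1, sample standard deviation).
pandas_default = s.std()

# numpy normalizes by N by default (ddof=0, population standard deviation).
numpy_default = np.std(values)

# Passing ddof explicitly makes the two libraries agree in either direction.
pandas_population = s.std(ddof=0)      # matches numpy's default
numpy_sample = np.std(values, ddof=1)  # matches pandas' default
```

Setting ddof explicitly on both sides is probably the safest habit when results need to match across the two libraries.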

Suggested fix for documentation

I assume the difference in behaviour is intentional, but I would suggest making this difference clearer in the documentation.

We could add the text: "Note that this normalization is different to numpy, which by default normalizes by N (equivalent to ddof=0)."
A similar string could be added in the kwarg description for ddof.

This would make it clear from reading the pandas docs that the behaviour is different to numpy, rather than having to compare the numpy docs side by side after noticing a discrepancy. It would also make users more likely to consider which ddof value they want for their use case.

If you agree, I am happy to make the PR to change this.


MarcoGorelli · 2020-08-30T09:24:29Z

Sure, go ahead (perhaps this would go well in a Notes section)

timhunderwood added the Docs and Needs Triage labels Aug 30, 2020

MarcoGorelli removed the Needs Triage label Aug 30, 2020

timhunderwood mentioned this issue Aug 30, 2020

DOC: Add Notes about difference to numpy behaviour for ddof in std() GH35985 #35986

Merged


MarcoGorelli closed this as completed in #35986 Sep 3, 2020
