Code Sample, a copy-pastable example if possible
In [31]: import numpy as np

In [32]: import pandas as pd

In [33]: df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['A', 'B'])

In [34]: df
Out[34]:
   A  B
0  0  1
1  2  3
2  4  5

In [35]: df.agg(np.std)  # Behavior of ddof=0
Out[35]:
A    1.632993
B    1.632993
dtype: float64

In [36]: df.agg([np.std])  # Behavior of ddof=1
Out[36]:
       A    B
std  2.0  2.0

In [37]: # So how to get the ddof=0 behavior when giving a list of functions?

In [39]: df.agg([lambda x: np.std(x)])  # This gives a numerically unexpected result.
Out[39]:
         A        B
  <lambda> <lambda>
0      0.0      0.0
1      0.0      0.0
2      0.0      0.0

In [40]: df.agg([np.mean, lambda x: np.std(x)])  # This gives an error.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-40-52f4ec4195b5> in <module>()
----> 1 df.agg([np.mean, lambda x: np.std(x)])

/Users/ikeda/.pyenv/versions/anaconda3-4.4.0/lib/python3.6/site-packages/pandas/core/frame.py in aggregate(self, func, axis, *args, **kwargs)
   4740         if axis == 0:
   4741             try:
-> 4742                 result, how = self._aggregate(func, axis=0, *args, **kwargs)
   4743             except TypeError:
   4744                 pass

/Users/ikeda/.pyenv/versions/anaconda3-4.4.0/lib/python3.6/site-packages/pandas/core/base.py in _aggregate(self, arg, *args, **kwargs)
    537             return self._aggregate_multiple_funcs(arg,
    538                                                   _level=_level,
--> 539                                                   _axis=_axis), None
    540         else:
    541             result = None

/Users/ikeda/.pyenv/versions/anaconda3-4.4.0/lib/python3.6/site-packages/pandas/core/base.py in _aggregate_multiple_funcs(self, arg, _level, _axis)
    594         # if we are empty
    595         if not len(results):
--> 596             raise ValueError("no results")
    597
    598         try:

ValueError: no results
Problem description
When using, e.g., df.agg, the ddof (delta degrees of freedom) value applied to np.std changes depending on how the function is passed: a single function gives the ddof=0 result, while a list of functions gives the ddof=1 result. This is likely to confuse many users, and I believe the behavior should be made consistent.
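For reference, the two conventions differ only in the divisor: ddof=0 divides the sum of squared deviations by N, and ddof=1 divides by N - 1. The following is a minimal sketch using only the Python standard library, applied to column A of the example above. The helper std_with_ddof is a hypothetical name introduced here for illustration and is not part of pandas or NumPy.

```python
import math
import statistics

values = [0, 2, 4]  # column A of the example DataFrame

# Population standard deviation: divides by N (the ddof=0 convention).
std_ddof0 = statistics.pstdev(values)

# Sample standard deviation: divides by N - 1 (the ddof=1 convention).
std_ddof1 = statistics.stdev(values)


def std_with_ddof(xs, ddof):
    """Hypothetical helper: standard deviation with an explicit ddof."""
    n = len(xs)
    mean = sum(xs) / n
    squared_deviations = sum((x - mean) ** 2 for x in xs)
    return math.sqrt(squared_deviations / (n - ddof))


print(round(std_ddof0, 6))  # 1.632993
print(std_ddof1)            # 2.0
```

These match the two results shown in Out[35] and Out[36], which suggests the difference is purely the ddof value and not a numerical problem.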
Furthermore, I could not find a way to obtain the np.std result with ddof=0 when supplying it as one member of a list of functions. A lambda expression does not work in a list of functions: it either gives numerically unexpected results (all zeros in Out[39]) or raises an error. This prevents us from using convenient methods such as df.agg, df.apply, and df.describe when we want the ddof=0 behavior.
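As a possible stopgap, the ddof=0 values can be computed without going through df.agg at all, since DataFrame.std accepts ddof explicitly. The sketch below sidesteps the problem and does not fix it; the row labels std_ddof0 and std_ddof1 are names chosen here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["A", "B"])

# Compute each statistic with an explicit ddof, then stack them into one
# summary table. Each call returns a Series indexed by column name, so the
# transposed frame has one row per statistic and one column per df column.
summary = pd.DataFrame(
    {
        "mean": df.mean(),
        "std_ddof0": df.std(ddof=0),  # divides by N
        "std_ddof1": df.std(ddof=1),  # divides by N - 1 (pandas default)
    }
).T

print(summary)
```

This gives the table I would like df.agg to produce, but it does not help with methods such as df.describe, where the statistics cannot be swapped out this way.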
Judging from #13344, I guess the developers prefer the ddof=1 behavior in pandas, so the expected behavior would be as shown below.
Expected Output
In [35]: df.agg(np.std) # Behavior of ddof=1
Out[35]:
A 2.0
B 2.0
dtype: float64

In [38]: df.agg([lambda x: np.std(x)])  # To obtain the ddof=0 results
Out[38]:
                 A         B
<lambda>  1.632993  1.632993

In [41]: df.agg([np.mean, lambda x: np.std(x)])
Out[41]:
                 A         B
mean           2.0       3.0
<lambda>  1.632993  1.632993
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
pandas: 0.21.0
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.13.3
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None