groupby.agg changes the default parameter of np.std #13344

ayhanfuat · 2016-06-01T16:02:23Z

Code Sample, a copy-pastable example if possible

As opposed to pandas, the default value for ddof in standard deviation calculations is 0 in numpy. However, when I use groupby.agg(np.std) I get the result for ddof=1.

import numpy as np
import pandas as pd
prng = np.random.RandomState(0)
df = pd.DataFrame({"A": prng.choice(list("abcdef"), 15), 
                   "B": prng.randint(0, 10, 15)})
df.groupby('A')['B'].agg(np.std)

Out[54]: 
A
a    3.511885
b         NaN
c    2.828427
d    3.593976
e    1.527525
f    1.414214
Name: B, dtype: float64

Expected Output

There is a note in the docs stating that the default axis is changed, but the change in ddof is quite unexpected to me. So my expected output is this:

df.groupby('A')['B'].apply(np.std)

Out[61]: 
A
a    2.867442
b    0.000000
c    2.000000
d    3.112475
e    1.247219
f    1.000000
Name: B, dtype: float64
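
The two outputs differ only in the divisor: ddof=1 divides the sum of squared deviations by n - 1, while ddof=0 divides by n, so the results are related by a factor of sqrt(n / (n - 1)). A minimal NumPy-only sketch of that relation, using made-up values rather than the data above:

```python
import numpy as np

# Hypothetical two-element group, chosen only for illustration.
values = np.array([3.0, 7.0])
n = values.size

std_population = np.std(values)       # NumPy default, ddof=0
std_sample = np.std(values, ddof=1)   # ddof=1, the pandas default

# The two estimates differ by a factor of sqrt(n / (n - 1)).
ratio = np.sqrt(n / (n - 1))
print(std_population, std_sample, std_population * ratio)

# With a single observation the population value is 0.0, while ddof=1 would
# divide by zero; that is consistent with the 0.000000 / NaN pair for group "b".
single = np.array([5.0])
std_single_population = np.std(single)
print(std_single_population)
```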

output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_GB

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.1
setuptools: 20.3
Cython: 0.23.4
numpy: 1.11.0
scipy: 0.17.0
statsmodels: None
xarray: None
IPython: 4.1.2
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: None

jreback · 2016-06-01T17:31:56Z

This is as expected (and has been this way for as long as I can remember).

NumPy functions are translated to their pandas equivalents. If you really want to use the NumPy version, wrap it in a lambda:

.groupby(...).agg(lambda x: np.std(x))

jreback · 2016-06-01T17:32:22Z

You can also explicitly pass in ddof=0 if you want.
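
A minimal sketch of both suggestions on a small made-up frame (the data and variable names are illustrative, not taken from the report). It uses the pandas method and a lambda rather than agg(np.std) itself, since whether the NumPy function gets translated may depend on the pandas version:

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"A": ["a", "a", "a", "b", "c", "c"],
                   "B": [1, 2, 4, 5, 3, 7]})
grouped = df.groupby("A")["B"]

# pandas default (ddof=1): the single-row group "b" comes out as NaN.
sample_std = grouped.std()

# Explicit ddof=0 on the pandas method.
population_std = grouped.std(ddof=0)

# Wrapping np.std in a lambda so NumPy's own default (ddof=0) applies per group.
lambda_std = grouped.agg(lambda x: np.std(x))

print(pd.DataFrame({"ddof=1": sample_std,
                    "ddof=0": population_std,
                    "lambda": lambda_std}))
```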

ayhanfuat · 2016-06-01T18:16:13Z

I thought about that possibility, but since it is not translated in df.groupby('A')['B'].apply(np.std), I thought it could be a bug. Thanks.

EricCousineau-TRI · 2022-04-13T22:12:58Z

Quoting jreback: "you can also explicitly pass in ddof=0 if you want."

While some approaches use lambda x: np.std(x), you can also use functools.partial(np.std) to defeat the func is np.std check (which seems to be what triggers this family of behaviors); it also preserves the name.
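
A minimal sketch of the functools.partial approach on made-up data. That the translation hinges on an identity check against np.std is the reading given above, not something this sketch verifies; it only shows that the wrapped callable is a distinct object and yields the ddof=0 result:

```python
import functools

import numpy as np
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"A": ["a", "a", "a", "b", "c", "c"],
                   "B": [1, 2, 4, 5, 3, 7]})

# partial() returns a new callable object, so it is not np.std itself.
population_std_func = functools.partial(np.std, ddof=0)
is_same_object = population_std_func is np.std  # False

result = df.groupby("A")["B"].agg(population_std_func)
print(result)
```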

FWIW Came here by way of StackOverflow post: https://stackoverflow.com/a/47699378/7829525

stevetracvc · 2022-08-12T00:44:53Z

I'm posting here in case someone else runs into the same issue. I want both mean and standard deviation, but I can't pass in ddof to the std function when using more than one aggregate function. So then, if I try to do something like:

df.agg(["mean", lambda x: np.std(x, ddof=0)])

That throws ValueError: cannot combine transform and aggregation operations (same if I try to use partial).

So instead I tried to do

df.agg([lambda x: np.mean(x), lambda x: np.std(x, ddof=0)])

But this actually does not work correctly because it tries to transform each cell instead of doing an actual aggregate.

So this then turns into the following, with axis=0 passed explicitly:

df.agg(["mean", lambda x: np.std(x, axis=0, ddof=0)])
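
If the goal is just the mean and the population standard deviation of each column, one way to sidestep the list-of-callables behaviour altogether is to call the pandas reductions directly and assemble the result. A minimal sketch on made-up data (the column names and the "std_ddof0" label are illustrative):

```python
import pandas as pd

# Hypothetical numeric frame for illustration only.
df = pd.DataFrame({"x": [1.0, 2.0, 4.0], "y": [3.0, 7.0, 5.0]})

# Each reduction returns one value per column; transposing puts the
# statistics on the rows, similar in shape to what df.agg([...]) returns.
summary = pd.DataFrame({
    "mean": df.mean(),
    "std_ddof0": df.std(ddof=0),
}).T

print(summary)
```

This avoids depending on how a particular pandas version decides between element-wise and whole-column evaluation of a lambda.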

jreback closed this as completed Jun 1, 2016

jreback added the "Usage Question" and "Numeric Operations" (Arithmetic, Comparison, and Logical operations) labels Jun 1, 2016

jreback added this to the No action milestone Jun 1, 2016

yuzie007 mentioned this issue Dec 11, 2017

ddof for np.std in df.agg changes depending on how given & lambda expression does not work correctly in a list of functions #18734

Closed

suddensleep mentioned this issue Feb 7, 2020

Different behavior of numpy functions when used as a Python lambda function #31782

Closed

jreback mentioned this issue Oct 9, 2020

BUG: aggfunc use different default arguments in pivot_table #36508

Closed

3 tasks
