Docs: broken groupby.GroupBy.apply example #19337

Closed
joosthvanderlinden opened this issue Jan 22, 2018 · 1 comment · Fixed by #20098

@joosthvanderlinden

Sample

Copied directly from pandas-docs-travis:

import pandas as pd
df = pd.DataFrame({'A': 'a a b'.split(), 'B': [1, 2, 3], 'C': [4, 6, 5]})
g = df.groupby('A')

# ValueError
g.apply(lambda x: x / x.sum()) 

# ValueError
g.apply(lambda x: x.max() - x.min()) 

# Works fine
g.apply(lambda x: x.C.max() - x.B.min()) 

Problem description

The first two apply operations return ValueError: can only convert an array of size 1 to a Python scalar. The third apply operation works as expected.

Expected Output

As given by pandas-docs-travis:

>>> g.apply(lambda x: x / x.sum())
          B    C
0  0.333333  0.4
1  0.666667  0.6
2  1.000000  1.0
>>> g.apply(lambda x: x.max() - x.min())
   B  C
A
a  1  2
b  0  0
>>> g.apply(lambda x: x.C.max() - x.B.min())
A
a    5
b    2
dtype: int64

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: None.None

pandas: 0.22.0
pytest: 3.0.5
pip: 9.0.1
setuptools: 38.4.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 5.4.1
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.4
feather: None
matplotlib: 2.1.1
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: 0.9999999
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: 2.9.4
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@jorisvandenbossche
Member

@joosthvanderlinden thanks a lot for the report!

I can confirm that both of those raise an error. I went back to pandas 0.14 and they were still failing there (I had expected them to work at some point in an older release of pandas, as the examples look like they were copy-pasted from code that was actually run).

The main 'problem' in this case is that the key column 'A' is also present in the dataframe passed to the lambda function, and the numerical operations fail on that string column.
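One possible workaround, sketched here as an assumption rather than as the fix that was eventually merged, is to select only the numeric columns on the groupby object before calling apply, so that the string key column 'A' never reaches the lambda. The names `numeric`, `normalized` and `spread` are illustrative only. The row index of `normalized` can differ between pandas versions (depending on `group_keys`), so the comments only describe the values.

```python
import pandas as pd

df = pd.DataFrame({'A': 'a a b'.split(), 'B': [1, 2, 3], 'C': [4, 6, 5]})
g = df.groupby('A')

# Restrict the groupby to the numeric columns so that the key column 'A'
# is not part of the frame handed to the lambda.
numeric = g[['B', 'C']]

# Each value divided by its group's column sum.
normalized = numeric.apply(lambda x: x / x.sum())

# Per-group range (max - min) of each numeric column, indexed by 'A'.
spread = numeric.apply(lambda x: x.max() - x.min())

print(normalized)
print(spread)
```

The third example from the report needs no change, because it already picks out the numeric columns `x.C` and `x.B` explicitly inside the lambda.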
