
Apply on selected columns of a groupby object - stopped working with 0.18.1 #13568


Closed
tpietruszka opened this issue Jul 5, 2016 · 4 comments · Fixed by #13585
Labels: Bug, Groupby, Regression (functionality that used to work in a prior pandas version)

Comments

@tpietruszka

Code Sample

>>> import pandas as pd
>>> import numpy as np
>>> 
>>> df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
...                            'foo', 'bar', 'foo', 'foo'],
...                     'B' : ['one', 'one', 'two', 'three',
...                            'two', 'two', 'one', 'three'],
...                     'C' : np.random.randn(8),
...                     'D' : np.random.randn(8)})
>>> 
>>> gb = df.groupby('B')
>>> 
>>> gb[['C', 'D']].apply(lambda group: group.C.mean() - group.D.mean())

Works fine on 0.18.0.
On 0.18.1 throws:

TypeError: Series.name must be a hashable type

Expected Output

B
one 0.609650
three -1.027308
two 0.515237
dtype: float64

(some other random numbers)

Comment

With the 0.18.1 update, a piece of my code stopped working in a quite surprising way.

The code is easy to fix - just by removing [['C', 'D']] and applying on the whole groupby object.
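
The workaround described above can be sketched as follows. This is a minimal sketch rather than the reporter's actual code: the fixed random seed is an assumption added here so the result is reproducible, and the variable name `workaround` is only illustrative.

```python
import numpy as np
import pandas as pd

# Fixed seed: an assumption added for reproducibility, not part of the report.
np.random.seed(0)

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three'],
                   'C': np.random.randn(8),
                   'D': np.random.randn(8)})

gb = df.groupby('B')

# Workaround: call apply on the whole groupby object instead of on
# gb[['C', 'D']]; the lambda still only touches columns C and D.
workaround = gb.apply(lambda group: group.C.mean() - group.D.mean())
print(workaround)
```

The result is a Series indexed by the values of `B`, with one difference of means per group.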

Nevertheless, it's a regression in a point release and I think someone else may run into the same problem.

I am guessing that pandas is trying to generate a name for the result series based on the column list - unfortunately I cannot investigate further right now.
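
The guess above is at least consistent with how the error message can arise. The sketch below does not confirm what pandas did internally in 0.18.1; it only shows, on a recent pandas version, that a column list is unhashable and that pandas rejects it as a Series name. The variable names are illustrative.

```python
import pandas as pd

columns = ['C', 'D']

# A list is mutable, so Python refuses to hash it.
try:
    hash(columns)
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# Passing the same list as a Series name is rejected by pandas.
try:
    pd.Series([0.0], name=columns)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(list_is_hashable)
print(error_message)
```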

output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-24-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 24.0.2
Cython: None
numpy: 1.11.1
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None

@jorisvandenbossche added the Bug, Groupby, and Regression (functionality that used to work in a prior pandas version) labels on Jul 5, 2016
@jorisvandenbossche added this to the 0.18.2 milestone on Jul 5, 2016
@jreback
Contributor

jreback commented Jul 5, 2016

@tpietruszka not EVERYTHING is tested. If it's a regression then obviously it's not.

In practice you would never use .apply that way. It's horribly inefficient and non-idiomatic.

The idiomatic way is:

In [24]: gb.C.mean()-gb.D.mean()
Out[24]: 
B
one     -0.119580
three   -0.751664
two      1.644204
dtype: float64
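
A variation on the same idea, not taken from the thread, is to aggregate both columns in one call and subtract afterwards. This is a minimal sketch, assuming a seeded copy of the example frame from the report; `means` and `diff` are illustrative names.

```python
import numpy as np
import pandas as pd

# Fixed seed: an assumption added for reproducibility.
np.random.seed(0)

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three'],
                   'C': np.random.randn(8),
                   'D': np.random.randn(8)})

gb = df.groupby('B')

# Compute the per-group means of both columns at once,
# then subtract the resulting columns.
means = gb[['C', 'D']].mean()
diff = means['C'] - means['D']
print(diff)
```

Both forms avoid calling a Python function once per group, which is the inefficiency with `.apply` that the comment points to.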

@tpietruszka
Author

@jreback
Thank you for the advice; however, this is just a minimal example to reproduce the bug, far from my actual code.
And I do believe in some cases clarity is more important than performance.

Anyway, I was assuming that reporting problems - however small - is a helpful thing. It seems this one was taken as an offence of some kind - that obviously was not my intention.

Regards,
Tomasz

@jorisvandenbossche
Member

@tpietruszka No, it is certainly not an offence, we highly appreciate bug reports! (and yours was very clear and nicely reproducible, thanks for that)

@jorisvandenbossche
Member

@tpietruszka It's fixed in master and so will be fixed in the next release. Thanks again for reporting!
