df.groupby.agg() removes name of column MultiIndex at level 0 #4013

Closed
floux opened this issue Jun 24, 2013 · 2 comments

Comments

@floux
floux commented Jun 24, 2013

When applying different functions to columns that have a MultiIndex by passing a mapping to groupby.agg(), the name of the top column level (level 0) gets lost.

I believe this is a bug, because the column labels themselves are unchanged (although there may be fewer columns if not every column appears in the mapping).

In the example here I am using groupby.agg(), even though, strictly speaking, I want to do a transformation. However, groupby.agg() seems to be the only apply-like method that accepts a mapping of different functions per column. What would be the recommended way?

In [2]: df = pd.DataFrame({
   ...:         'exp' : ['A']*6 + ['B']*6,
   ...:         'obj' : [1,1,1,2,2,2]*2,
   ...:         'rep' : [1,2,3] * 4,
   ...:         'var1' : range(12),
   ...:         'var2' : range(12,24),
   ...:         'var3' : range(24,36),
   ...:         })

In [3]: df = df.set_index(['exp', 'obj', 'rep'])

In [4]: df = df.sort_index()

In [5]: df.columns.name = 'vars'

In [6]: print('before unstack: ', df.columns.names)
('before unstack: ', ['vars'])

In [7]: df = df.unstack('rep')

In [8]: print('after unstack: ', df.columns.names)
('after unstack: ', ['vars', 'rep'])

In [9]: funcs = {
   ...:                 'var1' : lambda x: x - x.median(),
   ...:                 'var2' : lambda y: y - y.mean(),
   ...:                 'var3' : lambda y: y - y.sum(),
   ...: }

In [10]: df1 = df.groupby(level=0).agg(funcs)

In [11]: print('after groupby.agg: ', df1.columns.names)
('after groupby.agg: ', [None, 'rep'])
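One way to get a per-column *transformation* (same shape as the input) without going through agg is to transform each top-level column block separately and stitch the pieces back together. The sketch below assumes a recent pandas. `transform_per_column` is a hypothetical helper, not a pandas API. It reuses the frame and `funcs` from the example above and reattaches the column level names explicitly, so it does not depend on whether a given pandas version preserves them.

```python
import pandas as pd

# Rebuild the wide frame from the example: rows (exp, obj), columns (vars, rep).
df = pd.DataFrame({
    'exp': ['A'] * 6 + ['B'] * 6,
    'obj': [1, 1, 1, 2, 2, 2] * 2,
    'rep': [1, 2, 3] * 4,
    'var1': range(12),
    'var2': range(12, 24),
    'var3': range(24, 36),
})
df = df.set_index(['exp', 'obj', 'rep']).sort_index()
df.columns.name = 'vars'
df = df.unstack('rep')

funcs = {
    'var1': lambda x: x - x.median(),
    'var2': lambda y: y - y.mean(),
    'var3': lambda y: y - y.sum(),
}


def transform_per_column(frame, funcs, level=0):
    """Hypothetical helper: group rows by an index level and apply
    funcs[name] to each top-level column block, keeping the original shape."""
    pieces = {
        name: frame[name].groupby(level=level).transform(func)
        for name, func in funcs.items()
    }
    # concat on a dict turns the dict keys into the new top column level.
    out = pd.concat(pieces, axis=1)
    # Reattach the column level names ('vars', 'rep') explicitly.
    out.columns = out.columns.set_names(frame.columns.names)
    return out


result = transform_per_column(df, funcs)
print(list(result.columns.names))  # ['vars', 'rep']
```

Because `transform` returns one value per input row, the result has the same index and columns as `df`, which matches the "transformation, not aggregation" intent better than agg.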
@ghost ghost assigned jtratner Sep 9, 2013
@jtratner
Contributor

Do others have thoughts on this? I think that if you're using agg, it ought to preserve the metadata, since the mapping has to name columns that already exist. Not sure about other cases.
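Until agg preserves the metadata itself, the level names can be copied back from the input frame after the fact. A minimal sketch, assuming a recent pandas: `restore_column_names` is a hypothetical helper, and the lost name is *simulated* here (by blanking it) to mirror the `[None, 'rep']` output reported above, since whether agg drops it depends on the pandas version.

```python
import pandas as pd


def restore_column_names(result, source):
    """Hypothetical helper: copy the column level names of `source` onto
    `result`, provided both have the same number of column levels."""
    if result.columns.nlevels != source.columns.nlevels:
        raise ValueError("column level counts differ")
    out = result.copy()
    out.columns = out.columns.set_names(source.columns.names)
    return out


# Rebuild the wide frame from the issue: columns (vars, rep).
df = pd.DataFrame({
    'exp': ['A'] * 6 + ['B'] * 6,
    'obj': [1, 1, 1, 2, 2, 2] * 2,
    'rep': [1, 2, 3] * 4,
    'var1': range(12),
    'var2': range(12, 24),
    'var3': range(24, 36),
})
df = df.set_index(['exp', 'obj', 'rep']).sort_index()
df.columns.name = 'vars'
df = df.unstack('rep')

# An aggregation over the same columns; simulate the reported lost name.
agg = df.groupby(level=0).mean()
agg.columns = agg.columns.set_names([None, 'rep'])

fixed = restore_column_names(agg, df)
print(list(fixed.columns.names))  # ['vars', 'rep']
```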

@jreback
Contributor

jreback commented Mar 28, 2014

This is a nonsensical example, as it produces a DataFrame where each element is itself a DataFrame. That is not valid as a real grouping. Yes, it works, but it is impossible to infer anything from this. If you have a simpler example that shows the missing columns, please post it.

I wouldn't actually do it this way; rather, simply get the grouper, iterate over it yourself, apply whatever transformation you want, and then do what you want with the results.
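The "iterate the grouper yourself" approach might look like the sketch below, assuming a recent pandas and reusing the frame and `funcs` from the original report. Iterating a `groupby(level=0)` object yields `(key, group)` pairs; each group is transformed column block by column block and the pieces are concatenated back. Setting the column level names at the end is an explicit choice in this sketch, not something groupby does for you.

```python
import pandas as pd

# Rebuild the wide frame from the report: rows (exp, obj), columns (vars, rep).
df = pd.DataFrame({
    'exp': ['A'] * 6 + ['B'] * 6,
    'obj': [1, 1, 1, 2, 2, 2] * 2,
    'rep': [1, 2, 3] * 4,
    'var1': range(12),
    'var2': range(12, 24),
    'var3': range(24, 36),
})
df = df.set_index(['exp', 'obj', 'rep']).sort_index()
df.columns.name = 'vars'
df = df.unstack('rep')

funcs = {
    'var1': lambda x: x - x.median(),
    'var2': lambda y: y - y.mean(),
    'var3': lambda y: y - y.sum(),
}

pieces = []
for _, group in df.groupby(level=0):
    # `group` holds all rows for one 'exp' value; group[name] is the
    # (rows x rep) block for one variable, so each func sees a DataFrame.
    transformed = pd.concat(
        {name: func(group[name]) for name, func in funcs.items()}, axis=1
    )
    pieces.append(transformed)

# Stack the per-group results and restore the original row order and names.
result = pd.concat(pieces).reindex(df.index)
result.columns = result.columns.set_names(df.columns.names)
```

This gives full control over what happens per group, at the cost of a Python-level loop over groups.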

Groupby is not meant to handle any and all cases.


3 participants