df.groupby.agg() removes name of column MultiIndex at level 0 #4013

floux · 2013-06-24T17:22:36Z

When applying different functions to columns with a MultiIndex by supplying a mapping to groupby.agg(), the name of the top level of the column MultiIndex gets lost.

I believe this is a bug, because the column labels themselves are unchanged (although the total number of columns may be smaller if not all columns appear in the mapping).

In the example here I am using groupby.agg(), even though, technically speaking, I want to do a transformation. However, groupby.agg() seems to be the only apply-like method that accepts a mapping of different functions per column. What would be the recommended way?

In [2]: df = pd.DataFrame({
   ...:         'exp' : ['A']*6 + ['B']*6,
   ...:         'obj' : [1,1,1,2,2,2]*2,
   ...:         'rep' : [1,2,3] * 4,
   ...:         'var1' : range(12),
   ...:         'var2' : range(12,24),
   ...:         'var3' : range(24,36),
   ...:         })

In [3]: df = df.set_index(['exp', 'obj', 'rep'])

In [4]: df = df.sort_index()

In [5]: df.columns.name = 'vars'

In [6]: print('before unstack: ', df.columns.names)
('before unstack: ', ['vars'])

In [7]: df = df.unstack('rep')

In [8]: print('after unstack: ', df.columns.names)
('after unstack: ', ['vars', 'rep'])

In [9]: funcs = {
   ...:                 'var1' : lambda x: x - x.median(),
   ...:                 'var2' : lambda y: y - y.mean(),
   ...:                 'var3' : lambda y: y - y.sum(),
   ...: }

In [10]: df1 = df.groupby(level=0).agg(funcs)

In [11]: print('after groupby.agg: ', df1.columns.names)
('after groupby.agg: ', [None, 'rep'])
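Since the goal here is a per-variable transformation rather than an aggregation, one possible approach is to call `groupby(...).transform` separately on each top-level column block, reassemble the blocks with `pd.concat`, and then set the level names explicitly. This is a sketch assuming a recent pandas version; it sidesteps `agg` rather than changing its behavior, and the frame-building lines simply repeat the setup from the session above.

```python
import pandas as pd

# Rebuild the wide frame from the session above: rows (exp, obj), columns (vars, rep).
df = pd.DataFrame({
    "exp": ["A"] * 6 + ["B"] * 6,
    "obj": [1, 1, 1, 2, 2, 2] * 2,
    "rep": [1, 2, 3] * 4,
    "var1": range(12),
    "var2": range(12, 24),
    "var3": range(24, 36),
})
df = df.set_index(["exp", "obj", "rep"]).sort_index()
df.columns.name = "vars"
wide = df.unstack("rep")

funcs = {
    "var1": lambda x: x - x.median(),
    "var2": lambda y: y - y.mean(),
    "var3": lambda y: y - y.sum(),
}

# wide[var] is a DataFrame whose columns are the 'rep' level; transform keeps
# the original row index, so each block has the same shape as its input.
pieces = {
    var: wide[var].groupby(level=0).transform(func)
    for var, func in funcs.items()
}

# Dict keys become the outer column level; set both level names explicitly.
result = pd.concat(pieces, axis=1)
result.columns = result.columns.set_names(wide.columns.names)
print(list(result.columns.names))  # ['vars', 'rep']
```

Setting the names after `concat` avoids relying on how any particular pandas version propagates level names.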

jtratner · 2013-09-11T23:58:37Z

Do others have thoughts on this? I'd guess that if you're using agg, it ought to preserve the metadata, given that you have to specify columns that already exist. I'm not sure about other cases.
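If agg does not carry the column level names through, one workaround is to copy them back from the input frame afterwards. Below is a minimal sketch with a hypothetical helper, `restore_column_names` (not a pandas API). It assumes the result has the same number of column levels as the input, in the same order, and it only fills in levels whose name is `None`. The frame with a `[None, 'rep']` header is constructed by hand to mimic the reported output.

```python
import pandas as pd


def restore_column_names(result: pd.DataFrame, source: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: fill missing column level names in `result` from `source`.

    Assumes both frames have the same number of column levels in the same order.
    """
    if result.columns.nlevels != source.columns.nlevels:
        raise ValueError("column level counts differ; cannot map names")
    # Keep any name the result already has; fill in only the missing (None) ones.
    names = [
        new if new is not None else old
        for new, old in zip(result.columns.names, source.columns.names)
    ]
    out = result.copy()
    out.columns = out.columns.set_names(names)
    return out


# Hand-built frames: `source` has named levels, `result` has lost the top-level name.
cols = pd.MultiIndex.from_product([["var1", "var2"], [1, 2]], names=["vars", "rep"])
source = pd.DataFrame([[0, 1, 2, 3]], columns=cols)
result = source.copy()
result.columns = result.columns.set_names([None, "rep"])

fixed = restore_column_names(result, source)
print(list(fixed.columns.names))  # ['vars', 'rep']
```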

jreback · 2014-03-28T23:31:24Z

This is a nonsensical example, as it produces a DataFrame where each element is itself a DataFrame. This is not valid as a real grouping. Yes, it works, but it is impossible to infer anything from it. If you can provide a simpler example that shows the missing columns, please show it.

I wouldn't actually do it this way; rather, simply get the grouper, iterate over it yourself, apply whatever transformation you want, and then do what you want with the results.
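The "iterate over the grouper yourself" suggestion could look roughly like the following. This is a sketch, not code from the thread: it assumes a recent pandas version, rebuilds the frame from the original report, applies each variable's function to its block of `rep` columns within every group, and sets the column level names explicitly at the end.

```python
import pandas as pd

# Rebuild the wide frame from the report: rows (exp, obj), columns (vars, rep).
df = pd.DataFrame({
    "exp": ["A"] * 6 + ["B"] * 6,
    "obj": [1, 1, 1, 2, 2, 2] * 2,
    "rep": [1, 2, 3] * 4,
    "var1": range(12),
    "var2": range(12, 24),
    "var3": range(24, 36),
})
df = df.set_index(["exp", "obj", "rep"]).sort_index()
df.columns.name = "vars"
wide = df.unstack("rep")

funcs = {
    "var1": lambda x: x - x.median(),
    "var2": lambda y: y - y.mean(),
    "var3": lambda y: y - y.sum(),
}

pieces = []
for _, group in wide.groupby(level=0):
    # group[var] is a DataFrame of 'rep' columns; each func reduces per column
    # and broadcasts back, so the transformed block keeps the group's shape.
    transformed = pd.concat(
        {var: func(group[var]) for var, func in funcs.items()}, axis=1
    )
    pieces.append(transformed)

# Stack the groups back together and reattach both column level names.
result = pd.concat(pieces)
result.columns = result.columns.set_names(wide.columns.names)
```

Iterating explicitly is more verbose than a single `agg` call, but every step (which columns, which function, how results are combined) is under your control.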

Groupby is not meant to handle any and all cases.

ghost assigned jtratner Sep 9, 2013

jtratner mentioned this issue Sep 17, 2013 in ENH: Allow fast comparisons of Index views, similar to 'is' checks. #4859 (closed)

jreback closed this as completed Mar 28, 2014

wesm unassigned jtratner Oct 12, 2016
