ENH: Keep series name when merging GroupBy result #6068

bburan-galenea · 2014-01-24T19:10:46Z

closes #6124
closes #6265

Use case

This will facilitate DataFrame group/apply transformations when using a function that returns a Series. Right now, if we perform the following:

import pandas
df = pandas.DataFrame(
        {'a':  [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
         'b':  [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1],
         'c':  [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
         'd':  [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1],
         })

def count_values(df):
    return pandas.Series({'count': df['b'].sum(), 'mean': df['c'].mean()}, name='metrics')

result = df.groupby('a').apply(count_values)
print(result.stack().reset_index())

We get the following output:

   a level_1    0
0  0   count  2.0
1  0    mean  0.5
2  1   count  2.0
3  1    mean  0.5
4  2   count  2.0
5  2    mean  0.5

[6 rows x 3 columns]

Ideally, the series name should be preserved and propagated through these operations such that we get the following output:

   a metrics    0
0  0   count  2.0
1  0    mean  0.5
2  1   count  2.0
3  1    mean  0.5
4  2   count  2.0
5  2    mean  0.5

[6 rows x 3 columns]
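The only difference between the two outputs is the header of the second column. `stack()` moves the column labels into a new innermost index level, and that level's name is whatever `result.columns.name` is. `reset_index()` then turns each index level into a column: a named level keeps its name, while an unnamed level of a MultiIndex is labelled by position as `level_<i>`. The sketch below is a minimal pure-Python model of that labelling rule. `reset_index_headers` is a hypothetical helper, not a pandas function, and it ignores the single-level case, where pandas uses `index` instead.

```python
def reset_index_headers(level_names):
    """Model how reset_index() labels the columns it creates from MultiIndex levels.

    Hypothetical helper for illustration only: unnamed (None) levels get the
    positional fallback "level_<i>", named levels keep their own name.
    """
    return [name if name is not None else f"level_{i}"
            for i, name in enumerate(level_names)]


# After groupby('a').apply(...).stack(), the index levels are
# ['a', result.columns.name].
print(reset_index_headers(["a", None]))       # ['a', 'level_1']
print(reset_index_headers(["a", "metrics"]))  # ['a', 'metrics']
```

So getting `metrics` as the header depends entirely on `result.columns.name` being set before `stack()` is called.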

One way to achieve this currently is:

result = df.groupby('a').apply(count_values)
result.columns.name = 'metrics'
print(result.stack().reset_index())

However, this has two problems: 1) it adds an extra line of code, and 2) the name of the Series created inside the applied function may not be known in the calling scope, so we cannot reliably set result.columns.name from outside.

The other workaround is to name the index of the Series inside the applied function:

def count_values(df):
    series = pandas.Series({'count': df['b'].sum(), 'mean': df['c'].mean()})
    series.index.name = 'metrics'
    return series

During the group/apply operation, this pull request checks whether series.index.name is set. If it is not, it sets index.name to the name of the Series, so the name propagates to the columns of the result.
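The check described above can be modelled in plain Python. This is a sketch under stated assumptions, not the pandas implementation: `ToyIndex`, `ToySeries`, and `name_index_from_series` are hypothetical stand-ins for `pandas.Index`, `pandas.Series`, and the internal step in the group/apply code path.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyIndex:
    # Hypothetical stand-in for pandas.Index: labels plus an optional name.
    labels: List[str]
    name: Optional[str] = None


@dataclass
class ToySeries:
    # Hypothetical stand-in for pandas.Series: an index plus an optional name.
    index: ToyIndex
    name: Optional[str] = None


def name_index_from_series(series: ToySeries) -> ToySeries:
    """Copy series.name onto series.index.name, but only if the index is unnamed.

    An explicitly named index (the second workaround) is left untouched.
    """
    if series.index.name is None:
        series.index.name = series.name
    return series


# Like the Series returned by count_values: named 'metrics', index unnamed.
s = name_index_from_series(ToySeries(ToyIndex(["count", "mean"]), name="metrics"))
print(s.index.name)  # metrics

# Index already named inside the applied function: that name wins.
t = name_index_from_series(
    ToySeries(ToyIndex(["count", "mean"], name="stats"), name="metrics"))
print(t.index.name)  # stats
```

Giving the index name precedence keeps the existing workaround behaving exactly as before.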

jreback · 2014-01-24T19:20:19Z

does this have an associated issue?

bburan-galenea · 2014-01-24T19:21:52Z

I didn't create an issue. Should I?

jreback · 2014-01-24T19:25:49Z

you don't have to, but pls put an example of what is wrong, and then what this PR does in the top section

e.g. this feature is missing, this PR makes it do .....

bburan-galenea · 2014-01-24T19:48:15Z

Updated

jreback · 2014-01-25T00:01:29Z

actually...why don't you create an issue for this....that way can tag it with some labels.....
you can put the top part into the issue (e.g. the rationale / code sample) instead

bburan-galenea · 2014-01-27T14:40:20Z

No problem. Made an issue. Thanks for looking at this.


jreback · 2014-02-15T21:13:13Z

can you rebase?

jreback · 2014-02-18T01:50:27Z

hows this coming?

bburan-galenea · 2014-02-24T16:22:56Z

@jreback - Second commit should close #6265 and I rebased. Had to push with the --force option. Hope that's OK.

jreback · 2014-02-24T16:32:53Z

yes force pushing is correct

jreback · 2014-02-24T19:44:09Z

pls add in a test for #6265 (and a release note mention) as well..thanks

bburan-galenea · 2014-03-05T15:56:29Z

Rebased, added tests, made sure tests passed, fixed as per discussion in #6124 and added release note mention.

bburan-galenea · 2014-03-05T15:56:48Z

FYI, did another rebase/force push.

jreback · 2014-03-05T16:38:30Z

this looks really good!

can you add the example (at the top of the PR) to the groupby.rst/examples section (just the fixed version). If you are really adventurous, you can put a 1-liner in v0.14.0 under API changes with a reference to this. It's not an API change, but useful to know.

bburan-galenea · 2014-03-05T19:08:47Z

Added the example. There was already a note in the release file about the changes (it's more than one line though). Let me know if that should be shortened.

jreback · 2014-03-05T19:47:20Z

minor issue
can u annotate the test where 6124 is tested?

When possible, attempt to preserve the series name when performing groupby operations. This facilitates reshaping/indexing operations on the result of the groupby/apply or groupby/agg operation. Fixes GH6265 and GH6124. Added example to groupby.rst and description to API changes for v0.14.
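The commit message says the name is preserved "when possible". As far as we can tell from the pandas source, the step tagged GH6124 only propagates the name when the per-group index is unnamed and every group returned a Series with the same name. The sketch below models just that consistency check; `common_series_name` is a hypothetical helper, not a pandas function.

```python
from typing import Iterable, Optional


def common_series_name(names: Iterable[Optional[str]]) -> Optional[str]:
    """Return the shared Series name if every group produced the same one.

    Hypothetical helper modelling the 'when possible' condition: if the
    per-group Series names disagree (or there are none), nothing is
    propagated and None is returned.
    """
    distinct = set(names)
    if len(distinct) == 1:
        return next(iter(distinct))
    return None


print(common_series_name(["metrics", "metrics", "metrics"]))  # metrics
print(common_series_name(["metrics", "other", "metrics"]))    # None
```

Refusing to pick a name when groups disagree avoids labelling the result's columns with a name that only some groups used.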

bburan-galenea · 2014-03-05T19:55:12Z

Done. Amended previous commit and force-updated.


jreback · 2014-03-05T21:59:06Z

thanks @bburan-galenea this was a great effort!

bburan-galenea mentioned this pull request Jan 27, 2014

Propagate Series.name attribute when merging series into data frame #6124

Closed

jreback added a commit that referenced this pull request Jan 29, 2014

Merge pull request #6157 from jacobschaer/master

e89029d

Updated docs to reflect a pagination bug that was fixed. Closes: issue #6068

jreback mentioned this pull request Feb 5, 2014

Missing Series name when using count() on groupby object #6265

Closed

jreback added Bug labels Mar 5, 2014

jreback added a commit that referenced this pull request Mar 5, 2014

Merge pull request #6068 from bburan-galenea/bburan/dataframe_groupby…

1fca5be

…_apply_series_name ENH: Keep series name when merging GroupBy result

jreback merged commit 1fca5be into pandas-dev:master Mar 5, 2014

rosnfeld mentioned this pull request Mar 6, 2014

BUG: preserve frequency across Timestamp addition/subtraction (#4547) #6560

Merged
