ENH: allow 'size' in groupby aggregation #6312


Closed
jreback opened this issue Feb 9, 2014 · 9 comments

@jreback
Contributor

jreback commented Feb 9, 2014

Allow using 'size' in groupby's aggregate, so you can do:

df.groupby(..).agg('size')
df.groupby(..).agg(['mean', 'size'])

http://stackoverflow.com/questions/21660686/pandas-groupby-straight-forward-counting-the-number-of-elements-in-each-group-i

  • count should directly implement size (enh)
  • count/size should be allowed in an aggregation list (the bug)
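A minimal sketch of the requested usage (the column names and data here are made up for illustration; this works in current pandas):

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b"],
                   "x": [1.0, 2.0, 3.0]})

# 'size' on its own, and alongside other reducers, as requested above
sizes = df.groupby("key")["x"].agg("size")
print(sizes.tolist())  # [2, 1]

out = df.groupby("key")["x"].agg(["mean", "size"])
print(out)
#       mean  size
# key
# a      1.5     2
# b      3.0     3
```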
@eldad-a

eldad-a commented Feb 10, 2014

Please note that the current (slow) count is already allowed in the aggregation list.

@hayd
Contributor

hayd commented Mar 4, 2014

So count counts non-null values, which goes some way to explaining why it is slower.

@jreback
Contributor Author

jreback commented Mar 4, 2014

Yep... this is a very easy fix (just alias count to size), as it's already computed by the group indexer.

@hayd
Contributor

hayd commented Mar 4, 2014

What I mean is, count is a different operation from size: size just cares about the result_index, whilst count cares about whether values are non-null in columns... (same thing with value_counts; sometimes a user may want to count values in another column).
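The distinction described here can be seen directly with a small made-up frame containing a null: size depends only on the group labels, while count depends on non-null values in the chosen column:

```python
import numpy as np
import pandas as pd

# Made-up data: group "a" has one NaN in column 'v'
df = pd.DataFrame({"g": ["a", "a", "b"],
                   "v": [1.0, np.nan, 2.0]})

grp = df.groupby("g")["v"]
print(grp.size().tolist())   # [2, 1] -- size counts every row in the group
print(grp.count().tolist())  # [1, 1] -- count counts only non-null values
```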

@jreback jreback modified the milestones: 0.15.0, 0.14.0 Apr 21, 2014
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 3, 2015
@jorisvandenbossche
Member

@jreback This issue is not really clear to me, as size already exists for groupby?
Or do you mean that you want to be able to do df.groupby(..).agg('size') instead of df.groupby(..).size() (and therefore be able to do agg(['mean', 'size']))?

And I think we don't want to "alias count to size", as size does something different than count.

@jreback
Contributor Author

jreback commented Nov 11, 2016

No, I think we just need to alias size (like we do mean); IOW, add it to the cython table I think (this might work now).

@jorisvandenbossche jorisvandenbossche changed the title BUG: size should be allowed in groupby aggregation ENH: allow 'size' in groupby aggregation Nov 11, 2016
@jorisvandenbossche
Member

@jreback updated top post to clarify the issue

@jreback
Contributor Author

jreback commented Nov 11, 2016

I'll note that we should look at count perf as well (maybe create another issue); it may have been fixed since this issue was opened.

@jreback
Contributor Author

jreback commented Nov 11, 2016

In [7]: df = pd.DataFrame({'x': np.random.randn(50000),  # produce the demo DataFrame
   ...:                    'y': np.random.randn(50000),
   ...:                    'z': np.random.randn(50000)})

In [4]: buckets = {col: np.arange(int(df[col].min()), int(df[col].max()) + 2)
   ...:            for col in df.columns}  # produce the unit bins

In [5]: cats = [pd.cut(df[col], bucket) for col, bucket in buckets.items()]

In [6]: grouped = df.groupby(cats)  # group by the binned x, y, z

In [20]: %timeit -n1 -r1 grouped.x.size()
1 loop, best of 1: 642 µs per loop

In [9]: %timeit -n1 -r1 grouped.x.mean()
1 loop, best of 1: 2.98 ms per loop

In [10]: %timeit -n1 -r1 grouped.x.count()
1 loop, best of 1: 696 µs per loop

In [19]: %timeit -n1 -r1 grouped.x.agg(['mean','size'])
1 loop, best of 1: 1.62 ms per loop

This is actually implemented.

@jreback jreback closed this as completed Nov 11, 2016
@jorisvandenbossche jorisvandenbossche modified the milestones: No action, Next Major Release Nov 11, 2016