Add nunique and unique to series apply whitelist #6146

bburan-galenea · 2014-01-28T17:57:27Z

The unique and nunique attributes are very useful in conjunction with series groupby operations. I used these extensively in previous versions of Pandas whenever I needed to get a list of unique values for each subgroup (or the number of unique values). This can be used, for example, to count the number of subjects in each treatment group (or get a list of the subject IDs for reporting):

import pandas

data = pandas.DataFrame({
    'subject_id': ('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'),
    'treatment': (0, 0, 0, 0, 0, 1, 1, 1, 0, 0),
})
print(data.groupby('treatment').subject_id.apply(lambda x: x.nunique()))
print(data.groupby('treatment').subject_id.apply(lambda x: x.unique()))

We'd get the following output:

treatment
0            7
1            3
dtype: int64

treatment
0            [A, B, C, D, E, I, J]
1                        [F, G, H]
dtype: object

This is super-useful for generating summary statistics (e.g. N's) and debugging (e.g. tracking down which subjects are in which groups). In previous versions of Pandas, we could simply do:

print(data.groupby('treatment').subject_id.nunique())
print(data.groupby('treatment').subject_id.unique())

It would be nice to continue this. Is there a reason why nunique and unique can't be added to the whitelist?

jreback · 2014-01-28T18:00:20Z

No. Care to do a quick PR (just update the whitelist and the tests for it)?

FYI, you can always do:

data.groupby(...).apply(lambda x: x.unique())

and that is exactly what is happening in this case (it's implemented essentially as a generated function).
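For anyone wondering what "a generated function" driven by a whitelist might look like, here is a minimal sketch. It is not the pandas implementation: `ToyGroupBy`, `Values`, and `WHITELIST` are hypothetical names made up for illustration, and plain Python containers stand in for Series and the groupby object. The only point is that a whitelisted name gets turned into a wrapper that forwards to `apply`.

```python
from collections import OrderedDict

# Hypothetical whitelist; the names mirror the methods discussed in this issue.
WHITELIST = frozenset(["unique", "nunique"])


class Values(list):
    """Toy stand-in for a Series: a list with unique/nunique methods."""

    def unique(self):
        # Keep the first occurrence of each value, in order of appearance.
        return list(OrderedDict.fromkeys(self))

    def nunique(self):
        return len(self.unique())


class ToyGroupBy(object):
    """Toy stand-in for a series groupby object (not the pandas class)."""

    def __init__(self, keys, values):
        self.groups = OrderedDict()
        for key, value in zip(keys, values):
            self.groups.setdefault(key, Values()).append(value)

    def apply(self, func):
        # Call func once per group and collect the results by group key.
        return OrderedDict((k, func(v)) for k, v in self.groups.items())

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails. Whitelisted names
        # get a generated wrapper that forwards to every group via apply().
        if name not in WHITELIST:
            raise AttributeError(name)

        def wrapper(*args, **kwargs):
            return self.apply(lambda v: getattr(v, name)(*args, **kwargs))

        return wrapper


subject_id = ('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J')
treatment = (0, 0, 0, 0, 0, 1, 1, 1, 0, 0)
grouped = ToyGroupBy(treatment, subject_id)

# The generated shortcut and the explicit apply() give the same result.
print(grouped.nunique() == grouped.apply(lambda v: v.nunique()))
```

In this sketch, any name that is not whitelisted (say `grouped.median`) raises `AttributeError`, which is the behaviour the issue is about: adding a name to the whitelist is all it takes to expose the shortcut.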

Series.unique and Series.nunique are useful methods for querying the unique elements (or number of unique elements) in each group when used in conjunction with DataFrame.GroupBy. This commit addresses pandas-dev#6146.

Added tests for fix to pandas-dev#6146.

bburan-galenea mentioned this issue Jan 28, 2014

ENH: Add unique and nunique to GroupBy whitelist #6147

Merged

bburan-galenea added a commit to bburan-galenea/pandas that referenced this issue Jan 28, 2014

ENH: Updated tests for series GroupBy whitelist

93dd1c3

Added tests for fix to pandas-dev#6146.

jreback closed this as completed in #6147 Jan 28, 2014

jreback pushed a commit to jreback/pandas that referenced this issue Feb 4, 2014

ENH: Updated tests for series GroupBy whitelist

0554ac7

Added tests for fix to pandas-dev#6146.
