
Add nunique and unique to series apply whitelist #6146


Closed
bburan-galenea opened this issue Jan 28, 2014 · 1 comment · Fixed by #6147

Comments

@bburan-galenea
Contributor

The unique and nunique attributes are very useful in conjunction with series groupby operations. I used these extensively in previous versions of Pandas whenever I needed to get a list of unique values for each subgroup (or the number of unique values). This can be used, for example, to count the number of subjects in each treatment group (or get a list of the subject IDs for reporting):

```python
import pandas

data = pandas.DataFrame({
    'subject_id': ('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'),
    'treatment': (0, 0, 0, 0, 0, 1, 1, 1, 0, 0),
})
# Number of distinct subjects, and the subject IDs themselves, per treatment group
print(data.groupby('treatment').subject_id.apply(lambda x: x.nunique()))
print(data.groupby('treatment').subject_id.apply(lambda x: x.unique()))
```

We'd get the following output:

```
treatment
0            7
1            3
dtype: int64

treatment
0            [A, B, C, D, E, I, J]
1                        [F, G, H]
dtype: object
```

This is super-useful for generating summary statistics (e.g. N's) and for debugging (e.g. tracking down which subjects are in which groups). In previous versions of Pandas, we could simply do:

```python
print(data.groupby('treatment').subject_id.nunique())
print(data.groupby('treatment').subject_id.unique())
```
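
On a current pandas release (an assumption here: pandas 1.x or later under Python 3, where `SeriesGroupBy.nunique` and `SeriesGroupBy.unique` can be called directly), the two calls can be combined into a small per-group summary for reporting. The column names `n_subjects` and `subject_ids` are illustrative choices, not anything pandas defines.

```python
import pandas as pd

data = pd.DataFrame({
    "subject_id": ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"],
    "treatment": [0, 0, 0, 0, 0, 1, 1, 1, 0, 0],
})

grouped = data.groupby("treatment").subject_id

# One row per treatment group: the N and the subject IDs in that group.
# unique() yields one array per group (in order of appearance);
# map(list) turns each array into a plain Python list for reporting.
summary = pd.DataFrame({
    "n_subjects": grouped.nunique(),
    "subject_ids": grouped.unique().map(list),
})
print(summary)
```

Both columns share the group index (`treatment`), so the `DataFrame` constructor aligns them row by row.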

It would be nice to continue this. Is there a reason why nunique and unique can't be added to the whitelist?

@jreback
Contributor

jreback commented Jan 28, 2014

No. Care to do a quick PR (just update the whitelist and the tests for it)?
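
A test for the whitelist could check that each forwarded method agrees with the explicit `apply` form. The sketch below is hypothetical (the test function names and the comparison via plain dicts and lists are illustrative choices, not pandas' actual test suite) and assumes a current pandas where both methods exist on `SeriesGroupBy`.

```python
import pandas as pd

data = pd.DataFrame({
    "subject_id": list("ABCDEFGHIJ"),
    "treatment": [0, 0, 0, 0, 0, 1, 1, 1, 0, 0],
})
grouped = data.groupby("treatment").subject_id


def test_nunique_matches_apply():
    # Whitelisted method vs. the equivalent explicit apply(lambda ...).
    direct = grouped.nunique()
    via_apply = grouped.apply(lambda x: x.nunique())
    assert direct.to_dict() == via_apply.to_dict()


def test_unique_matches_apply():
    direct = grouped.unique()
    via_apply = grouped.apply(lambda x: x.unique())
    # Each group holds an array, so compare per-group element lists.
    assert {k: list(v) for k, v in direct.items()} == \
           {k: list(v) for k, v in via_apply.items()}


test_nunique_matches_apply()
test_unique_matches_apply()
print("whitelist checks passed")
```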

FYI

you can always do:

```python
data.groupby(...).apply(lambda x: x.unique())
```

and that is exactly what is happening in this case (it's implemented essentially as a generated function).
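
The "generated function" idea can be illustrated with a toy class. This is a simplified, hypothetical sketch of whitelist dispatch, not pandas' actual implementation: `ToySeriesGroupBy` and its `_apply_whitelist` are invented names, and the split step uses a plain boolean mask per group key.

```python
import pandas as pd


class ToySeriesGroupBy:
    """Hypothetical, simplified grouped Series; not pandas' real SeriesGroupBy."""

    # Only the Series method names listed here are forwarded to each group.
    _apply_whitelist = frozenset({"nunique", "unique", "max", "min"})

    def __init__(self, series, keys):
        self._series = series
        self._keys = list(keys)

    def apply(self, func):
        # Split: select each group's rows with a boolean mask.
        # Apply: call func on the sub-Series. Combine: collect results in a dict.
        results = {}
        for key in sorted(set(self._keys)):
            mask = [k == key for k in self._keys]
            results[key] = func(self._series[mask])
        return results

    def __getattr__(self, name):
        # Reached only when normal lookup fails, i.e. for names not defined above.
        if name not in self._apply_whitelist:
            raise AttributeError(f"{type(self).__name__} has no attribute {name!r}")

        # Generate a wrapper equivalent to self.apply(lambda x: x.<name>(...)).
        def generated(*args, **kwargs):
            return self.apply(lambda s: getattr(s, name)(*args, **kwargs))

        generated.__name__ = name
        return generated


subjects = pd.Series(list("ABCDEFGHIJ"), name="subject_id")
treatment = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]
toy = ToySeriesGroupBy(subjects, treatment)

print(toy.nunique())         # {0: 7, 1: 3}
print(hasattr(toy, "mean"))  # False: "mean" is not whitelisted
```

Under this design, adding a method to the whitelist is a one-line change, which matches the scope of the PR requested above.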

bburan-galenea added a commit to bburan-galenea/pandas that referenced this issue Jan 28, 2014
Series.unique and Series.nunique are useful methods for querying the
unique elements (or number of unique elements) in each group when used
in conjunction with DataFrame.GroupBy.  This commit addresses
pandas-dev#6146.
jreback pushed a commit to jreback/pandas that referenced this issue Feb 4, 2014