Add filtering capability to GroupBy #919

Closed
wesm opened this issue Mar 15, 2012 · 7 comments · Fixed by #3680
Comments

@wesm
Member

wesm commented Mar 15, 2012

Can be accomplished in a hackish way using apply, but a more structured approach would be nice.

7/12/2012: Not sure what I was intending with this one

@sanand0

sanand0 commented Oct 13, 2012

This would be quite useful.

For example, if data.groupby('city') on an address book yields 1000 cities and we want only those with over 100 entries, it would be useful to be able to say something like:

grouped = data.groupby('city')
grouped.filter(grouped.size() > 100)

... and then compute on just that subset.
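For readers arriving later: the snippet above is a proposed syntax, not a working call. In pandas releases that ship GroupBy.filter, the method takes a function that is called once per group and returns True or False, rather than a precomputed boolean Series. A minimal sketch, assuming a made-up address book (the column names, the rows, and the small threshold are illustrative only):

```python
import pandas as pd

# Hypothetical address-book data; names and cities are made up.
data = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid", "Dee", "Eve", "Fay"],
    "city": ["Oslo", "Oslo", "Oslo", "Rome", "Rome", "Lima"],
})

threshold = 2  # stands in for the "over 100 entries" cutoff

# filter() calls the function once per group and keeps every row
# of the groups for which it returns True.
large = data.groupby("city").filter(lambda g: len(g) > threshold)

# Equivalent boolean-mask form: broadcast each group's size back
# onto its rows, then index the original frame.
sizes = data.groupby("city")["city"].transform("size")
large_via_mask = data[sizes > threshold]
```

Either result is an ordinary DataFrame holding only the rows from the large groups, so it can be grouped again to "compute on just that subset". The mask form avoids calling a Python function per group, which may matter with many groups.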

@apratap

apratap commented Oct 31, 2012

FYI: without knowing about this open issue, I stumbled upon the same cleaning requirement. It would be nice to have this in pandas, but for now I was able to move forward.

Ref: http://stackoverflow.com/questions/13167391/filtering-grouped-df-in-pandas

@apratap

apratap commented Nov 1, 2012

Wesley: Can you please help me with the apply hack? I still can't seem to filter grouped data. More details are in the Stack Overflow post linked above. Thanks! -Abhi

@blounsbury-usbr

Probably bad issue etiquette, but I just wanted to add my +1 for this enhancement. I grouped my data by year (hydrologic water year, actually) and then wanted to remove years with fewer than 365 days of data. I used the Stack Overflow answer of pandas.concat() to work around it, but that is pretty ugly.

I agree with sanand0 that grouped.filter() would be easiest. Another possibility would be to add a 'drop()' function to a DataFrameGroupBy object. This would allow a loop over len(group.groups[name]).
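The year-based cleanup described above can be sketched two ways: with the pandas.concat() workaround, and with the GroupBy.filter method that later pandas releases provide. This is only an illustration under stated assumptions: the daily "flow" series is synthetic, and plain calendar years stand in for hydrologic water years.

```python
import pandas as pd

# Synthetic daily record: all of 2010 plus the first half of 2011.
idx = pd.date_range("2010-01-01", "2011-06-30", freq="D")
flow = pd.DataFrame({"flow": range(len(idx))}, index=idx)

# Group by calendar year (a simplification of the water year).
by_year = flow.groupby(flow.index.year)

# Workaround: iterate over the groups, keep the complete years,
# and stitch the pieces back together.
kept_concat = pd.concat([g for _, g in by_year if len(g) >= 365])

# Same result with filter(): the function is called once per year.
kept_filter = by_year.filter(lambda g: len(g) >= 365)
```

Both forms drop the partial year and keep the original row order. The filter form states the keep-or-drop rule in one place, which is roughly what the proposed grouped.filter() was asking for.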

@wesm
Member Author

wesm commented Nov 21, 2012

@jalperin

Another +1. I can't quite figure out the best generic way to work around it. Which of the SO answers do you recommend, Wesley?

@jreback
Contributor

jreback commented Jun 6, 2013

closed via #3680
