nunique performance for groupby with large number of groups #10820
Comments
Sure, that's a clever way of doing this. Note that there is a [...] passed. Can you submit a pull-request?
@jreback Sure, I could give it a try. Would appreciate it if you could give some pointers as to where this sort of thing should go -- I'm only vaguely familiar with the pandas codebase. As in, there's [...]. Also, if this is to be supported in [...]?
This just becomes a method on [...]; currently this is defined on the whitelist (e.g. just search for [...]).
Ok, thanks! By the way, would it make sense for [...]? Another somewhat related note -- the whole whitelist thing in [...].
If you wanted to try to change how the whitelist is used, that would be ok. You could use a MetaClass; not sure if that would be much less magical, but maybe.
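As a very rough, generic illustration of the idea (this is not pandas' actual machinery, and all names below are invented): a class can list the methods it wants to delegate, and a metaclass can attach a thin per-group wrapper for each listed name.

```python
import pandas as pd

# Generic sketch only: a "whitelist" of delegated method names, attached by a
# metaclass instead of being wired up by hand. Names are illustrative.
class DelegatingMeta(type):
    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        for method_name in namespace.get('_delegated_methods', ()):
            def delegate(self, _m=method_name):
                # Call the underlying Series method on each group.
                return {key: getattr(s, _m)() for key, s in self._groups.items()}
            delegate.__name__ = method_name
            setattr(cls, method_name, delegate)
        return cls

class ToyGroupBy(metaclass=DelegatingMeta):
    _delegated_methods = ('nunique', 'sum')

    def __init__(self, groups):
        self._groups = groups  # dict: group key -> pandas Series

g = ToyGroupBy({'x': pd.Series([1, 1]), 'y': pd.Series([2])})
print(g.nunique())  # {'x': 1, 'y': 1}
print(g.sum())      # {'x': 2, 'y': 2}
```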
Hmm, not really though, since in your example the counts for column C are [2, 1] whereas the unique counts are [1, 1]. It's more like you have to groupby/reindex by [...].
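For concreteness, here is a tiny made-up frame chosen to match the [2, 1] vs [1, 1] numbers above (the example being replied to is not reproduced in this thread):

```python
import pandas as pd

# Made-up data: group 'x' has two rows but one distinct value in C,
# group 'y' has one row with one distinct value.
df = pd.DataFrame({'A': ['x', 'x', 'y'],
                   'C': [1, 1, 2]})

print(df.groupby('A')['C'].count())    # x -> 2, y -> 1  (row counts)
print(df.groupby('A')['C'].nunique())  # x -> 1, y -> 1  (distinct values)
```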
I didn't say the result was the same, just the idea.
Looks like in the most naive implementation this would require constructing a groupby object from another, already instantiated groupby object, but grouped by one additional key -- wonder if there's any builtin functionality for that?
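A minimal sketch of that "one additional key" idea, using the same made-up column names as above (not code from the issue): group by the original key plus the target column, then count the resulting groups per key.

```python
import pandas as pd

# Illustrative frame; column names are invented.
df = pd.DataFrame({'A': ['x', 'x', 'y'], 'C': [1, 1, 2]})

# Group by the original key *plus* the column whose distinct values we want,
# then count how many (A, C) combinations fall under each key.
unique_counts = (
    df.groupby(['A', 'C'])
      .size()                    # one row per distinct (A, C) pair
      .groupby(level='A')        # collapse back to the original grouping key
      .size()
)

print(unique_counts)                   # x -> 1, y -> 1
print(df.groupby('A')['C'].nunique())  # same result, computed directly
```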
It looks like `len(set)` beats both `len(np.unique)` and `pd.Series.nunique` if done naively -- here's an example with a large number of groups where we try to compute unique counts of a column when grouping by another column: [...]

The fastest way I know to accomplish the same thing is this: [...]

... which is a LOT faster apparently. Wonder if something similar could be done in `GroupBy.nunique`, since it's quite a common use case?
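The original snippets are not reproduced above; the following is a sketch of the kind of comparison being described, with invented sizes and column names.

```python
import numpy as np
import pandas as pd

# Invented sizes and column names: many groups, unique counts of one column
# when grouping by another.
n = 500_000
df = pd.DataFrame({
    'key': np.random.randint(0, 100_000, n),  # large number of groups
    'val': np.random.randint(0, 100, n),
})

# Naive: a Python-level len(set(...)) per group -- slow with many groups.
naive = df.groupby('key')['val'].apply(lambda s: len(set(s)))

# The built-in per-group nunique.
builtin = df.groupby('key')['val'].nunique()

# A much faster equivalent: drop duplicate (key, val) pairs once, then do a
# cheap per-group count of the remaining rows.
fast = df.drop_duplicates(['key', 'val']).groupby('key')['val'].count()

assert naive.equals(fast) and builtin.equals(fast)
```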