-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH add cumcount groupby method #5510
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
As an aside, I was sure you used to do be able to do |
@hayd that's an open issue and being re-enabled. You could reincorporate it here if necessary to make this PR works. |
""" | ||
try to cast the result to our obj original type, | ||
we may have roundtripped thru object in the mean-time | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mind removing this blank line if you're editing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added it in specifically, pep8 says it should be there, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, no idea.
I really there ought to be tests for dupe index too. In which case this implementation fails (!) @jreback Is this a wider bug in apply alignment / or is there a cheeky way to do this without a sort:
|
Refactored to work with dupes, also faster:
|
Number each item in each group from 0 to the length of that group. | ||
|
||
Essentially this is equivalent to | ||
>>> self.apply(lambda x: Series(np.arange(len(x)), x.index)). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There has to be a blank line between "Essentially" and the code to render it as code I think (in html docstring).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, fixed :)
@hayd I think this is ok fine for 0.13. maybe a doc mention in groupby.rst (could be a separate PR) |
index = self.obj.index | ||
cumcounts = np.zeros(len(index), dtype='int') | ||
for v in self.indices.values(): | ||
cumcounts[v] = np.arange(len(v)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clever way to do it :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to specify dtype here (to avoid issues on 32-bit plat), e.g. np.arange(len(v),dtype='int64')
Updated those to int64, good spot. I'm not sure what/where to add in the groupby.rst (there is nothing about cumsum and friends either)... "Other useful features" ? |
maybe create another section after |
Just appended a small mention of it at the end of groupby.rst. ...cumsum etc. are covered in dispatched (though could do more in the future). :S |
~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
Sometimes you want to keep track of the order in which each row appears within | ||
its group. You can see this with the ``cumcount`` method: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just horrible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe "To see the order in which each row appears within its group, use the
cumcount
method"
going to go ahead an merge this :) |
closes #4646
Happy to hear ways to make this faster, but I think this is a good convenience function as is.
Update: made significantly faster...