-
Notifications
You must be signed in to change notification settings - Fork 21
Update groupby docstring to make orderedness explicitly not guaranteed #104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
||
Notes | ||
----- | ||
The order of the keys and the order of rows within each group is not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this likely needs to be on the GroupBy class-level docstring, since this is referring to the output of GroupBy methods
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we expect people to construct GroupBy
objects directly? My thought was that this documentation would be most impactful where developers would typically look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair enough w/r/t location. wording-wise id like to clarify that it is about results of aggregations, otherwise we're leaking implementation details about keys/groups
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me take a stab at rewording it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried rewording but still not very happy with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It reads okay to me, but maybe it's useful to add something like: In particular, sorting on keys
is not required.? Just to insert "sort" somewhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could also just explicitly describe the relevant pandas/cudf differences in e.g. a Notes section
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't sound right. And it won't be just cuDF. For the record, this was basically the top wish list item (or pain point) from the SIG folks as well early on - they used hash-based methods too in their dataframe library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not suggesting wording that would suggest this is specific to pandas/cudf, just that an example may clarify what behavior should not be relied on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the meeting there was a "ship it" sentiment - so let's do just that here:)
Thanks @kkraus14 and @jbrockmendel! |
Updates the
DataFrame.groupby
docstring to clarify that orderdedness of both keys and rows within each group is not guaranteed and is implementation defined.Additionally, changes keys to be a
Sequence[str]
to be in line with other APIs expecting a sequence of column names.