Skip to content

Update groupby docstring to make orderedness explicitly not guaranteed #104

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 2, 2023

Conversation

kkraus14
Copy link
Collaborator

@kkraus14 kkraus14 commented Mar 1, 2023

Updates the DataFrame.groupby docstring to clarify that orderdedness of both keys and rows within each group is not guaranteed and is implementation defined.

Additionally, changes keys to be a Sequence[str] to be in line with other APIs expecting a sequence of column names.


Notes
-----
The order of the keys and the order of rows within each group is not
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this likely needs to be on the GroupBy class-level docstring, since this is referring to the output of GroupBy methods

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we expect people to construct GroupBy objects directly? My thought was that this documentation would be most impactful where developers would typically look.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough w/r/t location. wording-wise id like to clarify that it is about results of aggregations, otherwise we're leaking implementation details about keys/groups

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me take a stab at rewording it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried rewording but still not very happy with it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It reads okay to me, but maybe it's useful to add something like: In particular, sorting on keys is not required.? Just to insert "sort" somewhere

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could also just explicitly describe the relevant pandas/cudf differences in e.g. a Notes section

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't sound right. And it won't be just cuDF. For the record, this was basically the top wish list item (or pain point) from the SIG folks as well early on - they used hash-based methods too in their dataframe library.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not suggesting wording that would suggest this is specific to pandas/cudf, just that an example may clarify what behavior should not be relied on.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the meeting there was a "ship it" sentiment - so let's do just that here:)

@rgommers rgommers added the documentation Improvements or additions to documentation label Mar 2, 2023
@rgommers rgommers merged commit 8dd7aa3 into data-apis:main Mar 2, 2023
@rgommers
Copy link
Member

rgommers commented Mar 2, 2023

Thanks @kkraus14 and @jbrockmendel!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants