
Add tests to ensure sort preserved by groupby, add docs #10931

Merged

nickeubank
Contributor

xref #9651
closes #8588

Adds a test to ensure that the sort order of the target object is preserved within groupby() groups, and modifies the docs to make clear that this order is preserved within each group.
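The property the new test checks can be sketched roughly as follows. This is a hypothetical test, not the one actually added in this PR: the function name `test_groupby_preserves_row_order` and the random data are assumptions made for illustration. It sorts a frame, groups it, and checks that every group equals the matching rows of the sorted frame in the same relative order.

```python
import numpy as np
import pandas as pd


def test_groupby_preserves_row_order():
    # Hypothetical test data: random group keys and random values
    rng = np.random.default_rng(0)
    df = pd.DataFrame({"key": rng.integers(0, 5, size=100),
                       "val": rng.random(100)})

    # Sort the target object before grouping
    df = df.sort_values("val")

    for key, group in df.groupby("key"):
        # Each group should equal the matching rows of df, in df's order
        expected = df[df["key"] == key]
        pd.testing.assert_frame_equal(group, expected)
        # Consequently, each group is still sorted by "val"
        assert group["val"].is_monotonic_increasing


test_groupby_preserves_row_order()
```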

@nickeubank nickeubank force-pushed the test_groupby_sort_preservation branch 5 times, most recently from ed6eb1a to 8bcd4a5 on August 30, 2015 21:49
g.get_group('A')

g.get_group('B')

Contributor

Hmm, this is true, but I think the average user will find this a bit confusing. Can you expand or reword it a bit? It's an important point, but I'm not sure how best to describe it.

Contributor Author

OK, I dropped the discussion of what exactly is happening behind the scenes, tried to clear up the explanation above, and fleshed out the examples a little more. Is that clearer?

Member

Is there any reason a user would even expect the already-created groupby object to change after sorting the original frame?
Is there an example of this confusion? Adding an explanation of something I would never have thought of seems to make this more complex than it needs to be, IMO.

Member

I agree with Joris. This behavior is entirely consistent with the standard memory model in Python. It does not need explaining.

@jreback jreback added this to the 0.17.0 milestone Sep 4, 2015
@nickeubank nickeubank force-pushed the test_groupby_sort_preservation branch 2 times, most recently from c77e701 to f579d6e on September 4, 2015 18:33
@nickeubank
Contributor Author

@jreback updated!

@jreback
Contributor

jreback commented Sep 4, 2015

@jorisvandenbossche @shoyer ?


.. note::

   Users should be careful, however, about re-sorting their data after
   calling ``groupby()``. ``groupby()`` guarantees that the order of
   observations within each group matches their order in the original data
   at the time ``groupby()`` was called. If a user creates a ``groupby()``
   object *and then re-sorts the original data*, the order of observations
   within each group will reflect the order at the time ``groupby()`` was
   called, **not** the order after re-sorting. For example:
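A minimal sketch of the behavior this note describes, assuming a small hypothetical DataFrame `df` with columns `key` and `val`. Because `sort_values` returns a new object rather than modifying `df`, the existing groupby object still refers to the unsorted frame; this is the ordinary Python reference behavior that reviewers point to elsewhere in this thread.

```python
import pandas as pd

# Hypothetical example data; "val" is deliberately not sorted
df = pd.DataFrame({"key": ["A", "B", "A", "B"], "val": [3, 1, 2, 4]})
g = df.groupby("key")

# sort_values returns a new DataFrame; df, and therefore g, is unchanged
df_sorted = df.sort_values("val")

# g still reflects the row order of df when groupby() was called
print(g.get_group("A")["val"].tolist())  # [3, 2]

# Grouping the re-sorted frame picks up the new order
print(df_sorted.groupby("key").get_group("A")["val"].tolist())  # [2, 3]
```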
Member

Can you hard wrap here?

@jorisvandenbossche
Member

The basic part on sort=True/False and sort preserved within groups looks good!

However, I don't really seem to grasp the need to explain the sorting after groupby: "Users should be careful about re-sorting their data after execution of groupby()". Why should they be careful? Is there anything not-expected happening?

@nickeubank nickeubank force-pushed the test_groupby_sort_preservation branch 2 times, most recently from b060c0e to 4817cb3 on September 5, 2015 21:13
@nickeubank
Contributor Author

@jorisvandenbossche @shoyer @jreback OK, warning dropped!

Sort group keys. Get better performance by turning this off
Sort group keys. Get better performance by turning this off.
Note this does not influence the order of observations within each group.
groupby preserves the sorted order of the target object
Member

sorted order -> order ? (as the order within a group is not necessarily sorted?)

Contributor Author

what about "order in which observations appear"?

Contributor Author

or just order of observations?

Member

or 'rows'?

groupby preserves the order of the rows within each group
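A short sketch of how the two statements in this docstring fit together, assuming a small hypothetical DataFrame: the `sort` argument controls the order of the group keys in the result, while rows inside each group keep their original order either way.

```python
import pandas as pd

# Hypothetical example data: keys appear as "b" before "a"
df = pd.DataFrame({"key": ["b", "a", "b", "a"], "val": [1, 2, 3, 4]})

# sort=True (the default): group keys in the result are sorted
sorted_keys = df.groupby("key", sort=True)["val"].sum().index.tolist()
print(sorted_keys)  # ['a', 'b']

# sort=False: group keys appear in order of first occurrence
unsorted_keys = df.groupby("key", sort=False)["val"].sum().index.tolist()
print(unsorted_keys)  # ['b', 'a']

# In both cases, rows within a group keep their original order
rows_b = df.groupby("key", sort=False).get_group("b")["val"].tolist()
print(rows_b)  # [1, 3]
```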

@nickeubank nickeubank force-pushed the test_groupby_sort_preservation branch from 4817cb3 to 3b49fad on September 5, 2015 21:38
@nickeubank
Contributor Author

@jorisvandenbossche great, added.

jreback added a commit that referenced this pull request Sep 5, 2015
Add tests to ensure sort preserved by groupby, add docs
@jreback jreback merged commit 5dea811 into pandas-dev:master Sep 5, 2015
@jreback
Contributor

jreback commented Sep 5, 2015

@nickeubank thanks!

Successfully merging this pull request may close these issues.

DOC/API: clarify groupby sorting behavior
4 participants