Regression in DataFrameGroupBy.head() #6721

Closed
danielballan opened this issue Mar 27, 2014 · 5 comments · Fixed by #27844

Comments

@danielballan
Contributor

For this SO question I isolated a regression in DataFrameGroupBy.head() between 0.12.0 and 0.13.0.

In [1]: df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'], 'B': range(6)})

In 0.12.0:

In [2]: df.groupby('A').head()
Out[2]: 
             A  B
A                
one   0    one  0
      1    one  1
      5    one  5
three 3  three  3
      4  three  4
two   2    two  2

which effectively returns the rows of the underlying .obj sorted by group. In 0.13.0:

In [2]: df.groupby('A').head()
Out[2]: 
             A  B
A                
one   0    one  0
      1    one  1
two   2    two  2
three 3  three  3
      4  three  4
one   5    one  5

[6 rows x 2 columns]

which returns the rows of the underlying .obj in their original order. The former is more intuitive.
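
For anyone who wants the group-sorted ordering back, here is a minimal sketch. It assumes a recent pandas in which head() on a GroupBy returns rows in their original order, as in the 0.13.0 output above. The variable names first_rows and by_group are only illustrative. A stable sort on the grouping column reproduces the 0.12.0 row order, without the extra index level:

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'],
                   'B': range(6)})

# head() keeps the rows in their original order with the original index.
first_rows = df.groupby('A').head()

# A stable sort (mergesort) on the grouping column puts the rows in
# group order while preserving the original order within each group.
by_group = first_rows.sort_values('A', kind='mergesort')

print(first_rows.index.tolist())
print(by_group.index.tolist())
```

Here mergesort is chosen only because it is stable; any stable sort should give the same within-group order.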

@jreback
Contributor

jreback commented Mar 27, 2014

This is on master. It looks correct to me (and does not include the grouping column in the index either):

In [2]: df.groupby('A').head()
Out[2]: 
       A  B
0    one  0
1    one  1
2    two  2
3  three  3
4  three  4
5    one  5

[6 rows x 2 columns]

@danielballan
Contributor Author

As I note in my answer, the existence of head() on a GroupBy is a little confusing. As I think about it, I'm not sure whether it's better to have it show the underlying object or a special re-sorted view.

@danielballan
Contributor Author

OK, I can live with that. Closing.

@jorisvandenbossche
Member

I don't know if it is worth bikeshedding over, but I think we should then at least update the docstring, so I marked it as a DOC issue (https://github.com/pydata/pandas/blob/master/pandas/core/groupby.py#L790):

  • clarify the summary, e.g. "Returns first n rows of each group in original order."?
  • the statement that it is equivalent to .apply(lambda x: x.head(n)) is no longer true (that was the previous behaviour)
  • the example uses as_index=False, while the docs say that it ignores as_index
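
The points above can be checked with a small sketch. It assumes a recent pandas, and the names via_head, via_apply, head_df and head_flag are only illustrative. The apply comparison is done on column B alone so that it does not depend on how a given pandas version treats the grouping column inside apply:

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'],
                   'B': range(6)})
n = 1

# head(n): first n rows of each group, in original row order.
via_head = df.groupby('A')['B'].head(n)

# apply(lambda x: x.head(n)): same rows, but concatenated group by group
# (groups are sorted by key: one, three, two).
via_apply = df.groupby('A')['B'].apply(lambda s: s.head(n))

# head() ignores as_index, so both calls return the same frame.
head_df = df.groupby('A').head(n)
head_flag = df.groupby('A', as_index=False).head(n)

print(via_head.tolist())
print(via_apply.tolist())
print(head_df.equals(head_flag))
```

Both calls select the same rows; only the ordering (and the index that apply builds) differs, which is why the two are no longer interchangeable.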

@jreback jreback added this to the 0.14.0 milestone Mar 28, 2014
@jreback
Contributor

jreback commented Mar 28, 2014

Fair enough; let's make the docstring consistent.

@jreback jreback added the Groupby label Apr 9, 2014
@jreback jreback modified the milestones: 0.14.1, 0.14.0 May 5, 2014
@jreback jreback modified the milestones: 0.15.0, 0.14.1 Jul 1, 2014
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
mtrbean added a commit to mtrbean/pandas that referenced this issue Aug 9, 2019
mtrbean added a commit to mtrbean/pandas that referenced this issue Aug 20, 2019
@jorisvandenbossche jorisvandenbossche modified the milestones: Contributions Welcome, 1.0 Aug 23, 2019