Groupby count doesn't accept sort= keyword #28755


Closed
mrocklin opened this issue Oct 2, 2019 · 7 comments
Labels
API - Consistency Internal Consistency of API/Behavior Bug Groupby

Comments

@mrocklin
Contributor

mrocklin commented Oct 2, 2019

Many groupby aggregations like sum, min, max, mean, and so on take a sort= keyword. Count doesn't for some reason.

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({"x": [3, 4, 1, 3, 4, 1], "y": [1, 2, 3, 4, 5, 6]})

In [3]: df.groupby("x").y.sum(sort=False)
Out[3]:
x
1    9
3    5
4    7
Name: y, dtype: int64

In [4]: df.groupby("x").y.count(sort=False)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-76e5507e57cb> in <module>
----> 1 df.groupby("x").y.count(sort=False)

TypeError: count() got an unexpected keyword argument 'sort'

Is this intentional? Is it a bug? I was trying to turn off sorting intermediate results in dask dataframe by default to speed things up a bit (it's also for cudf/RAPIDS work) and ran into this. The inconsistency makes things slightly inconvenient.

@jreback
Contributor

jreback commented Oct 2, 2019

We accept **kwargs in these, IIRC mainly to accommodate pandas options like min_count, but these don't do anything and should be removed.

In [7]: df.groupby('A').sum(sort=False, bar=True)
Out[7]: 
   B
A   
1  3
2  3

@jreback jreback added API Design Compat pandas objects compatability with Numpy or Python functions labels Oct 2, 2019
@jreback jreback added this to the Contributions Welcome milestone Oct 2, 2019
@kkraus14
Contributor

kkraus14 commented Oct 8, 2019

Wouldn't you pass the sort parameter to the groupby call here as opposed to the aggregation call?

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html
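A minimal sketch of that suggestion, reusing the DataFrame from the original report. The variable names `counts_unsorted` and `counts_sorted` are only illustrative. Because `sort` is set on the groupby call, it applies to `count` the same way it would to any other aggregation:

```python
import pandas as pd

df = pd.DataFrame({"x": [3, 4, 1, 3, 4, 1], "y": [1, 2, 3, 4, 5, 6]})

# sort=False on the groupby call keeps groups in order of first appearance.
counts_unsorted = df.groupby("x", sort=False).y.count()

# The default (sort=True) orders the result by group key.
counts_sorted = df.groupby("x").y.count()

print(counts_unsorted)
print(counts_sorted)
```

Only the order of the groups differs between the two results; the counts themselves are the same.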

@jbrockmendel jbrockmendel added API - Consistency Internal Consistency of API/Behavior and removed API Design labels Dec 18, 2019
@jsignell
Contributor

Just came across this while trying to close out some old dask PRs. You can pass a sort kwarg into the groupby and it is on by default. By turning it off you can see that the sort in sum has no impact.

In [1]: import pandas as pd 
   ...:  
   ...: df = pd.DataFrame({"x": [3, 4, 1, 3, 4, 1], "y": [1, 2, 3, 4, 5, 6]}) 
   ...: df                                                                      
Out[1]: 
   x  y
0  3  1
1  4  2
2  1  3
3  3  4
4  4  5
5  1  6

In [2]: df.groupby("x", sort=False).y.sum(sort=False)                           
Out[2]: 
x
3    5
4    7
1    9
Name: y, dtype: int64

In [3]: df.groupby("x", sort=False).y.sum(sort=True)                            
Out[3]: 
x
3    5
4    7
1    9
Name: y, dtype: int64

If pandas can't catch invalid kwargs because it's passing them along, then I think this issue should be closed.

@mroeschke mroeschke added Bug and removed Compat pandas objects compatability with Numpy or Python functions labels Apr 10, 2020
@DiSchi123

I've hit another issue: groupby.last doesn't accept the "min_count" keyword even though the docs say it does.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.resample.Resampler.last.html#pandas.core.resample.Resampler.last

Is that a bug? Happy to open a bug, just wanted to add the question here first.

@arw2019
Member

arw2019 commented Nov 11, 2020

I'm not very familiar with that part of the codebase but since the docs say you should be able to pass the arg I'd say that's a bug.

@DiSchi123

Ok I'll report it

@rhshadrach
Member

As of #31473, groupby functions such as sum no longer take arbitrary kwargs. As of #37870, groupby(...).last and resample(...).last accept min_count, and the argument is functional.
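A rough way to check both behaviors, assuming a pandas release that includes both of those changes. The small DataFrame and the names `sort_rejected` and `last_values` are made up for illustration and do not come from the thread:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 1, 2], "y": [1.0, 2.0, 3.0]})

# With arbitrary kwargs removed, an unknown keyword such as sort=
# should raise instead of being silently ignored.
try:
    df.groupby("x").y.sum(sort=False)
    sort_rejected = False
except TypeError:
    sort_rejected = True

# min_count is the number of valid (non-NA) values a group needs;
# groups with fewer valid values produce NA.
last_values = df.groupby("x").y.last(min_count=2)

print(sort_rejected)
print(last_values)
```

Group 2 has a single row, so with `min_count=2` it should come back as NA, while group 1 returns its last value.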
