groupby weird behavior #51692
The inconsistency has been fixed; if you try the 2.0.0 release candidate, you'll get matching results:
In [2]: df.groupby("Animal").head()
Out[2]:
Animal Max Speed
0 Parrot 24.0
1 Falcon 380.0
2 Parrot 26.0
3 Falcon 370.0
In [3]: df.groupby("Animal").nth[:]
Out[3]:
Animal Max Speed
0 Parrot 24.0
1 Falcon 380.0
2 Parrot 26.0
3 Falcon 370.0
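A minimal sketch of the comparison above, assuming pandas 2.0 or later (where `nth` behaves as a filter). It checks that `head()` and `nth[:]` return the same rows, with the original index and in the original order.

```python
import pandas as pd

# Same data as in the thread: rows of the two groups are interleaved.
df = pd.DataFrame({"Animal": ["Parrot", "Falcon", "Parrot", "Falcon"],
                   "Max Speed": [24.0, 380.0, 26.0, 370.0]})

grouped = df.groupby("Animal")
via_head = grouped.head()  # up to 5 rows per group (default n=5)
via_nth = grouped.nth[:]   # slice selecting every row of every group

# Both are filters: they subset df instead of reassembling it group by group.
print(via_head.equals(via_nth))  # True
print(via_head.equals(df))       # True
```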
Can you provide a reproducible example in your issue? There is a template that you can fill out. The examples from your notebook are generally good, but please include them here.

Good point; this is explicitly documented as well: https://pandas.pydata.org/docs/dev/reference/api/pandas.core.groupby.DataFrameGroupBy.head.html
@MarcoGorelli
Is the issue just the sorting? I'm not sure, but I'm cc'ing the expert @rhshadrach for comments (and reopening for now).
I can also provide another example, which is a bit more complex. The behavior isn't noticeable in the documentation example because that DataFrame is already quite tidy.
import pandas as pd
df = pd.DataFrame({'Animal': ['Parrot', 'Falcon',
                              'Parrot', 'Falcon'],
                   'Max Speed': [24., 380., 26., 370.]})
print(df)
print()
print(df.groupby("Animal", group_keys=True).apply(lambda x: x))
print()
print(df.groupby("Animal", group_keys=True, sort=False).apply(lambda x: x))
print()
print(df.groupby("Animal", group_keys=False, sort=True).apply(lambda x: x)) # doesn't work
print()
print(df.groupby("Animal", group_keys=False, sort=False).apply(lambda x: x)) # doesn't work
(Here is my notebook: https://gist.github.com/jablka/31f7eee7dd8635f73b3037c0bd466469) But this example seems more complex, since the code has more parameters than just
Filters, which include head, tail, and nth, only subset the original DataFrame. In particular, In the examples with apply, with
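To illustrate the point about filters, here is a small sketch. It borrows the six-row DataFrame posted later in this thread as an assumed input. head, tail, and filter each return a subset of the original rows in their original order, and passing sort=False changes nothing.

```python
import pandas as pd

df = pd.DataFrame({
    "Animal": ["Parrot", "Falcon", "Sparrow", "Parrot", "Falcon", "Sparrow"],
    "Max Speed": [24.0, 380.0, None, 26.0, 370.0, None],
})

first = df.groupby("Animal").head(1)   # first row of each group
last = df.groupby("Animal").tail(1)    # last row of each group
kept = df.groupby("Animal").filter(lambda g: True)  # keep every group

# `sort` orders group keys in aggregations; filters keep the row order of df.
first_unsorted = df.groupby("Animal", sort=False).head(1)

print(first.index.tolist())          # [0, 1, 2]
print(last.index.tolist())           # [3, 4, 5]
print(kept.equals(df))               # True
print(first_unsorted.equals(first))  # True
```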
So, given this DataFrame:
df = pd.DataFrame({'Animal': ['Parrot', 'Falcon', 'Sparrow',
                              'Parrot', 'Falcon', 'Sparrow'],
                   'Max Speed': [24., 380., None,
                                 26., 370., None]})
'''
Animal Max Speed
0 Parrot 24.0
1 Falcon 380.0
2 Sparrow NaN
3 Parrot 26.0
4 Falcon 370.0
5 Sparrow NaN
'''
how can I get it grouped into this desired output:
Here are two rather ugly solutions to your problem:
>>> tdf = df.set_index("Animal", append=True).reorder_levels([1, 0])
>>> (tdf.loc[pd.concat([i.to_frame()
...                     for i in tdf.groupby("Animal", sort=False).groups.values()]).index]
...     .reset_index("Animal")
... )
Animal Max Speed
0 Parrot 24.0
3 Parrot 26.0
1 Falcon 380.0
4 Falcon 370.0
2 Sparrow NaN
5 Sparrow NaN
OR
>>> (df.assign(ord=lambda df: df["Animal"]
...            .map({x: i for (i, x) in enumerate(df["Animal"].unique())}))
...    .rename_axis(index="ind")
...    .reset_index()
...    .sort_values(["ord", "ind"])
...    .drop(columns=["ind", "ord"])
... )
Animal Max Speed
0 Parrot 24.0
3 Parrot 26.0
1 Falcon 380.0
4 Falcon 370.0
2 Sparrow NaN
5 Sparrow NaN
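Another option, offered here only as a sketch and not as one of the two solutions above: iterating over a groupby created with sort=False yields the groups in order of first appearance, each holding its rows in their original order. Concatenating them should therefore give the same result.

```python
import pandas as pd

df = pd.DataFrame({
    "Animal": ["Parrot", "Falcon", "Sparrow", "Parrot", "Falcon", "Sparrow"],
    "Max Speed": [24.0, 380.0, None, 26.0, 370.0, None],
})

# sort=False: groups come out as Parrot, Falcon, Sparrow (first appearance),
# and each group's rows keep their original relative order.
grouped_rows = pd.concat([group for _, group in df.groupby("Animal", sort=False)])
print(grouped_rows)
```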
If you'd like to maintain the order throughout an algorithm, I'd recommend considering using an ordered categorical:
If you don't want to use categorical, you can use the
Instead of a named function, you may prefer to use a lambda.
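The code that went with this comment did not survive, so what follows is only a sketch of the ordered-categorical idea. It assumes the desired category order is the order of first appearance, taken from `Series.unique()`. Sorting an ordered categorical follows the category order rather than alphabetical order, and `kind="stable"` keeps rows within each group in their original order.

```python
import pandas as pd

df = pd.DataFrame({
    "Animal": ["Parrot", "Falcon", "Sparrow", "Parrot", "Falcon", "Sparrow"],
    "Max Speed": [24.0, 380.0, None, 26.0, 370.0, None],
})

# Assumption: the desired group order is the order of first appearance.
order = df["Animal"].unique()
df["Animal"] = pd.Categorical(df["Animal"], categories=order, ordered=True)

# Sorting uses the category order; a stable sort keeps each group's rows
# in their original relative order.
result = df.sort_values("Animal", kind="stable")
print(result)

# The order carries into later steps too: aggregations list groups in
# category order (Parrot, Falcon, Sparrow) instead of alphabetically.
max_speed = df.groupby("Animal", observed=True)["Max Speed"].max()
print(max_speed)
```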
Thank you for the solutions.
df.groupby('Animal').filter(lambda x: True)
(which was proposed on StackOverflow, but it seems it doesn't work: https://stackoverflow.com/questions/36069373/pandas-groupby-dataframe-store-as-dataframe-without-aggregating/36069578#36069578) or the example from the documentation:
df.groupby('Animal').apply(lambda x: x)
If it's not a bug, then perhaps it could be a feature request?
Certainly! But I don't think it's a good user experience to point users toward groupby to accomplish a sort, and I don't think we should be introducing groupby ops that do the same thing yet differ from how other agg / transform / filter methods behave.

Should we just treat this as a documentation issue, document it, and close it?

@MarcoGorelli agreed; I believe as_index is well documented but sort is not.

I'd like to work on it.

#51704 adds a note to the User Guide that sort has no impact on filters; the only other place I know of offhand where this should be mentioned is the API docs for DataFrame.groupby and Series.groupby.
Follow-up to my code puzzle:
df.sort_values(by='Animal', key=lambda x: x.factorize()[0])
👏 :-)
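Here is a sketch of the same one-liner with `kind="stable"` added. The stable flag is an addition here and is not part of the original comment. `factorize()` numbers each animal by first appearance. The stable sort guarantees that rows inside each group keep their original order, which the default quicksort does not promise.

```python
import pandas as pd

df = pd.DataFrame({
    "Animal": ["Parrot", "Falcon", "Sparrow", "Parrot", "Falcon", "Sparrow"],
    "Max Speed": [24.0, 380.0, None, 26.0, 370.0, None],
})

# factorize() returns (codes, uniques); the codes number values by first
# appearance: Parrot -> 0, Falcon -> 1, Sparrow -> 2.
result = df.sort_values(
    by="Animal",
    key=lambda s: s.factorize()[0],  # sort by first-appearance code, not by name
    kind="stable",                   # keep original row order within each group
)
print(result)
```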
I'm experiencing weird pandas groupby behavior.
Sometimes groupby doesn't seem to work; it just outputs the original DataFrame.
See my gist (.ipynb file) here:
https://gist.github.com/jablka/d1b6461692e5c4a05727efaa85d86bbd
Does anybody know what's wrong? It looks so trivial, but I'm stuck...