-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Consistency with groupby as_index #5755
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
The docstring says:
So the contract on the index depends on the definition of what constitutes "aggregated output". But that's a pretty strange design choice, I'd expect the result of a groupby to be a By either definition though, nth seems baroquely broken. no idea. Since t's undocumented This looks like >=3 seperate issues to me. |
Just to link #5552 on nth groupby. Not sure what a good way to fix/break the current behaviour is (for 0.14)... |
bumping to 0.15 / this is a more complete #7002 so going to add my example at the top |
a couple of PRs doing atm tick off some of this as well as previous (checked) stuff I think I'll tweak the top description later to reflect the change (this was not changed in 0.13.1) so maybe will see some people confused here |
gr8! |
as an aside, I think we need to make clear, maybe in the
as compared to yes? |
I added a bullet point at the end of here: #6944 |
Maybe more appropriate here than in #7000, so repeating: Looking at this discussion with a lot of Aaarghs (and also seeing some of the other issues with groupby), I was thinking: should we try to write down some 'design document' where we try to describe the 'rules'. Eg the "we regard This could maybe clarify some things for ourselves, and can be used as a reference for future PRs. (Or could it also turn out as a rabbit hole ..) And I mean a rather detailed description for devs (and maybe simpler version in the docs, as added now to #6944) |
just going to do that, I think a nice Note at the beginning of the aggregation/transform/filter secition should do it? clear, reminding you that these functions return or don't return the grouped indices |
@hayd alright I added the cumsum groupby consisncy here; will see if we get to before release |
Parking this here, because I think (?) it's the same issue.
|
@jseabold Is that: |
Yeah. It's an aggregation. |
@jorisvandenbossche should update this, and include #13519 |
@jreback Yes, this is one of the topics I planned to tackle |
I wrote out a separate issue for |
Edit: This change was implemented in #34012 Currently nunique with as_index=False reports the number of unique elements for the grouping column, which is tautologically 1:
df:
result:
It seems to me it'd be better to leave the grouping column as-is:
Is there a reason the current behavior is a feature that we don't want to change? |
As far as I can tell, all issues identified here have been solved. I think this issue can be closed. |
Checklist:
as_index=False
, BUG: Applying function on column of Groupby object with as_index=False does not select column #5764as_index
)I've had this half-written for a while, but I wasn't sure actually explain it. I'm still not sure, but here goes.
Would be intersting to discuss what peoples views on the correctness of these was.
I think there's scope for cleaning up some possible inconsistencies (unfortunately this would mean breaking the API):
My opinion is that head/tail/nth should act like filter (i.e. both ignore the as_index).
The text was updated successfully, but these errors were encountered: