DOC: df.groupby('A') is just syntactic sugar for df.groupby(df['A']) #51063

Closed
1 task done
Sirmadeira opened this issue Jan 30, 2023 · 6 comments
Labels: Docs, Needs Triage

Comments

Sirmadeira commented Jan 30, 2023

Pandas version checks

  • I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

https://pandas.pydata.org/docs/user_guide/groupby.html

Specifically, the line:

df.groupby('A') is just syntactic sugar for df.groupby(df['A']).

A list of any of the above things.

Documentation problem

Here is an example where, I think, it is not just syntactic sugar.

import pandas as pd

test_df = pd.DataFrame({'Category': {0: 'product-availability address-confirmation input',
  1: 'registration register-data-confirmation options',
  2: 'onboarding return-start input',
  3: 'registration register-data-confirmation input',
  4: 'decision-tree first-interaction-validation options'},
 'Original_UserId': {0: '[email protected]',
  1: '[email protected]',
  2: '[email protected]',
  3: '[email protected]',
  4: '[email protected]'}})

If I run

test_df['Category'].eq('onboarding return-start input').groupby(test_df['Original_UserId']).cummax()

this gives a result.

If I run

test_df['Category'].eq('onboarding return-start input').groupby('Original_UserId').cummax()

I get a KeyError.

I am guessing the KeyError comes from a check on the object being grouped, namely whether that object contains the given column.
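That guess can be checked with a small sketch. The frame and column names below are hypothetical stand-ins, not the data from the report. It assumes only that a string key is looked up on the object being grouped, which is what the KeyError suggests.

```python
import pandas as pd

# Hypothetical data, not the frame from the report.
df = pd.DataFrame({"user": ["u1", "u2", "u1"], "category": ["a", "b", "c"]})

# A boolean Series derived from one column; it carries no "user" column.
is_c = df["category"].eq("c")

# The label "user" is looked up on the Series itself, so the lookup fails.
try:
    is_c.groupby("user")
    series_lookup = "ok"
except KeyError:
    series_lookup = "KeyError"

# The same label resolves on the DataFrame, which does have that column.
frame_groups = df.groupby("user").ngroups
```

Here `series_lookup` ends up as `"KeyError"`, while the DataFrame call finds two groups.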

Suggested fix for documentation

I am not sure; maybe just add that the difference is that one form checks whether the grouped object contains the given column, and the other does not.

Sirmadeira added the Docs and Needs Triage labels Jan 30, 2023
Member

phofl commented Jan 30, 2023

cc @jorisvandenbossche

Did we figure this out completely for CoW? Otherwise would just remove this line

@rhshadrach
Member

@Sirmadeira

df.groupby('A') is just syntactic sugar for df.groupby(df['A']).

In the statement, the df inside and outside the groupby must be the same object; this is violated in the example you provided.
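A minimal sketch of this point, using a hypothetical frame with columns `A` and `B`: when the same `df` appears inside and outside `groupby`, the two spellings agree. When the grouped object is a Series derived from `df`, either pass the key Series explicitly or keep the derived column in the frame so the label form still applies.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"A": ["x", "y", "x", "y"], "B": [1, 2, 3, 4]})

# Same df inside and outside groupby: both spellings give the same result.
by_label = df.groupby("A")["B"].sum()
by_series = df.groupby(df["A"])["B"].sum()
spellings_agree = by_label.equals(by_series)

# A derived Series is a different object and has no column "A",
# so the key Series is passed explicitly (rows align on the index).
flag = df["B"].gt(2)
via_series = flag.groupby(df["A"]).cummax()

# Alternatively, keep the derived column in the frame and group by label.
via_label = df.assign(flag=flag).groupby("A")["flag"].cummax()
```

Both `via_series` and `via_label` hold the same per-group running maximum; only the name of the resulting Series differs.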

@Sirmadeira
Author

True, they are different objects.

@rhshadrach
Member

@phofl - you good with this closed?

Member

phofl commented Jan 31, 2023

yeah the CoW topic is already known and tracked, so no need to keep open here

@jorisvandenbossche
Member

Did we figure this out completely for CoW? Otherwise would just remove this line

For future reference, the PR that looks further into this for CoW is #50730 (which has references to the original discussion).

The bottom line is that the "syntactic sugar" is a bit more costly to ensure for CoW.
