ENH: Ability to use "normalize" (as in df['col'].value_counts(normalize=True)) when you do df.groupby(cols).size() #39938

Closed
bbennett36 opened this issue Feb 20, 2021 · 4 comments
Labels
Enhancement; Needs Info (Clarification about behavior needed to assess issue)

Comments

@bbennett36

groupby.size() should have the ability to "normalize" the results and return them as a percentage.

df['col'].value_counts(normalize=True) 
  
A = 0.25
B = 0.25  
C = 0.25  
D = 0.25

To accomplish this result with groupby.size(), you have to do the following:

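# Count rows per (subset_product, subset_close) pair
df2 = df.groupby(['subset_product', 'subset_close']).size().reset_index(name='prod_count')
# Total count per subset_product, aligned to the rows of df2
a = df2.groupby('subset_product')['prod_count'].transform('sum')
# Turn the counts into proportions within each subset_product
df2['prod_count'] = df2['prod_count'].div(a)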

Is there a way to add this as a parameter in .size()?
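
A minimal sketch of the workaround above on made-up data. Only the column names subset_product and subset_close come from this issue; the values are hypothetical. Assuming the goal is proportions within each subset_product, the existing normalize option of value_counts on a grouped Series appears to give the same numbers:

```python
import pandas as pd

# Hypothetical data; only the column names are taken from the issue.
df = pd.DataFrame({
    "subset_product": ["A", "A", "A", "B", "B", "B", "B"],
    "subset_close": ["x", "x", "y", "x", "y", "y", "y"],
})

# Count rows per (product, close) pair, then divide by each product's total.
counts = df.groupby(["subset_product", "subset_close"]).size()
within = counts / counts.groupby(level="subset_product").transform("sum")

# Same proportions using value_counts(normalize=True) on the grouped column.
alt = df.groupby("subset_product")["subset_close"].value_counts(normalize=True)

print(within)
print(alt)
```

Here product A has counts 2 and 1, so its proportions are 2/3 and 1/3; product B has counts 1 and 3, giving 0.25 and 0.75. The two results may differ in row order and in the name of the resulting Series, depending on the pandas version.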

@bbennett36 added the Enhancement and Needs Triage (Issue that has not been reviewed by a pandas team member) labels on Feb 20, 2021
@rhshadrach
Member

Please include a full reproducible example.

df.groupby(...).size() / df.shape[0] gives you the normalized counts.

@rhshadrach added the Needs Info (Clarification about behavior needed to assess issue) label and removed the Needs Triage label on Feb 20, 2021
@MarcoGorelli added the Closing Candidate (May be closeable, needs more eyeballs) label on Feb 21, 2021
@MarcoGorelli
Member

Hi @bbennett36 - does df.groupby(...).size() / df.shape[0] work for you? If so, I think this can be closed; I don't think it's worth adding an extra parameter to size for this.

@bbennett36
Author

@MarcoGorelli That gives the percentage of the entire dataframe. I was looking for a percentage within the group.

I'll try to get a reproducible example on here today.
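
To make the distinction concrete, here is a small sketch on hypothetical data (the values are made up; only the column names come from this issue). Dividing by the number of rows gives each pair's share of the whole DataFrame, while dividing by the per-product total gives its share within the group:

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "subset_product": ["A", "A", "B", "B", "B", "B"],
    "subset_close": ["x", "y", "x", "x", "x", "y"],
})

counts = df.groupby(["subset_product", "subset_close"]).size()

# Share of the whole frame: all values together sum to 1.
share_of_frame = counts / len(df)

# Share within each product: values sum to 1 per subset_product.
product_totals = counts.groupby(level="subset_product").transform("sum")
share_in_group = counts / product_totals

# (B, x) covers 3 of 6 rows overall (0.5) but 3 of 4 rows of product B (0.75).
print(share_of_frame)
print(share_in_group)
```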

@MarcoGorelli removed the Closing Candidate label on Feb 22, 2021
@MarcoGorelli
Member

Hi @bbennett36 - closing for now to reduce the queue; will reopen if you post a reproducible example.
