API: way to exclude the grouped column with apply #7155
Comments
@rhshadrach some of what you're working on with selected_obj vs obj_with_exclusions might be related?
Throughout most of groupby, the grouping column is not included when computing the result. I think we should always exclude the groupings, and I plan to put up a deprecation of this as part of 2.x.
The example in OP doesn't work any more (even adjusted for the
In [14]: df = pd.DataFrame.from_dict({'b1':['aa','ac','ac','ad'],\
...: 'b2':['bb','bc','ad','cd'],\
...: 'b3':['cc','cd','cc','ae'],\
...: 'c' :['a','b','a','b']})
...:
...: df.groupby('c')[df.columns.drop('c')].apply(lambda x: x.unstack().value_counts())
TypeError: Series.name must be a hashable type
This must be a bug? For reference, I got it to work like this:
In [15]: df[df.columns.drop('c')].groupby(df['c']).apply(lambda x: x.unstack().value_counts())
Out[15]:
c
a  cc    2
   aa    1
   ac    1
   bb    1
   ad    1
b  cd    2
   ac    1
   ad    1
   bc    1
   ae    1
dtype: int64
Which is not a nice API and the first example should work IMO.
I think it looks reasonable to deprecate keeping the grouping keys in groupby.apply, to keep the method similar to the other groupby methods. I'd say 99.5% of the time, that is what users would want/expect. Keeping the deprecation process reasonable can be tricky (avoiding warnings all over is a concern). How would we deprecate this?
I have a branch to deprecate that I think is all set to go. It looks quite noisy if you look at our tests, but would only warn the user if:
To avoid the warning, users can exclude the groupings, e.g.
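A minimal sketch of one way to exclude the groupings, assuming explicit column selection on the groupby is what is meant here. The frame is the one used earlier in this thread; `sees_grouping_column` and `value_columns` are hypothetical names introduced only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'b1': ['aa', 'ac', 'ac', 'ad'],
    'b2': ['bb', 'bc', 'ad', 'cd'],
    'b3': ['cc', 'cd', 'cc', 'ae'],
    'c': ['a', 'b', 'a', 'b'],
})

# Hypothetical helper: reports whether the grouping column 'c'
# is part of the frame that apply hands to the function.
def sees_grouping_column(group):
    return 'c' in group.columns

# Select the non-grouping columns explicitly, so apply only
# operates on those and never receives 'c'.
value_columns = ['b1', 'b2', 'b3']
seen = df.groupby('c')[value_columns].apply(sees_grouping_column)
```

With the selection in place, `seen` is indexed by the group keys and the helper never finds `'c'` among the columns it is given.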
Reopening since #52477 was reverted.
xref: #52477 (comment) As the deprecation was determined to be noisy, I plan to reintroduce the linked PR once 2.1 is released with an option ( @phofl - does this sound reasonable?
This removes a warning raised by pandas when performing a groupby-apply call on a dataframe in noslag.aggregate. This warning is new in pandas 2.2. See also here: pandas-dev/pandas#7155
I feel like I must be misunderstanding the fix for this issue, because I am now running into the opposite problem. I want to use the groupby functionality to apply functions to specific groups of data and return the entire dataframe. The current functionality provides that, but it sounds like it is being deprecated in pandas 3.0. Is there going to be a way to include the groupby column without explicitly subsetting all of the columns?
@madelavar12 - could you open a new issue and provide an example of the computation you wish to perform?
http://stackoverflow.com/questions/23709811/best-way-to-sum-group-value-counts-in-pandas/23712433?noredirect=1#comment36441762_23712433
any of these look palatable?
df.groupby('c',as_index='exclude').apply(....)
df.groupby('c')['~c'].apply(...)
df.groupby('c',filter='c')
df.groupby('c',filter=True)
rather than doing a negated selection (df.columns-['c'])
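For readers on a recent pandas release: as far as I know, `-` on an Index no longer performs a set difference, so the negated selection above would need to be spelled differently. A minimal sketch, reusing the frame from this thread and assuming `Index.drop` or `Index.difference` as the replacement (the variable names are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({
    'b1': ['aa', 'ac', 'ac', 'ad'],
    'b2': ['bb', 'bc', 'ad', 'cd'],
    'b3': ['cc', 'cd', 'cc', 'ae'],
    'c': ['a', 'b', 'a', 'b'],
})

# Two ways to build "every column except 'c'".
other_columns = df.columns.drop('c')         # keeps the original column order
same_columns = df.columns.difference(['c'])  # returns the labels sorted

# Pass the negated selection to the groupby; each group handed to
# apply then carries only the selected columns.
widths = df.groupby('c')[list(other_columns)].apply(lambda g: g.shape[1])
```

Here `widths` holds the number of columns each group receives, which is three for both groups since `'c'` has been dropped from the selection.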