ENH: add Groupby.attrs namespace to access groupby attributes #53642

Open
1 of 3 tasks
topper-123 opened this issue Jun 13, 2023 · 3 comments

Comments

@topper-123
Contributor

topper-123 commented Jun 13, 2023

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

The attributes of groupby objects are currently accessible from the groupby directly, but they are hidden, i.e. they don't show up in dir calls:

>>> import pandas as pd
>>> df = pd.DataFrame({"a": [1, 2, 3], "b": [1, 2, 3]})
>>> dfg = df.groupby("a")
>>> dfg.keys
'a'
>>> "keys" in dir(dfg)
False
>>> dfg._hidden_attrs
frozenset({'as_index',
           'axis',
           'dropna',
           ...,
           'observed',
           'sort'})

I assume this was done because we want the visible groupby namespace to consist of groupby methods, i.e. to keep it from becoming noisy.
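To illustrate the behaviour described above (an attribute that is reachable but absent from dir), here is a minimal sketch of the general mechanism. It is not pandas' actual implementation; the class name HidesSomeNames and its contents are hypothetical, and only the _hidden_attrs name is borrowed from the example above.

```python
class HidesSomeNames:
    """Hypothetical stand-in for a groupby object; not pandas code."""

    # Names that stay accessible but are filtered out of dir().
    _hidden_attrs = frozenset({"keys", "sort"})

    def __init__(self, keys, sort=True):
        self.keys = keys
        self.sort = sort

    def __dir__(self):
        # Start from the default listing and drop the hidden names.
        return sorted(set(super().__dir__()) - self._hidden_attrs)

    def mean(self):
        """Placeholder for a 'real' method that should stay visible."""
        return None


obj = HidesSomeNames(keys="a")
print(obj.keys)            # attribute access still works
print("keys" in dir(obj))  # but the name is not listed by dir()
print("mean" in dir(obj))  # methods remain discoverable
```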

Feature Description

It is beneficial to be able to access the attributes. Instead of relying on hidden attributes, I propose a public/non-hidden attrs namespace, so that to access an attribute, users can do e.g. dfg.attrs.keys.

This can also form the basis for a groupby repr and the groupby repr could take its data from the groupby attrs.

I'm not sure about the attrs name because we already have DataFrame.attrs, so I'm definitely open to suggestions for better names.
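As a rough sketch of what such a namespace could look like, assuming a simple wrapper that forwards a fixed set of names: the class GroupByAttrs, the choice of exposed names, and the SimpleNamespace stand-in for a groupby object are all hypothetical and only meant to make the idea concrete, including how a repr could draw its data from the namespace.

```python
from types import SimpleNamespace


class GroupByAttrs:
    """Hypothetical read-only view of selected groupby attributes."""

    # Assumed selection: `keys` plus some names from the listing in the issue.
    _names = ("keys", "as_index", "dropna", "observed", "sort")

    def __init__(self, groupby):
        # Bypass our own __setattr__, which rejects every assignment.
        object.__setattr__(self, "_groupby", groupby)

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. for forwarded names.
        if name in type(self)._names:
            return getattr(self._groupby, name)
        raise AttributeError(name)

    def __setattr__(self, name, value):
        raise AttributeError("groupby attributes are read-only")

    def __dir__(self):
        # Advertise only the forwarded names.
        return list(self._names)

    def __repr__(self):
        items = ", ".join(f"{n}={getattr(self, n)!r}" for n in self._names)
        return f"GroupByAttrs({items})"


# Stand-in for a real groupby object, so the sketch runs without pandas.
fake_groupby = SimpleNamespace(
    keys="a", as_index=True, dropna=True, observed=False, sort=True
)
attrs = GroupByAttrs(fake_groupby)
print(attrs.keys)
print(dir(attrs))
print(attrs)
```

Keeping the forwarded names in one tuple means the dir listing and the repr cannot drift apart, which is one way a groupby repr could be built on top of the namespace.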

Alternative Solutions

The alternatives are:

1. keep things as they are / keep the attributes hidden
2. make the hidden attributes public

IMO both have disadvantages. For point 1, the attributes are difficult to discover. For point 2, the groupby namespace becomes very large, and groupby methods and attributes become mixed, which makes the groupby methods harder to discover.

An attrs attribute would avoid both of those disadvantages.

Additional Context

No response

topper-123 added the Enhancement and Needs Triage labels Jun 13, 2023
@jreback
Contributor

jreback commented Jun 13, 2023

can u show what one would actually do with these?

@topper-123
Contributor Author

can u show what one would actually do with these?

This would help with introspecting groupby objects. Concretely, I often pass groupby objects through several functions, and if something unexpected happens, it's worthwhile to be able to inspect the groupby object to understand what's going on.
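The kind of inspection described above could be sketched as a small helper that gathers a few attributes into a dict for logging or debugging. The function name describe_groupby and its default names are hypothetical, and a SimpleNamespace stands in for a groupby object so the sketch runs without pandas.

```python
from types import SimpleNamespace


def describe_groupby(gb, names=("keys", "as_index", "dropna", "observed", "sort")):
    """Collect the named attributes of a groupby-like object into a dict.

    Names that the object does not have are skipped instead of raising.
    """
    missing = object()  # sentinel, so attributes set to None are still kept
    found = {}
    for name in names:
        value = getattr(gb, name, missing)
        if value is not missing:
            found[name] = value
    return found


# Stand-in for a groupby object that only has some of the attributes.
fake_groupby = SimpleNamespace(keys="a", sort=True, dropna=True)
summary = describe_groupby(fake_groupby)
print(summary)
```

Since the issue's example shows that hidden attributes such as keys are still reachable on a real groupby object, the same getattr-based helper should presumably work on one too, though that is untested here.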

@rhshadrach
Member

rhshadrach commented Jun 28, 2023

Overall I'm +0. I personally don't use groupby objects like this (they are always created / thrown away), but I can see the benefit. If we are going this route, I certainly don't think we should allow setting them (e.g. gb.attrs.keys = [1, 2, 3]).

But I'm not sure about mutability: if a user is getting gb.attrs.keys or gb.attrs.obj, are we returning objects through which they can mutate (perhaps accidentally) the internal state of the groupby object and create hard-to-understand errors, or are we going to return copies? Currently users can do this via gb.obj and it doesn't seem to cause issues, but maybe users just don't know about it.

I'd be opposed to exposing _obj_with_exclusions or _selected_obj in their current state, but would be more favorable once it gets cleaned up.
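The mutability trade-off raised above can be made concrete with a small sketch. The Grouper class and its two properties are hypothetical stand-ins, not pandas code: one property hands out the internal list itself, the other hands out a shallow copy, and neither can be assigned to.

```python
import copy


class Grouper:
    """Hypothetical stand-in for a groupby object; not pandas code."""

    def __init__(self, keys):
        self._keys = list(keys)

    @property
    def keys_shared(self):
        # Returns the internal list itself: callers can mutate our state.
        return self._keys

    @property
    def keys_copied(self):
        # Returns a shallow copy: mutations stay with the caller.
        return copy.copy(self._keys)


g = Grouper(["a", "b"])
g.keys_shared.append("c")  # silently changes the internal state
g.keys_copied.append("d")  # only changes a throwaway copy
print(g.keys_copied)

try:
    g.keys_copied = [1, 2, 3]  # properties without a setter reject assignment
except AttributeError as err:
    print("assignment rejected:", type(err).__name__)
```

A shallow copy is cheap for a list of keys but would be costly for something like the underlying DataFrame, so which attributes get copied is a design decision in its own right.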

mroeschke added the Groupby label and removed the Needs Triage label Jul 17, 2024