API: pd.Label to disambiguate e.g. MultiIndex level, or tuple as sequence vs label #42349

Closed
jbrockmendel opened this issue Jul 2, 2021 · 2 comments
Labels
API Design; Enhancement; Indexing (Related to indexing on series/frames, not to indexes themselves); Needs Discussion (Requires discussion from core team before further action)

Comments

@jbrockmendel
Member

The 'level' argument in MultiIndex methods can be ambiguous in at least 2 ways.

  1. If an index's level names are [1, 0], then level=0 may refer to either level, depending on whether it is treated as a position or a label (see "API: unclear what integer level name references: name or position?" #21677).

  2. If an index's level names are ["foo", "bar", ("foo", "bar")], then level=("foo", "bar") may refer to either the first two levels or the last level (see "Index.droplevel when names are tuples" #21120).
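To make the two ambiguities concrete, here is a minimal plain-Python sketch that works on a list of level names only. `by_position` and `by_name` are hypothetical helpers written for illustration; they are not pandas functions and say nothing about which reading pandas actually picks.

```python
# Hypothetical helpers (not pandas API): two ways to read a `level` argument.

def by_position(names, level):
    """Treat `level` as an integer position into the list of levels."""
    if not -len(names) <= level < len(names):
        raise IndexError(f"level position {level} out of range")
    return level % len(names)


def by_name(names, level):
    """Treat `level` as a level name and return the position of that name."""
    return names.index(level)


# Case 1: integer level names that collide with positions.
int_names = [1, 0]
print(by_position(int_names, 0))  # 0: the first level (the one named 1)
print(by_name(int_names, 0))      # 1: the second level (the one named 0)

# Case 2: a tuple that is both a level name and a sequence of level names.
tuple_names = ["foo", "bar", ("foo", "bar")]
key = ("foo", "bar")
print([by_name(tuple_names, k) for k in key])  # [0, 1]: the first two levels
print([by_name(tuple_names, key)])             # [2]: the last level
```

In both cases the same argument yields different levels depending on the interpretation, which is the ambiguity described above.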

One option to address the first one is adding keywords to specify e.g. level_num or level_name (#10461).

The new idea I'd like to discuss is having a pd.Label class to treat something ambiguous as a label. So in the first case, passing level=pd.Label(0) would indicate that you want the second level. In the second case, level=pd.Label(("foo", "bar")) would indicate that this should be interpreted as a single label and not a sequence of labels.

Following an appropriate deprecation cycle, the default would be to interpret ambiguous cases as positional/sequence unless pd.Label is explicitly used.
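A rough sketch of how the proposed resolution rule could look, in plain Python and independent of pandas. The `Label` class here is a hypothetical stand-in for the proposed pd.Label, and `resolve_levels` is an invented helper; neither exists in pandas. As a simplifying assumption, any in-range integer is read as a position and any tuple or list as a sequence unless it is wrapped in `Label`.

```python
from dataclasses import dataclass
from typing import Hashable, List, Sequence


@dataclass(frozen=True)
class Label:
    """Hypothetical stand-in for the proposed pd.Label wrapper."""
    value: Hashable


def _resolve_one(names: Sequence[Hashable], level) -> int:
    """Map a single level specifier to a level position."""
    names = list(names)
    n = len(names)
    if isinstance(level, Label):
        # Explicit label: always look it up among the names.
        return names.index(level.value)
    is_int = isinstance(level, int) and not isinstance(level, bool)
    if is_int and -n <= level < n:
        # Default for a possibly ambiguous integer: positional.
        return level % n
    # Anything else can only be a name.
    return names.index(level)


def resolve_levels(names: Sequence[Hashable], level) -> List[int]:
    """Map a `level` argument to a list of level positions."""
    if isinstance(level, (tuple, list)):
        # Default for an unwrapped tuple/list: a sequence of specifiers.
        return [_resolve_one(names, item) for item in level]
    return [_resolve_one(names, level)]


print(resolve_levels([1, 0], 0))         # [0]
print(resolve_levels([1, 0], Label(0)))  # [1]

names = ["foo", "bar", ("foo", "bar")]
print(resolve_levels(names, ("foo", "bar")))         # [0, 1]
print(resolve_levels(names, Label(("foo", "bar"))))  # [2]
```

Because `Label` is not itself a tuple, wrapping a tuple in it is enough to stop the sequence interpretation without any extra keyword arguments.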

pd.Label has a couple of other potential uses:

A) If frame.index (or series.index.levels[0]) contains tuples, then df.loc[a, b] may be selecting either a single key (a, b) from the index, or a from the index and b from the columns. df.loc[pd.Label((a, b))] would disambiguate.

B) In __getitem__/__setitem__, pd.Label could indicate to use .loc and not fall back to .iloc (not that useful, since the user could just use .loc directly).
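For use A), the ambiguity comes from Python itself: `obj[a, b]` and `obj[(a, b)]` hand the same tuple to `__getitem__`, so the indexer cannot tell them apart. A small sketch of that, using a hypothetical `KeyRecorder` class in place of `.loc` and a hypothetical `Label` in place of the proposed pd.Label:

```python
from dataclasses import dataclass
from typing import Hashable


@dataclass(frozen=True)
class Label:
    """Hypothetical stand-in for the proposed pd.Label wrapper."""
    value: Hashable


class KeyRecorder:
    """Toy indexer that returns whatever key Python passes to __getitem__."""

    def __getitem__(self, key):
        return key


loc = KeyRecorder()
a, b = "a", "b"

# Both spellings arrive as the identical tuple ("a", "b").
print(loc[a, b] == loc[(a, b)])  # True

# A wrapped tuple arrives as a distinct type the indexer can check for.
print(isinstance(loc[Label((a, b))], Label))  # True
```

A wrapper type gives the indexer a signal that survives the subscript syntax, which a bare tuple cannot provide.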

@jbrockmendel added the Bug, Needs Triage, and API Design labels and removed the Bug and Needs Triage labels on Jul 2, 2021
@attack68
Contributor

attack68 commented Jul 2, 2021

Interesting idea. How would this work? Would a tuple, whenever entered as a label, always be automatically converted to a Label? Where a tuple input was not ambiguous (or even where it might be ambiguous but was at least the default), would it be synonymous with the Label, so that indexing by just that tuple and not pd.Label(tuple) still works?

E.g. df.loc[:, [(a, b)]] still valid?

Do you think this might solve other issues?

@mroeschke added the Enhancement, Needs Discussion, and Indexing labels on Aug 21, 2021
@jbrockmendel
Member Author

No traction, closing.

3 participants