API: pd.Label to disambiguate e.g. MultiIndex level, or tuple as sequence vs label #42349
Labels
API Design
Enhancement
Indexing
Related to indexing on series/frames, not to indexes themselves
Needs Discussion
Requires discussion from core team before further action
The `level` argument in MultiIndex methods can be ambiguous in at least 2 ways.

- If an index's level names are `[1, 0]`, then `level=0` may refer to either level depending on whether it is a position or a label (API: unclear what integer level name references: name or position? #21677).
- If an index's level names are `["foo", "bar", ("foo", "bar")]`, then `level=("foo", "bar")` may refer to either the first two levels, or the last level (Index.droplevel when names are tuples #21120).

One option to address the first one is adding keywords to specify e.g. `level_num` or `level_name` (#10461).
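To make the two ambiguities concrete, here is a small plain-Python sketch that models a MultiIndex's level names as a list and enumerates every reading of a `level` argument. The helper name `candidate_levels` is made up for illustration; it is not a pandas function and does not describe how pandas resolves levels today.

```python
def candidate_levels(names, level):
    """Return every plausible reading of `level` as {reading: [positions]}.

    `names` is a plain list standing in for MultiIndex.names (an assumption
    made so the example runs without pandas).
    """
    readings = {}
    # Reading 1: `level` is a single label that matches a level name.
    if level in names:
        readings["label"] = [names.index(level)]
    # Reading 2: `level` is a single integer position.
    n = len(names)
    if isinstance(level, int) and not isinstance(level, bool) and -n <= level < n:
        readings["position"] = [level % n]
    # Reading 3: `level` is a sequence whose items are each a level name.
    if isinstance(level, (tuple, list)) and all(item in names for item in level):
        readings["sequence of labels"] = [names.index(item) for item in level]
    return readings


print(candidate_levels([1, 0], 0))
# {'label': [1], 'position': [0]}
print(candidate_levels(["foo", "bar", ("foo", "bar")], ("foo", "bar")))
# {'label': [2], 'sequence of labels': [0, 1]}
```

Any call that returns more than one reading is ambiguous in the sense described above.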
The new idea I'd like to discuss is having a `pd.Label` class to treat something ambiguous as a label. So in the first case, passing `level=pd.Label(0)` would indicate that you want the second level. In the second case, `level=pd.Label(("foo", "bar"))` would indicate that this should be interpreted as a single label and not a sequence of labels.

Following an appropriate deprecation cycle, the default would be to interpret ambiguous cases as positional/sequence unless `pd.Label` is explicitly used.
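A minimal sketch of how the proposed rule might behave, written in plain Python so it runs without pandas. Both `Label` and `resolve_levels` are hypothetical stand-ins: `pd.Label` does not exist, and the resolver models the post-deprecation default suggested here (integers are positions, tuples are sequences, unless wrapped), not what pandas does today.

```python
from dataclasses import dataclass
from typing import Hashable


@dataclass(frozen=True)
class Label:
    """Hypothetical wrapper meaning 'treat the wrapped value as one label'."""
    value: Hashable


def resolve_levels(names, level):
    """Return the list of level positions selected by `level`.

    `names` is a plain list standing in for MultiIndex.names (an assumption).
    """
    if isinstance(level, Label):
        # Explicit label: always looked up by name, never by position.
        return [names.index(level.value)]
    if isinstance(level, (tuple, list)):
        # Unwrapped sequence: resolve each item separately.
        positions = []
        for item in level:
            positions.extend(resolve_levels(names, item))
        return positions
    if isinstance(level, int) and not isinstance(level, bool):
        # Unwrapped integer: always a position.
        n = len(names)
        if not -n <= level < n:
            raise IndexError(f"level {level} out of range for {n} levels")
        return [level % n]
    # Anything else (e.g. a string) can only be a name.
    return [names.index(level)]


print(resolve_levels([1, 0], 0))          # [0]
print(resolve_levels([1, 0], Label(0)))   # [1]

names = ["foo", "bar", ("foo", "bar")]
print(resolve_levels(names, ("foo", "bar")))         # [0, 1]
print(resolve_levels(names, Label(("foo", "bar"))))  # [2]
```

One consequence of this design is that a sequence can mix both forms, e.g. `[Label(0), 0]` selects a level by name and another by position in a single call.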
`pd.Label` has a couple of other potential uses:

A) If `frame.index` (or `series.index.levels[0]`) contains tuples, then `df.loc[a, b]` may either be selecting a single key `(a, b)` from the index, or may be selecting `a` from the index and `b` from the columns. `df.loc[pd.Label((a, b))]` would disambiguate.

B) In `__getitem__`/`__setitem__`, `pd.Label` could indicate to use `.loc` and not fall back to `.iloc` (not that useful since the user could just use `.loc` directly).
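The ambiguity in (A) comes from Python itself: `obj[a, b]` and `obj[(a, b)]` pass the same tuple to `__getitem__`, so an indexer cannot tell them apart. The sketch below shows that, and shows how a wrapper could carry the missing intent. `Label`, `Recorder`, and `split_loc_key` are hypothetical names used only for illustration; none of this is pandas code.

```python
from dataclasses import dataclass
from typing import Hashable


@dataclass(frozen=True)
class Label:
    """Hypothetical wrapper meaning 'treat the wrapped value as one label'."""
    value: Hashable


class Recorder:
    """Returns whatever key Python hands to __getitem__."""
    def __getitem__(self, key):
        return key


def split_loc_key(key):
    """Split a 2-D .loc-style key into (row_key, col_key).

    col_key is None when only rows were specified. An unwrapped tuple is
    read as (row, column); a Label is read as a single row key.
    """
    if isinstance(key, Label):
        return key.value, None
    if isinstance(key, tuple):
        if len(key) > 2:
            raise ValueError("too many indexers for a 2-D object")
        # Unwrap Labels found in the row or column slot.
        parts = [k.value if isinstance(k, Label) else k for k in key]
        row = parts[0]
        col = parts[1] if len(parts) == 2 else None
        return row, col
    return key, None


rec = Recorder()
print(rec["x", 1] == rec[("x", 1)])              # True: same key either way
print(split_loc_key(rec["x", 1]))                # ('x', 1)
print(split_loc_key(rec[Label(("x", 1))]))       # (('x', 1), None)
print(split_loc_key(rec[Label(("x", 1)), "c"]))  # (('x', 1), 'c')
```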