BUG/ENH: MI.get_level_values and MI.set_levels get confused if level name happens to be int #32188

Closed
charlesdong1991 opened this issue Feb 22, 2020 · 2 comments
Labels
MultiIndex Needs Discussion Requires discussion from core team before further action

Comments

@charlesdong1991
Member

While working on a PR, I found some confusing behaviour when level names are integers or None, e.g.

>>> mi = pd.MultiIndex.from_frame(pd.DataFrame([[0.5,1],[0.5,0], [0.6, 0], [0.6,1]]), names=[None, 0])
>>> mi
MultiIndex([(0.5, 1),
            (0.5, 0),
            (0.6, 0),
            (0.6, 1)],
           )
>>> mi.get_level_values(0)
Int64Index([1, 0, 0, 1], dtype='int64', name=0)
>>> mi.set_levels([0.2, 0.3], level=0)
MultiIndex([(0.5, 0.3),
            (0.5, 0.2),
            (0.6, 0.2),
            (0.6, 0.3)],
           )

Pandas is quite tolerant about what can be used as a level name. However, if a name is an int that also happens to be a valid position, then using set_levels to change level values can be confusing. Using the level name to change values only for the level named None also becomes difficult, because by default level=None means the values are set for all levels. The same confusion happens if names is [1, 0], e.g.

>>> mi = pd.MultiIndex.from_frame(pd.DataFrame([[0.5,1],[0.5,0], [0.6, 0], [0.6,1]]), names=[1, 0])
>>> mi
MultiIndex([(0.5, 1),
            (0.5, 0),
            (0.6, 0),
            (0.6, 1)],
           names=[1, 0])
>>> mi.set_levels([0.2,0.3], level=0)
MultiIndex([(0.5, 0.3),
            (0.5, 0.2),
            (0.6, 0.2),
            (0.6, 0.3)],
           )
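
The outputs above are consistent with a lookup that tries the level names first and only falls back to the integer position when no name matches. Below is a minimal pure-Python model of that rule. It is a sketch based on the behaviour shown in this issue, not pandas' actual code, and `resolve_level` is a hypothetical helper name.

```python
def resolve_level(names, level):
    """Return the position that `level` refers to, trying names first.

    Simplified model of the behaviour observed in this issue (an assumption,
    not the pandas implementation).
    """
    # A matching name always wins, even if `level` is also a valid position.
    if level in names:
        return names.index(level)
    # Only when no name matches is an int interpreted as a position.
    if isinstance(level, int) and -len(names) <= level < len(names):
        return level % len(names)
    raise KeyError(f"Level {level!r} not found")


# names=[1, 0]: asking for 0 resolves to the level *named* 0, at position 1.
pos_named_zero = resolve_level([1, 0], 0)
# names=[None, 0]: same outcome, so position 0 is shadowed by the name 0.
pos_shadowed = resolve_level([None, 0], 0)
# With string names there is no clash and 0 is read as a position.
pos_plain = resolve_level(["a", "b"], 0)
```

Under this model, `level=0` can never reach the first level while some other level is named 0, which matches the `get_level_values(0)` and `set_levels(..., level=0)` results in the examples.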

And yes, these are really corner cases, but it might be nice to resolve them so that users can explicitly set/get level values! I am not sure what the best solution is. I am thinking of something like get_level_values_by_name(0) or get_level_values_by_loc(0), or of adding an argument to get_level_values, e.g. get_level_values(0, by_name=True), where the default is False and the position is used to get values.
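
To make the idea concrete, here is a rough sketch of the two proposed getters written as standalone functions on top of the existing public API. Neither function exists in pandas; the names are taken from the proposal above and the implementation is only one possible approach. It assumes that clearing all names forces an integer to be read as a position.

```python
import pandas as pd


def get_level_values_by_loc(mi, loc):
    """Return the values of the level at integer position `loc`, ignoring names."""
    # With every name cleared, an integer can only be read as a position.
    unnamed = mi.set_names([None] * mi.nlevels)
    # Restore the original name of that level on the result.
    return unnamed.get_level_values(loc).rename(mi.names[loc])


def get_level_values_by_name(mi, name):
    """Return the values of the level whose name equals `name` (None allowed)."""
    matches = [i for i, n in enumerate(mi.names) if n == name]
    if len(matches) != 1:
        raise KeyError(
            f"expected exactly one level named {name!r}, found {len(matches)}"
        )
    return get_level_values_by_loc(mi, matches[0])


mi = pd.MultiIndex.from_arrays(
    [[0.5, 0.5, 0.6, 0.6], [1, 0, 0, 1]], names=[1, 0]
)
by_loc = get_level_values_by_loc(mi, 0)    # first level, whatever its name
by_name = get_level_values_by_name(mi, 0)  # the level named 0
```

With names=[1, 0], `by_loc` holds the 0.5/0.6 values of the first level and `by_name` holds the 1/0 values of the level named 0, so the caller states explicitly which meaning is intended.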

All feedback is very welcome. If there is some consensus, I would like to work on it!
@jreback @WillAyd @TomAugspurger @jorisvandenbossche

@charlesdong1991 charlesdong1991 added MultiIndex Needs Discussion Requires discussion from core team before further action labels Feb 23, 2020
@TomAugspurger
Contributor

I think this is a duplicate of #21677.

@charlesdong1991
Member Author

Ahh, indeed, thanks! @TomAugspurger

I'll close this one since there are already discussions around this.
