BUG: MultiIndex set_levels is not symmetrical with get_level_values #14258

Closed
danio opened this issue Sep 20, 2016 · 6 comments
Labels: Bug, Duplicate Report, MultiIndex

Comments

danio commented Sep 20, 2016

Code Sample

import pandas as pd
import io

text = """
A B C
1 1 1
1 2 2
2 1 3
2 2 4
"""
df = pd.read_csv(io.StringIO(text), delimiter=' ')
# The MultiIndex has to exist before its levels can be queried by name
df.set_index(['A', 'B'], inplace=True)
# Note the output of get_level_values is the list of all values [1, 1, 2, 2]
print(df.index.get_level_values('A').tolist())
df.index.set_levels(df.index.get_level_values('A').map(lambda x: x * 2), level='A', inplace=True)
print(df.index.get_level_values('A').tolist())
# outputs [2, 2, 2, 2]

Expected Output

[2, 2, 4, 4]

# If I re-order the input as:
text = """
A B C
1 1 1
2 1 2
1 2 3
2 2 4
"""
# it works as I expected giving [2, 4, 2, 4]

Is this a bug or expected behaviour?
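
One possible reading of the two outputs above, offered as a simplified sketch rather than as pandas' actual implementation: a MultiIndex level can be thought of as a lookup table of distinct values (the levels) plus one integer position per row. The functions below are plain-Python stand-ins that only borrow the pandas names, and the assumption is that set_levels swaps the lookup table while leaving the per-row positions alone.

```python
# Simplified model of one MultiIndex level. This is NOT pandas code:
# "levels" holds the distinct values, "codes" holds one position per row.

def get_level_values(levels, codes):
    """Expand to one value per row by looking each code up in levels."""
    return [levels[c] for c in codes]

def set_levels(new_levels, codes):
    """Swap in a new lookup table; the per-row codes are left untouched."""
    return list(new_levels), codes

# First input: column A is 1, 1, 2, 2
levels, codes = [1, 2], [0, 0, 1, 1]
doubled = [v * 2 for v in get_level_values(levels, codes)]  # one value per row
levels, codes = set_levels(doubled, codes)
first_result = get_level_values(levels, codes)
print(first_result)

# Reordered input: column A is 1, 2, 1, 2
levels, codes = [1, 2], [0, 1, 0, 1]
doubled = [v * 2 for v in get_level_values(levels, codes)]
levels, codes = set_levels(doubled, codes)
second_result = get_level_values(levels, codes)
print(second_result)
```

Under this model the first input only ever reads positions 0 and 1 of the four-element table, which would explain the repeated 2s, while the reordered input happens to line up by coincidence.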

output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.1
setuptools: 20.3
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.1.2
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.5.1
pytz: 2016.2
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: 0.7.5.None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: None

@jreback added the Bug, Duplicate Report, and MultiIndex labels Sep 20, 2016
@jreback modified the milestones: Next Major Release, No action Sep 20, 2016
Contributor

jreback commented Sep 20, 2016

duplicate of #13754 and PR in #14236

@jreback closed this as completed Sep 20, 2016
Author

danio commented Sep 20, 2016

@jreback I don't think from the description of #13754 and the code changes in #14236 that it is an exact duplicate of those. I will try testing it with the suggested change and see if that changes the behaviour.

Author

danio commented Sep 21, 2016

@jreback I have confirmed it is not a duplicate. #13754 is about set_levels still changing the index even when verification fails. This issue is not about that.

I have realised that set_levels and get_level_values should not necessarily be symmetrical. What I think is missing from pandas is a get_levels function that would be symmetrical to set_levels (the levels property is not the same, as you cannot use level names with it).

Failing that, I have found that the workaround for this is to replace

df.index.get_level_values(level)

with

df.index.levels[df.index.names.index(level)]

in my code, i.e.

level_index = df.index.names.index('A')
df.index.set_levels(df.index.levels[level_index].map(lambda x: x * 2), level='A', inplace=True)
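
For readers on more recent pandas releases, where set_levels no longer accepts inplace, the same workaround can be sketched by assigning the returned index back. This is a minimal illustration using the data from the original report; the variable names level_pos, new_levels and result are only for this example.

```python
import io

import pandas as pd

text = """A B C
1 1 1
1 2 2
2 1 3
2 2 4
"""
df = pd.read_csv(io.StringIO(text), sep=" ").set_index(["A", "B"])

# Transform the distinct level values (one per key), not the per-row values.
level_pos = df.index.names.index("A")
new_levels = df.index.levels[level_pos].map(lambda x: x * 2)

# set_levels returns a new MultiIndex, so assign it back to the frame.
df.index = df.index.set_levels(new_levels, level="A")

result = df.index.get_level_values("A").tolist()
print(result)
```

With the first input from the report this gives the expected [2, 2, 4, 4], because the mapped values have the same length as the level's lookup table.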

Contributor

bkandel commented Sep 21, 2016

@danio I agree that this issue is not a duplicate of #13754, but I think it is quite close to the description in #13741 (comment). Basically, the issue is that there is no way to directly set the equivalent of the column names in a MultiIndex. I.e., where you would be used to saying df.columns = ['A', 'B', 'C'] or df.index = ['A', 'B', 'C'], you can't say df.set_columns(['A', 'B', 'C'], level=0).

Author

danio commented Sep 22, 2016

@bkandel, yes, the example there is quite convoluted, but I think this is caused by the same underlying issue as #13741. I can't see any option to change the duplicate setting of this issue; probably I don't have the permissions.

It feels to me that MultiIndex.set_levels is exposing too much of the underlying representation of the index, and it should be replaced by a new function with an interface more like get_level_values. That's probably not a discussion for the issue tracker though.

@bkandel-picwell
Contributor

@danio Yes, I agree that exposing set_levels and set_labels but not set_level_values (like what get_level_values does) makes it harder for users to see the functionality they want. I think that should be raised as a separate issue. I'll spend a bit of time to make sure exactly what functionality is needed and then file an issue.
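
To make the requested functionality concrete, here is a rough sketch of what a set_level_values counterpart could look like. The helper is hypothetical and not part of pandas; it assumes the level is addressed by name and simply rebuilds the index from per-row arrays with MultiIndex.from_arrays.

```python
import pandas as pd


def set_level_values(index, values, level):
    """Hypothetical helper (not a pandas API): return a new MultiIndex in which
    the per-row values of `level` are replaced by `values`."""
    # One array of per-row values for every level, in positional order.
    arrays = [index.get_level_values(i) for i in range(index.nlevels)]
    # Replace the array that belongs to the named level.
    arrays[list(index.names).index(level)] = pd.Index(values)
    return pd.MultiIndex.from_arrays(arrays, names=list(index.names))


index = pd.MultiIndex.from_arrays([[1, 1, 2, 2], [1, 2, 1, 2]], names=["A", "B"])
doubled = index.get_level_values("A").map(lambda x: x * 2)

new_index = set_level_values(index, doubled, level="A")
result_a = new_index.get_level_values("A").tolist()
result_b = new_index.get_level_values("B").tolist()
print(result_a, result_b)
```

Because the helper takes one value per row, it round-trips with get_level_values, at the cost of rebuilding the whole index rather than editing the level table in place.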

4 participants