Support for Enums in MultiIndex #21298

Closed
benediamond opened this issue Jun 2, 2018 · 7 comments
benediamond commented Jun 2, 2018

Code Sample, a copy-pastable example if possible

import pandas as pd
from enum import Enum

MyEnum = Enum("MyEnum", "A B")

# Raises TypeError: 'values' is not ordered, please explicitly specify the
# categories order by passing in a categories argument.
df = pd.DataFrame(columns=pd.MultiIndex.from_product(iterables=[MyEnum, [1, 2]]))

# This workaround executes successfully, but...
df = pd.DataFrame(columns=pd.MultiIndex.from_product(iterables=[pd.Series(MyEnum, dtype="category"), [1, 2]]))
# ... this "append" statement then raises the same error.
df.append({(MyEnum.A, 1): "abc", (MyEnum.B, 2): "xyz"}, ignore_index=True)

# This works, but is less desirable (can't pass a dict, need to come up with a row indexer, etc.).
df.loc[0, [(MyEnum.A, 1), (MyEnum.B, 2)]] = 'abc', 'xyz'

Problem description

Though Enums can easily be used as column indexers, strange errors appear to arise when they are used (as one of the factors) in a MultiIndex.

The MultiIndex (and DataFrame) can be created successfully if an (ordered) categorical Series is passed to the constructor. Even in this case, however, appending rows in the usual way fails. New rows can still be created using .loc, but this is less convenient.

This whole situation can be avoided by using strings instead of an Enum. Alternatively, one can use an IntEnum, but this essentially uses the underlying integers, rather than the names, as the column indexers.

As the use of enums as columns is perfectly supported in the case of a simple index, it seems a shortcoming that they can't be used in a MultiIndex.
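The string workaround mentioned above might look like the following. This is only a minimal sketch, assuming the member names are acceptable as column labels; the variable names (columns, first_label, member) are illustrative and not part of the original report.

```python
import pandas as pd
from enum import Enum

MyEnum = Enum("MyEnum", "A B")

# Use each member's name (a plain string) as the first level
# instead of the Enum member itself.
columns = pd.MultiIndex.from_product([[m.name for m in MyEnum], [1, 2]])
df = pd.DataFrame(columns=columns)

# A string label can be mapped back to its Enum member by name lookup.
first_label = columns[0][0]
member = MyEnum[first_label]
```

The cost of this approach is that the column labels are no longer Enum members, so any code indexing the frame has to convert with member.name and MyEnum[label].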

Expected Output

>>> df = pd.DataFrame(columns=pd.MultiIndex.from_product(iterables=[MyEnum, [1, 2]]))
>>> df
Empty DataFrame
Columns: [(MyEnum.A, 1), (MyEnum.A, 2), (MyEnum.B, 1), (MyEnum.B, 2)]
Index: []
>>> df.append({(MyEnum.A, 1): "abc", (MyEnum.B, 2): "xyz"}, ignore_index=True)
  MyEnum.A      MyEnum.B     
         1    2        1    2
0      abc  NaN      NaN  xyz

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Darwin
OS-release: 17.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.0
pytest: None
pip: 10.0.1
setuptools: 39.0.1
Cython: None
numpy: 1.13.3
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

benediamond changed the title from "MultiIndex containing Enums raises "'values' is not ordered"" to "Support for Enums in MultiIndex" on Jun 4, 2018
@benediamond
Author

This might be a hack, but what if, in the following snippet from _factorize_from_iterable

cat = Categorical(values, ordered=True)
categories = cat.categories
codes = cat.codes

the passed parameter ordered is changed to False? Do any other callers need _factorize_from_iterable? Please advise. Thank you.

@jreback
Contributor

jreback commented Jun 4, 2018

you can try but that will likely break lots of code

@benediamond
Author

benediamond commented Jun 4, 2018

It seems the functionality would be essentially unchanged, except that the TypeError would no longer be raised here?

raise TypeError("'values' is not ordered, please "
                "explicitly specify the categories order "
                "by passing in a categories argument.")

The question is what could go wrong for other callers if the TypeError is not raised, I suppose.
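For context, one likely reason the ordered path fails is that plain Enum members do not define an ordering, so any attempt to sort them raises a TypeError. The sketch below uses only the standard library and does not call pandas. That the ordered Categorical ultimately relies on sorting the values is an assumption here, not something confirmed in this thread.

```python
from enum import Enum

MyEnum = Enum("MyEnum", "A B")

# Plain Enum members support equality and hashing but not "<",
# so sorting them fails.
try:
    sorted(MyEnum)
    sortable = True
except TypeError:
    sortable = False
```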

@gfyoung
Member

gfyoung commented Jun 6, 2018

@benediamond : I think your best bet is to try hacking your way through the code, adding support, and see what test cases break, if any. Can't argue (as much) with passing tests. 😄

@benediamond
Author

@gfyoung heh alright, i'll give it a shot!

jreback added this to the Next Major Release milestone on Jun 7, 2018
@jackalack

@benediamond you can also use Enums in a pd.MultiIndex if you override __lt__
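The suggestion above presumably refers to defining `__lt__` on the Enum so that its members become sortable. Below is a minimal sketch of such an Enum, along the lines of the OrderedEnum recipe in the Python enum documentation. The class name OrderedEnum and the choice to compare by value are illustrative assumptions, and whether this is enough for a given pandas version is not verified here.

```python
from enum import Enum


class OrderedEnum(Enum):
    """Hypothetical base class: members are ordered by their values."""

    def __lt__(self, other):
        if self.__class__ is other.__class__:
            return self.value < other.value
        return NotImplemented


class MyEnum(OrderedEnum):
    A = 1
    B = 2


# With __lt__ defined, the members can be sorted.
ordered = sorted([MyEnum.B, MyEnum.A])
```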

@WillAyd
Member

WillAyd commented Nov 26, 2018

closed via #22072 and #15457

5 participants