Support for Enums in MultiIndex #21298
Comments
This might be a hack, but what if, in the snippet at pandas/pandas/core/arrays/categorical.py (lines 2526 to 2528 in 9f95f7d), the passed parameter ordered were changed to False? Do any other callers need _factorize_from_iterable? Please advise. Thank you.
You can try, but that will likely break lots of code.
The functionality seems like it would be essentially unchanged, other than no longer raising the TypeError at pandas/pandas/core/arrays/categorical.py (lines 352 to 354 in 9f95f7d).
The question, I suppose, is what could go wrong for other callers if the TypeError is not raised.
@benediamond: I think your best bet is to try hacking your way through the code, adding support, and see what test cases break, if any. Can't argue (as much) with passing tests. 😄
@gfyoung heh alright, I'll give it a shot!
@benediamond you can also use Enums in a pd.MultiIndex if you override __lt__
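A minimal sketch of that suggestion, using only the standard library. The class names (OrderedEnum, Color, PlainColor) are made up for illustration and do not come from the issue. It only shows that defining __lt__ makes Enum members sortable, which appears to be what the factorization step needs; whether it resolves the MultiIndex error on a given pandas version would have to be checked separately.

```python
import enum


class OrderedEnum(enum.Enum):
    """Hypothetical base class: orders members by their underlying value."""

    def __lt__(self, other):
        # Only compare members of the same enum class.
        if self.__class__ is other.__class__:
            return self.value < other.value
        return NotImplemented


class Color(OrderedEnum):  # illustrative enum, not from the issue
    RED = 1
    GREEN = 2
    BLUE = 3


class PlainColor(enum.Enum):  # same idea, but with no ordering defined
    RED = 1
    GREEN = 2


# sorted() relies on __lt__, so the ordered enum sorts by value.
ordered = sorted([Color.BLUE, Color.RED, Color.GREEN])

# A plain Enum has no ordering, so sorting raises TypeError.
try:
    sorted([PlainColor.GREEN, PlainColor.RED])
    plain_sortable = True
except TypeError:
    plain_sortable = False

print(ordered)
print(plain_sortable)
```

Comparing by value is one possible choice here; ordering by definition order or by name would work just as well for making the members sortable.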
Code Sample, a copy-pastable example if possible
Problem description
Though Enums can easily be used as column indexers, strange errors appear to arise when they are used (as one of the factors) in a MultiIndex.
The multiindex (and dataframe) can be created successfully if an (ordered) categorical Series is passed to the constructor. Yet in this case, appending rows in the usual way fails. One can create new rows using .loc, and yet this is not as nice.
This whole situation can be avoided by using strings instead of an Enum. Alternatively, one can use an IntEnum, and yet this essentially uses the underlying integers, instead of the names, as the column indexers.
As the use of enums as columns is perfectly supported in the case of a simple index, it seems a shortcoming that they can't be used in a MultiIndex.
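The IntEnum remark above can be illustrated with the standard library alone. The Level enum below is a made-up example, not taken from the issue. IntEnum members are themselves integers: they sort, compare equal to plain ints, and hash like them, which is presumably why they avoid the ordering error while behaving as integer labels rather than names.

```python
import enum


class Level(enum.IntEnum):  # illustrative enum, not from the issue
    LOW = 1
    HIGH = 2


# IntEnum members are ints, so they are orderable out of the box.
levels_sorted = sorted([Level.HIGH, Level.LOW])

# A member compares equal to its underlying integer...
equals_int = Level.LOW == 1

# ...and can stand in for that integer as a dictionary key.
lookup = {1: "one"}[Level.LOW]

print(levels_sorted, equals_int, lookup)
```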
Expected Output
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Darwin
OS-release: 17.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.0
pytest: None
pip: 10.0.1
setuptools: 39.0.1
Cython: None
numpy: 1.13.3
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None