-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Support for use of Enums in MultiIndex #21348
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 15 commits
0ec3531
ca40575
3857229
9aee04a
60c8f2a
13d67ca
e6e6627
d0607f4
ec5900a
979b61e
1c84176
e914b20
5a55fee
191fe58
bac47e2
f76d522
f125bf7
761a9d1
2e7921b
21014c6
4cd2dff
702594b
1950562
6e4485f
06437cd
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -27,3 +27,4 @@ dependencies: | |
- pytest | ||
- pytest-xdist | ||
- moto | ||
- enum34 |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -22,6 +22,7 @@ dependencies: | |
# universal | ||
- pytest | ||
- pytest-xdist | ||
- enum34 | ||
- pip: | ||
- html5lib==1.0b2 | ||
- beautifulsoup4==4.2.1 |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -42,6 +42,7 @@ dependencies: | |
- pytest | ||
- pytest-xdist | ||
- moto | ||
- enum34 | ||
- pip: | ||
- backports.lzma | ||
- cpplint | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -88,6 +88,7 @@ Indexing | |
- Bug in :meth:`MultiIndex.set_names` where error raised for a ``MultiIndex`` with ``nlevels == 1`` (:issue:`21149`) | ||
- Bug in :class:`IntervalIndex` constructors where creating an ``IntervalIndex`` from categorical data was not fully supported (:issue:`21243`, issue:`21253`) | ||
- Bug in :meth:`MultiIndex.sort_index` which was not guaranteed to sort correctly with ``level=1``; this was also causing data misalignment in particular :meth:`DataFrame.stack` operations (:issue:`20994`, :issue:`20945`, :issue:`21052`) | ||
- Bug in :func:`pandas.core.arrays.categorical._factorize_from_iterable` inappropriately caused ``TypeError`` to be raised when an ``Enum`` was used as a factor in a ``MultiIndex`` (:issue:`21298`) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
@benediamond : I would actually move your There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've gone ahead and just removed it---I'm not sure where it belongs, as it appeared that this PR (when it was passing) was slated for the next milestone. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @benediamond : I would put it in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @gfyoung Done. this one didn't create merge conflicts, thankfully |
||
- | ||
|
||
I/O | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2523,7 +2523,7 @@ def _factorize_from_iterable(values): | |
ordered=values.ordered) | ||
codes = values.codes | ||
else: | ||
cat = Categorical(values, ordered=True) | ||
cat = Categorical(values, ordered=False) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Interesting that this didn't cause any testing issues... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, it's striking. It's possible that that |
||
categories = cat.categories | ||
codes = cat.codes | ||
return codes, categories | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3307,3 +3307,39 @@ def test_duplicate_multiindex_labels(self): | |
with pytest.raises(ValueError): | ||
ind.set_levels([['A', 'B', 'A', 'A', 'B'], [2, 1, 3, -2, 5]], | ||
inplace=True) | ||
|
||
def test_use_enum_in_multiindex(self): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can you use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
# GH 21298 | ||
# Allow use of Enums as one of the factors in a MultiIndex. | ||
from enum import Enum | ||
from pandas.core.algorithms import take_1d | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. By "top", @jreback meant move the imports to "top of the file" 😄 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hah whoops. done |
||
|
||
# See https://github.com/pandas-dev/pandas/blob/ | ||
# 0c65c57a279e755ab7093db925d1e580f9878dae/pandas/util/testing.py#L793-L799. | ||
# We cannot simply use tm.assert_index_equal, as the "expected" index | ||
# cannot actually be built yet. | ||
def _get_ilevel_values(index, level): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. what is all this for? simply construct the expected index directly There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The thought here is that the expected index cannot be constructed directly, due to the exact bug I'm trying to fix. Thus instead I create an expected "level", and verify that it matches the relevant level of |
||
# accept level number only | ||
unique = index.levels[level] | ||
labels = index.labels[level] | ||
filled = take_1d( | ||
unique.values, | ||
labels, fill_value=unique._na_value | ||
) | ||
values = unique._shallow_copy(filled, name=index.names[level]) | ||
return values | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Given that you use this function only once, I think this abstraction is unnecessary as @jreback alluded to in an earlier comment. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok, amended. Thanks for your guidance. |
||
|
||
MyEnum = Enum("MyEnum", "A B") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. you would have to add
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ok, I've hopefully done this correctly. |
||
df = pd.DataFrame(columns=pd.MultiIndex.from_product(iterables=[ | ||
MyEnum, | ||
[1, 2] | ||
])) | ||
exp_index_0 = pd.Index([MyEnum.A, MyEnum.A, MyEnum.B, MyEnum.B], | ||
dtype='object') | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is pretty complicated, why cannot you just directly construct the Index here? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
tm.assert_index_equal(_get_ilevel_values(df.columns, 0), exp_index_0) | ||
|
||
expected = df.copy() | ||
df = df.append({(MyEnum.A, 1): "abc", (MyEnum.B, 2): "xyz"}, | ||
ignore_index=True) | ||
expected.loc[0, [(MyEnum.A, 1), (MyEnum.B, 2)]] = 'abc', 'xyz' | ||
tm.assert_frame_equal(df, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you remove from all but 1 of the 27 compat tests (the travis-27) one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done