-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: fix reindexing MultiIndex with categorical datetime-like level #21657
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: fix reindexing MultiIndex with categorical datetime-like level #21657
Conversation
Hello @jorisvandenbossche! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on June 29, 2018 at 09:13 Hours UTC |
pandas/core/indexes/multi.py
Outdated
# Need to box timestamps, etc. | ||
box = hasattr(lev, '_box_values') | ||
if is_categorical_dtype(lev): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need and len(lev) > len(lab)
here? I'm not sure what it's for, but the other condition had it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. Below it is too switch between "first boxing, then take" and "first take, then boxing", as far as I understand "Try to minimize boxing." for performance reasons? (both should be the same).
Anyhow, I don't think it is relevant here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a gigantic hack, I would really not do this. Simply push to 0.23.3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have ideas where to look for a better solution?
pandas/core/indexes/multi.py
Outdated
# Need to box timestamps, etc. | ||
box = hasattr(lev, '_box_values') | ||
if is_categorical_dtype(lev): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a gigantic hack, I would really not do this. Simply push to 0.23.3
Codecov Report
@@ Coverage Diff @@
## master #21657 +/- ##
==========================================
- Coverage 91.9% 91.9% -0.01%
==========================================
Files 154 154
Lines 49656 49559 -97
==========================================
- Hits 45637 45546 -91
+ Misses 4019 4013 -6
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
much nicer! minor comment
vals = self._get_level_values(i) | ||
if is_categorical_dtype(vals): | ||
vals = vals.get_values() | ||
if (isinstance(vals.dtype, (PandasExtensionDtype, ExtensionDtype)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
elif here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I need the 'if' here because the result of categorical.get_values()
can still be an Index with extension dtype / datetime dtype.
I would like to explore a bit more how to streamline the path from series/index/array object -> numpy array that is boxed if needed (currenlty that doesn't seem to easy, and is handled again in many different places), but that is for another PR
…andas-dev#21657) (cherry picked from commit 1cc5471)
closes #21390
This fixes the bug, but is not really a general solution. However, I would like to keep that for another PR (won't have time for this before 0.23.2), also because the underlying reason is more widely present than in just the MultiIndex.values (will open a separate issue about this -> #21658).
I added some tests of the bug at several levels where it surfaces (groupby, reindex, index.get_indexer)