BUG: fix reindexing MultiIndex with categorical datetime-like level #21657

jorisvandenbossche · 2018-06-27T15:46:20Z

This fixes the bug, but is not really a general solution. However, I would like to keep that for another PR (won't have time for this before 0.23.2), also because the underlying reason is more widely present than in just the MultiIndex.values (will open a separate issue about this -> #21658).

I added tests for the bug at the several levels where it surfaces (groupby, reindex, index.get_indexer).
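Below is a hedged sketch of the reindex / get_indexer cases listed above. It assumes a pandas version that includes this fix, and the dates and labels are made up for illustration. The index is a MultiIndex whose first level is a CategoricalIndex of datetimes, and it is reindexed to a reordered copy of itself.

```python
import pandas as pd

# MultiIndex whose first level is a CategoricalIndex of datetimes (illustrative data).
idx = pd.MultiIndex.from_arrays([
    pd.CategoricalIndex(pd.to_datetime(["2018-01-01", "2018-01-02"])),
    pd.CategoricalIndex(["x", "y"]),
])
s = pd.Series([1.0, 2.0], index=idx)

# Use a reordered copy as the target so reindex actually has to look up labels
# (reindexing to an identical index can short-circuit).
target = idx[::-1]
indexer = idx.get_indexer(target)  # positions of the target labels in idx
reindexed = s.reindex(target)      # every label exists, so no NaN is expected
```

With the fix in place, `indexer` should be `[1, 0]` and `reindexed` should hold `[2.0, 1.0]`.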

pep8speaks · 2018-06-27T15:46:33Z

Hello @jorisvandenbossche! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on June 29, 2018 at 09:13 Hours UTC

TomAugspurger · 2018-06-27T18:25:35Z

pandas/core/indexes/multi.py

            # Need to box timestamps, etc.
            box = hasattr(lev, '_box_values')
+            if is_categorical_dtype(lev):


Do you need `and len(lev) > len(lab)` here? I'm not sure what it's for, but the other condition had it.

I don't think so. As far as I understand, that condition is used below to switch between "first boxing, then take" and "first take, then boxing" (the "Try to minimize boxing." comment), presumably for performance reasons; both should give the same result.
Anyhow, I don't think it is relevant here.
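The following small sketch illustrates why both orders should give the same result. It uses public API: `astype(object)` stands in for the private `_box_values`, and `lev` / `lab` only mirror the level and codes names from the diff. Boxing first costs `len(lev)` conversions and taking first costs `len(lab)`, which is presumably the trade-off behind the `len(lev) > len(lab)` check.

```python
import numpy as np
import pandas as pd

# A datetime level and the integer codes ("labels") that point into it.
lev = pd.DatetimeIndex(["2018-01-01", "2018-01-02"])
lab = np.array([0, 1, 1, 0, 1])

# Strategy 1: box every level value once (len(lev) boxes), then take by code.
boxed_then_taken = np.asarray(lev.astype(object))[lab]

# Strategy 2: take the raw values by code first, then box (len(lab) boxes).
taken_then_boxed = np.asarray(lev.take(lab).astype(object))

# Both orders yield the same Timestamp objects; only the amount of boxing differs.
```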

this is a gigantic hack, I would really not do this. Simply push to 0.23.3

Do you have ideas where to look for a better solution?

jreback · 2018-06-28T00:20:39Z

pandas/core/indexes/multi.py

            # Need to box timestamps, etc.
            box = hasattr(lev, '_box_values')
+            if is_categorical_dtype(lev):


this is a gigantic hack, I would really not do this. Simply push to 0.23.3

codecov · 2018-06-28T10:26:34Z

Codecov Report

Merging #21657 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #21657      +/-   ##
==========================================
- Coverage    91.9%    91.9%   -0.01%     
==========================================
  Files         154      154              
  Lines       49656    49559      -97     
==========================================
- Hits        45637    45546      -91     
+ Misses       4019     4013       -6

Flag	Coverage Δ
#multiple	`90.27% <100%> (-0.01%)`	⬇️
#single	`42.03% <50%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/indexes/multi.py	`94.9% <100%> (-0.06%)`	⬇️
pandas/core/arrays/base.py	`83.95% <0%> (-3.65%)`	⬇️
pandas/core/ops.py	`96.41% <0%> (-0.06%)`	⬇️
pandas/core/dtypes/dtypes.py	`95.9% <0%> (-0.05%)`	⬇️
pandas/util/testing.py	`85.19% <0%> (-0.04%)`	⬇️
pandas/core/algorithms.py	`94.83% <0%> (-0.02%)`	⬇️
pandas/core/indexes/base.py	`96.63% <0%> (-0.01%)`	⬇️
pandas/core/reshape/reshape.py	`99.78% <0%> (-0.01%)`	⬇️
pandas/core/indexes/period.py	`92.67% <0%> (ø)`	⬆️
... and 7 more

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0b63e81...2261eef.

jreback

much nicer! minor comment

jreback · 2018-06-28T10:28:25Z

pandas/core/indexes/multi.py

+            vals = self._get_level_values(i)
+            if is_categorical_dtype(vals):
+                vals = vals.get_values()
+            if (isinstance(vals.dtype, (PandasExtensionDtype, ExtensionDtype))


So I need the 'if' here because the result of categorical.get_values() can still be an Index with extension dtype / datetime dtype.

I would like to explore a bit more how to streamline the path from a Series/Index/array object to a NumPy array that is boxed if needed (currently that doesn't seem too easy, and it is handled separately in many different places), but that is for another PR.
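Here is a rough sketch of the approach described above. It is written against public API rather than the actual `multi.py` internals. `boxed_level_values` is a hypothetical helper, and `astype(vals.categories.dtype)` stands in for the PR's `get_values()` call, which later pandas versions deprecated. The second check is the point of the comment: after densifying a categorical, the values can still have a datetime-like or extension dtype and need boxing.

```python
import numpy as np
import pandas as pd

def boxed_level_values(mi: pd.MultiIndex, i: int) -> np.ndarray:
    """Hypothetical helper: values of level i as a NumPy array with boxed scalars."""
    vals = mi.get_level_values(i)
    if isinstance(vals.dtype, pd.CategoricalDtype):
        # Densify: a CategoricalIndex of datetimes becomes a DatetimeIndex.
        vals = vals.astype(vals.categories.dtype)
    # The densified values may still be datetime-like or an extension dtype,
    # so box them into Python objects (e.g. Timestamp) before building tuples.
    if not isinstance(vals.dtype, np.dtype) or vals.dtype.kind in "mM":
        vals = vals.astype(object)
    return np.asarray(vals)

mi = pd.MultiIndex.from_arrays([
    pd.CategoricalIndex(pd.to_datetime(["2018-01-01", "2018-01-02"])),
    ["x", "y"],
])
# Zip the per-level arrays into the tuples that MultiIndex.values exposes.
tuples = list(zip(*(boxed_level_values(mi, i) for i in range(mi.nlevels))))
```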


jorisvandenbossche added 3 commits June 27, 2018 17:20

BUG: fix reindexing MultiIndex with categorical datetime-like level

9ca42b8

also add test for reindex/get_indexer

3e46afa

add whatsnew

c867e41

jorisvandenbossche added the Regression (Functionality that used to work in a prior pandas version), MultiIndex, and Categorical (Categorical Data Type) labels Jun 27, 2018

jorisvandenbossche added this to the 0.23.2 milestone Jun 27, 2018

fix flake8

16dc32a

jorisvandenbossche mentioned this pull request Jun 27, 2018

groupby on 2 categorical columns, when one categorical is based on datetimes, incorrectly returns all NaN dataframe #21390

Closed
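The following is a small reproducer in the spirit of #21390. It assumes a pandas version that contains the fix, and the column names and data are made up. With `observed=False` the result covers all four category combinations. Only the unobserved combination is filled (with 0 for `sum`), and none of the observed groups should come back as NaN.

```python
import pandas as pd

# Two categorical keys; the first one is based on datetimes (the case from #21390).
df = pd.DataFrame({
    "a": pd.Categorical(pd.to_datetime(["2018-01-01", "2018-01-01", "2018-01-02"])),
    "b": pd.Categorical(["x", "y", "x"]),
    "v": [1, 2, 3],
})

# observed=False keeps every combination of categories, so the result is
# expanded to the full product of both category sets.
result = df.groupby(["a", "b"], observed=False)["v"].sum()
print(result)
```

The expected sums are 1, 2, 3 and 0, in the order (2018-01-01, x), (2018-01-01, y), (2018-01-02, x), (2018-01-02, y).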

TomAugspurger reviewed Jun 27, 2018


jreback requested changes Jun 28, 2018


jreback modified the milestones: 0.23.2, 0.23.3 Jun 28, 2018

possible simpler approach

819057f

jreback requested changes Jun 28, 2018


jreback modified the milestones: 0.23.3, 0.23.2 Jun 28, 2018

Merge remote-tracking branch 'upstream/master' into bug-groupby-categorical-21390

2261eef

jorisvandenbossche merged commit 1cc5471 into pandas-dev:master Jul 2, 2018

jorisvandenbossche deleted the bug-groupby-categorical-21390 branch July 2, 2018 15:26

jorisvandenbossche added Needs Backport and removed Needs Backport labels Jul 2, 2018

jorisvandenbossche added a commit to jorisvandenbossche/pandas that referenced this pull request Jul 2, 2018

BUG: fix reindexing MultiIndex with categorical datetime-like level (pandas-dev#21657) (cherry picked from commit 1cc5471)

0e213e1

jorisvandenbossche added a commit that referenced this pull request Jul 5, 2018

BUG: fix reindexing MultiIndex with categorical datetime-like level (#21657) (cherry picked from commit 1cc5471)

2fccded

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

BUG: fix reindexing MultiIndex with categorical datetime-like level (pandas-dev#21657)

8c297ca
