PERF: tighten _should_compare for MultiIndex #42231

jbrockmendel · 2021-06-25T17:08:41Z

closes #xxxx
tests added / passed
Ensure all linting tests pass, see here for how to run them
whatsnew entry

jreback · 2021-06-25T22:04:14Z

pandas/core/indexes/base.py

@@ -5289,6 +5289,16 @@ def _get_indexer_non_comparable(
        """
        if method is not None:
            other = unpack_nested_dtype(target)
+            if self._is_multi ^ other._is_multi:
+                kind = other.dtype.type if self._is_multi else self.dtype.type


do tests hit this? e.g. as you didn't change anything

One of the affected cases was not tested; just added a test for that case.

jreback · 2021-06-25T22:04:30Z

is there a specific benchmark that this improves?

jbrockmendel · 2021-06-26T00:47:46Z

is there a specific benchmark that this improves?

No. The motivation is yak-shaving that traces back to getting rid of _convert_list_indexer

jbrockmendel · 2021-07-01T16:15:22Z

@jreback gentle ping (a whole mess of MultiIndex PRs yak-shaving inconsistencies)

bashtage · 2021-07-16T00:11:14Z

@jbrockmendel This PR seems to have introduced a bug. I verified this with the code below and a bisect.

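import pandas as pd

# Target frame: empty, indexed by a 3-level MultiIndex.
mi = pd.MultiIndex.from_product([["a", "b", "c"], [1, 2, 3], ["z", "y", "x"]])
df = pd.DataFrame(index=mi, dtype=float)

# Source series: indexed by only the first two of those levels.
mi2 = pd.MultiIndex.from_product([["a", "b", "c"], [1, 2, 3]])
s = pd.Series(index=mi2, dtype=float)
s.iloc[:] = 3.14

# Assign the 2-level series to a column of the 3-level frame.
df["new"] = s
print(df)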

now returns

        new
a 1 z  NaN
    y  NaN
    x  NaN
  2 z  NaN
    y  NaN
    x  NaN
  3 z  NaN
    y  NaN
    x  NaN

Before this patch it did a broadcast assignment to the remaining MultiIndex levels, i.e.

        new
a 1 z  3.14
    y  3.14
    x  3.14
  2 z  3.14
    y  3.14
    x  3.14
  3 z  3.14
    y  3.14
    x  3.14

bashtage · 2021-07-16T00:11:51Z

xref #40186

bashtage · 2021-07-16T00:45:53Z

pandas/core/indexes/base.py

+                # other contains only tuples so unless we are object-dtype,
+                #  there can never be any matches
+                return self._is_comparable_dtype(dtype)
+            return self.nlevels == other.nlevels


This is the change that is breaking MultiIndex broadcasting. If one has 3 levels and the other has 2, then this is False. Previously these were comparable and so would be compared and expanded.

thanks. do you know what the calling method is in the problematic case?

Walking back, previous is

pandas/pandas/core/indexes/base.py

Line 3481 in ddd90b0

if not self._should_compare(target) and not self._should_partial_index(target):

then

pandas/pandas/core/indexes/base.py

Line 3887 in ddd90b0

indexer = self.get_indexer(

Here self is the Series with 2 levels and other is the DataFrame with 3.

The big change is driven by the return difference of self._should_compare(target). Before this patch it returned True, so the if not ... block was skipped. It now returns False, and so it incorrectly shortcuts and fills with an NA value.
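To make the short-circuit concrete, here is a minimal pure-Python sketch of the control flow described above. It is a toy model, not pandas code: `toy_get_indexer`, `should_compare_before`, and `should_compare_after` are hypothetical names, prefix matching is only an assumed stand-in for the broadcast over the remaining level, and the `_should_partial_index` half of the real condition is omitted.

```python
from typing import Callable, List, Sequence, Tuple

Key = Tuple[object, ...]


def should_compare_before(self_nlevels: int, target_nlevels: int) -> bool:
    """Toy stand-in for the pre-patch result in this case: always compare."""
    return True


def should_compare_after(self_nlevels: int, target_nlevels: int) -> bool:
    """Toy stand-in for the patched check: require equal level counts."""
    return self_nlevels == target_nlevels


def toy_get_indexer(
    self_keys: Sequence[Key],
    target_keys: Sequence[Key],
    should_compare: Callable[[int, int], bool],
) -> List[int]:
    """Return, for each target key, its position in self_keys, or -1."""
    self_nlevels = len(self_keys[0])
    target_nlevels = len(target_keys[0])
    if not should_compare(self_nlevels, target_nlevels):
        # Short-circuit: nothing is looked up, every target is "missing".
        return [-1] * len(target_keys)
    positions = {key: i for i, key in enumerate(self_keys)}
    # Assumption: match each target key on its leading levels only.
    return [positions.get(key[:self_nlevels], -1) for key in target_keys]


# Same shapes as the report: a 2-level "Series" and a 3-level "DataFrame".
series_keys = [(a, n) for a in "abc" for n in (1, 2, 3)]
frame_keys = [(a, n, z) for a in "abc" for n in (1, 2, 3) for z in "zyx"]

before = toy_get_indexer(series_keys, frame_keys, should_compare_before)
after = toy_get_indexer(series_keys, frame_keys, should_compare_after)
print(before[:6])  # [0, 0, 0, 1, 1, 1]
print(set(after))  # {-1}
```

In this model an indexer of all -1 plays the role of the all-NaN column in the report: no source position is found for any target row, so there is nothing to assign.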

OK, I think I've got a handle on what's going on here. The long-term fix will be in MultiIndex.get_indexer, but for now this should just be reverted.

This reverts commit 381dd06.


PERF: tighten _should_compare for MultiIndex

37b4156

jreback reviewed Jun 25, 2021

jreback added the Performance (memory or execution speed performance) label Jun 25, 2021

jbrockmendel added 2 commits June 25, 2021 15:18

Merge branch 'master' into bug-mi-should_compare

d627ea5

Test for get_indexer with method and mixed-nlevels

795b700

jbrockmendel added the MultiIndex label Jun 30, 2021

jreback added this to the 1.4 milestone Jul 1, 2021

jreback merged commit 381dd06 into pandas-dev:master Jul 1, 2021

jbrockmendel deleted the bug-mi-should_compare branch July 1, 2021 23:08

JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021

PERF: tighten _should_compare for MultiIndex (pandas-dev#42231)

b31da6d

bashtage mentioned this pull request Jul 16, 2021

BUG: MultiIndex assignment fails to broadcast omitted levels #42557

Closed

2 tasks

bashtage reviewed Jul 16, 2021

jbrockmendel added a commit that referenced this pull request Jul 16, 2021

Revert "PERF: tighten _should_compare for MultiIndex (#42231)"

8e3eaf7

This reverts commit 381dd06.

jbrockmendel mentioned this pull request Jul 16, 2021

Revert "PERF: tighten _should_compare for MultiIndex" #42575

Merged

jreback pushed a commit that referenced this pull request Jul 25, 2021

Revert "PERF: tighten _should_compare for MultiIndex (#42231)" (#42575)

4c9ef1b

This reverts commit 381dd06.

CGe0516 pushed a commit to CGe0516/pandas that referenced this pull request Jul 29, 2021

Revert "PERF: tighten _should_compare for MultiIndex (pandas-dev#42231)…

8756cfa

…" (pandas-dev#42575) This reverts commit 381dd06.

feefladder pushed a commit to feefladder/pandas that referenced this pull request Sep 7, 2021

Revert "PERF: tighten _should_compare for MultiIndex (pandas-dev#42231)…

198df89

…" (pandas-dev#42575) This reverts commit 381dd06.
