-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
PERF: tighten _should_compare for MultiIndex #42231
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
jbrockmendel
commented
Jun 25, 2021
- closes #xxxx
- tests added / passed
- Ensure all linting tests pass, see here for how to run them
- whatsnew entry
@@ -5289,6 +5289,16 @@ def _get_indexer_non_comparable( | |||
""" | |||
if method is not None: | |||
other = unpack_nested_dtype(target) | |||
if self._is_multi ^ other._is_multi: | |||
kind = other.dtype.type if self._is_multi else self.dtype.type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do tests hit this? e.g. as you didn't change anything
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the affected cases was not tested; just added a test for that case.
is there a specific benchmark that this improves? |
No. The motivation is yak-shaving that traces back to getting rid of _convert_list_indexer |
@jreback gentle ping (a whole mess of MultiIndex PRs yak-shaving inconsistencies) |
@jbrockmendel This PR seems to have introduced a bug. I verified this with the code below and a bisect.
now returns
Before this patch is did a broadcast assignment to the remaining MultiIndex levels, i.e..
|
xref #40186 |
# other contains only tuples so unless we are object-dtype, | ||
# there can never be any matches | ||
return self._is_comparable_dtype(dtype) | ||
return self.nlevels == other.nlevels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the change that is breaking MultiIndex broadcasting. If one has 3 levels and the other has 2, then this is False. Previously these were comparable and so would be compared and expanded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks. do you know what the calling method is in the problematic case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Walking back, previous is
pandas/pandas/core/indexes/base.py
Line 3481 in ddd90b0
if not self._should_compare(target) and not self._should_partial_index(target): |
then
pandas/pandas/core/indexes/base.py
Line 3887 in ddd90b0
indexer = self.get_indexer( |
Here self
is the Series with 2 levels and other is the DataFrame with 3.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The big change is driven by the return difference of self._should_compare(target)
. Before this patch it returned True
, so the if not ...
block was skipped. It now returns False
, and so it incorrectly shortcuts and fills with an NA value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, i think ive got a handle on whats going on here. The long-term fix will be in MultiIndex.get_indexer, but for now this should just be reverted.
This reverts commit 381dd06.
…" (pandas-dev#42575) This reverts commit 381dd06.
…" (pandas-dev#42575) This reverts commit 381dd06.