-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
REF: implement _should_compare #38105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/core/indexes/base.py
Outdated
return no_matches | ||
else: | ||
# This is for get_indexer_non_unique | ||
return no_matches, no_matches |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback theres some ambiguity as to what we're supposed to be returning here. ATM in Index.get_indexer_non_unique we are returning two ndarrays of minus-ones, but in IntervalIndex the second one is an np.arange(len(target))
The other inconsistency is that ATM PeriodIndex.get_indexer defines no_matches = -1 * np.ones(self.shape, dtype=np.intp)
whereas i expected it to use target.shape
Can you clear this up for me?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking at this more, i increasingly think the second one should be np.arange(target.shape, dtype=np.intp)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't think this method should raise.
this is a pretty big change from status quo
I stumbled onto this in part because of this test:
and i figured that the way we treat periods with mismatched freq is almost identical to how we treat tzawareness-compat. Or is dt64tz special in this case? I'd be OK with not-raising in _get_indexer_non_comparable, just want to get this behavior consistent, bc ATM we're all over the place. |
continue | ||
# For object dtype we are liable to get a different exception message | ||
with pytest.raises(TypeError): | ||
pi.get_indexer(other2, method=method) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback notice in these cases we are currently raising in master bc the scalar comparisons raise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this new?
get_indexer is not super public but i believe it will never raise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right that get_indexer with method=None should never raise (maybe with tzawareness corner cases), but with method="ffill" the following raises on master:
dti = pd.date_range("2016-01-01", periods=3)
rng = pd.Index(range(5))
>>> dti.get_indexer(rng, method="ffill")
TypeError: '<' not supported between instances of 'int' and 'Timestamp'
xref #36320 |
…into ref-maybe_promote-2
…f-maybe_promote-2
So I'm increasingly convinced that the current PeriodIndex.get_indexer behavior with method != None is wrong:
But pi vs pi2 are uncomparable just like pi vs dti, so we should raise the same way:
|
pd.IntervalIndex.from_breaks(dti4), | ||
] | ||
) | ||
def non_comparable(request): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
non_comparable_idx
pandas/core/indexes/base.py
Outdated
@@ -4973,6 +4993,22 @@ def _maybe_promote(self, other: "Index"): | |||
|
|||
return self, other | |||
|
|||
def _get_other_deep(self, other: "Index") -> "Index": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should just be a function, no? its also a funny name. I think we have other similar things (e.g. this look like what .to_dense() does but for dtypes and not values).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated to _unpack_nested_dtypes function
alright! get ready to see a bunch of de-duplication and fastpaths coming in. |
hit me! |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
Index._is_comparable_dtype is limited because it misses a few cases: object dtype, and sub-dtypes within categorical. This implements Indes._should_compare, which handles those correctly. It then implements
_get_indexer_non_comparable
for cases in which we short-circuit non-comparable dtypes.As a proof of concept, this then uses _should_compare and _get_indexer_non_comparable for
PeriodIndex.get_indexer
.The behavior change, which this tests, is one that IIUC is a bug. That is, when we do get_indexer with a method and non-comparable dtypes, we should raise instead of return all minus-ones.
If implemented, we'll be able to use _should_compare to simplify all of the get_indexer, get_indexer_non_unique, and set op methods.