-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
REGR: fix pd.cut with datetime IntervalIndex as bins #47771
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REGR: fix pd.cut with datetime IntervalIndex as bins #47771
Conversation
# "Index" has no attribute "left" | ||
return self.left._should_compare(target) # type: ignore[attr-defined] | ||
# return self.left._should_compare(target) # type: ignore[attr-defined] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this commented out line work in the future or is it leftover from investigating this fix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think so that ideally it gets restored in the future. I added a comment referencing the issue I had opened about that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One small question otherwise LGTM. Looks like there was a merge conflict
is that the preferred longer term fix and this a targeted PR for backport? |
That's indeed the preferred long term to change that behaviour, and this PR restores the behaviour for both |
Thanks @jorisvandenbossche merging on green |
…tervalIndex as bins) (#48166) Backport PR #47771: REGR: fix pd.cut with datetime IntervalIndex as bins Co-authored-by: Joris Van den Bossche <[email protected]>
Closes #46218
xref #46218 (comment) for the actual reason that causes
cut
to fail: inside the implementation, we convert the actual timestamp data to floats (to pass to the underlying algorithm), but then when passing those values toIntervalIndex.get_indexer
, those numeric values no longer "match" with the datetime64 interval dtype. And in theory,get_indexer
should then fail (return -1 for not finding the target values), but until #42227 this actually happily worked (and therefore also letcut
with datetime64 interval bins work).This PR doesn't solve the root cause (we should change the logic inside
cut
so that we don't create this mismatch in values vs bins), but it is a short-term fix of the regression. It basically reverts the (unintended, I think) behaviour change introduced by #42227, but without actually reverting that PR (I am keeping the refactor introducing_should_partial_index
of that PR, but I am only changing_should_partial_index
itself a little bit to match better with what happened before in practice).I will open a separate issue about the issue in
cut
and that we should makeIntervalIndex.from_indexer
more strict. (opened -> #47772)