Skip to content

REGR: fix pd.cut with datetime IntervalIndex as bins #47771

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

jorisvandenbossche
Copy link
Member

@jorisvandenbossche jorisvandenbossche commented Jul 17, 2022

Closes #46218

xref #46218 (comment) for the actual reason that causes cut to fail: inside the implementation, we convert the actual timestamp data to floats (to pass to the underlying algorithm), but then when passing those values to IntervalIndex.get_indexer, those numeric values no longer "match" with the datetime64 interval dtype. And in theory, get_indexer should then fail (return -1 for not finding the target values), but until #42227 this actually happily worked (and therefore also let cut with datetime64 interval bins work).

This PR doesn't solve the root cause (we should change the logic inside cut so that we don't create this mismatch in values vs bins), but it is a short-term fix of the regression. It basically reverts the (unintended, I think) behaviour change introduced by #42227, but without actually reverting that PR (I am keeping the refactor introducing _should_partial_index of that PR, but I am only changing _should_partial_index itself a little bit to match better with what happened before in practice).

I will open a separate issue about the issue in cut and that we should make IntervalIndex.from_indexer more strict. (opened -> #47772)

@jorisvandenbossche jorisvandenbossche added Regression Functionality that used to work in a prior pandas version Interval Interval data type cut cut, qcut labels Jul 17, 2022
@jorisvandenbossche jorisvandenbossche added this to the 1.4.4 milestone Jul 17, 2022
# "Index" has no attribute "left"
return self.left._should_compare(target) # type: ignore[attr-defined]
# return self.left._should_compare(target) # type: ignore[attr-defined]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this commented out line work in the future or is it leftover from investigating this fix?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think so that ideally it gets restored in the future. I added a comment referencing the issue I had opened about that

Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small question otherwise LGTM. Looks like there was a merge conflict

@simonjayhawkins
Copy link
Member

I will open a separate issue about the issue in cut and that we should make IntervalIndex.from_indexer more strict. (opened -> #47772)

is that the preferred longer term fix and this a targeted PR for backport?

@jorisvandenbossche
Copy link
Member Author

is that the preferred longer term fix and this a targeted PR for backport?

That's indeed the preferred long term to change that behaviour, and this PR restores the behaviour for both cut as IntervalIndex.get_indexer.
Note there is also #47773 which only fixes cut, and something we should also do anyway. But since this PR also restores the behaviour of get_indexer (which is strictly speaking public), I think this one is good to include for 1.4.x, and we can then also merge #47773 for main as well.

@simonjayhawkins
Copy link
Member

Thanks @jorisvandenbossche merging on green

@simonjayhawkins simonjayhawkins merged commit 34af798 into pandas-dev:main Aug 19, 2022
meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Aug 19, 2022
phofl pushed a commit that referenced this pull request Aug 19, 2022
…tervalIndex as bins) (#48166)

Backport PR #47771: REGR: fix pd.cut with datetime IntervalIndex as bins

Co-authored-by: Joris Van den Bossche <[email protected]>
@jorisvandenbossche jorisvandenbossche deleted the regr-cut-datetime-interval branch August 20, 2022 06:51
noatamir pushed a commit to noatamir/pandas that referenced this pull request Nov 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cut cut, qcut Interval Interval data type Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: pd.cut regression since version 1.4.0 when operating on datetime
3 participants