-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
REGR: fix non-reduction apply with tz-aware objects #31614
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REGR: fix non-reduction apply with tz-aware objects #31614
Conversation
pandas/core/apply.py
Outdated
# if "Function does not reduce" not in str(err): | ||
# # catch only ValueError raised intentionally in libreduction | ||
# raise | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel I suppose you don't like this "fix", but the reality seems to be that there are other ways that the libreduction can raise an error, that should still be catched. I can try to find a way to avoid calling libreduction (add a case to the if
check above), but I have the feeling that this would just be a check for the specifc case reported, and there will then just be other cases that raise other errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the reason it is now getting here, is because it is object dtype (we already check for extension dtypes and datetimelike dtypes to not get here). Maybe in general object dtype should not go through the cython code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stepping back for a second, it isn't clear to me why it's trying to create a DatetimeTZBlock instead of an ObjectBlock
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. I tried to step into with the debugger, but didn't directly find the cause. It seems there is some inference at some step, which infers that it is datetimetz dtype, but then the values are not converted and still objects, which then causes an error in creating the block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At some point, it created a DatetimeTZBlock with array([None], dtype=object)
or array([Timestamp('2020-01-01 00:00:00+0000', tz='UTC')], dtype=object)
as its values. It is then copying that block that fails (through make_block_same_class
), but the question is how it created such a block in the first place
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it safe to assume that reductions are instances of the dtype there, or do we need inference? Naively, it seems like that would fail of .mean()
on integer dtypes, since that would return a float.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it safe to assume that reductions are instances of the dtype there, or do we need inference?
I think this is creating a Series that is effectively a view on some portion of our original Series, so it should be OK. Definitely merits more eyeballs.
Do we have an idea how big the gains from this are? this piece of code is disproportionately difficult to maintain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is creating a Series that is effectively a view on some portion of our original Series, so it should be OK. Definitely merits more eyeballs.
Yep, I think you're correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorisvandenbossche can you try the suggestion in https://github.com/pandas-dev/pandas/pull/31614/files#r374228865 to pass dtype=self.dtype
to that constructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's indeed only creating a subset of the original data, so the dtype should always be correct, and so avoids dtype inference (the calculation of the function only happens afterwards, and thus also a potential dtype change like int->float for mean)
LGTM pending green |
Seems all green (pending travis), thanks for the suggestion @jbrockmendel ! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, test comment
@@ -703,6 +703,12 @@ def apply_list(row): | |||
) | |||
tm.assert_series_equal(result, expected) | |||
|
|||
def test_apply_noreduction_tzaware_object(self): | |||
# https://github.com/pandas-dev/pandas/issues/31505 | |||
df = pd.DataFrame({"foo": [pd.Timestamp("2020", tz="UTC")]}, dtype="object") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would also test this w/o the col.copy(), e.g. df.apply(lambda col: col)
97f4030
to
302907d
Compare
Owee, I'm MrMeeseeks, Look at me. There seem to be a conflict, please backport manually. Here are approximate instructions:
And apply the correct labels and milestones. Congratulation you did some good work ! Hopefully your backport PR will be tested by the continuous integration and merged soon! If these instruction are inaccurate, feel free to suggest an improvement. |
Hmm, apparently my latest merge of master messed up the whatsnew. Will fix that in a PR to master, and directly in the backport of this PR EDIT -> #31686 |
Closes #31505