Skip to content

REGR: fix non-reduction apply with tz-aware objects #31614

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

jorisvandenbossche
Copy link
Member

Closes #31505

@jorisvandenbossche jorisvandenbossche added this to the 1.0.1 milestone Feb 3, 2020
@jorisvandenbossche jorisvandenbossche added Apply Apply, Aggregate, Transform, Map Regression Functionality that used to work in a prior pandas version labels Feb 3, 2020
# if "Function does not reduce" not in str(err):
# # catch only ValueError raised intentionally in libreduction
# raise
pass
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel I suppose you don't like this "fix", but the reality seems to be that there are other ways that the libreduction can raise an error, that should still be catched. I can try to find a way to avoid calling libreduction (add a case to the if check above), but I have the feeling that this would just be a check for the specifc case reported, and there will then just be other cases that raise other errors.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the reason it is now getting here, is because it is object dtype (we already check for extension dtypes and datetimelike dtypes to not get here). Maybe in general object dtype should not go through the cython code?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stepping back for a second, it isn't clear to me why it's trying to create a DatetimeTZBlock instead of an ObjectBlock

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. I tried to step into with the debugger, but didn't directly find the cause. It seems there is some inference at some step, which infers that it is datetimetz dtype, but then the values are not converted and still objects, which then causes an error in creating the block.

Copy link
Member Author

@jorisvandenbossche jorisvandenbossche Feb 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point, it created a DatetimeTZBlock with array([None], dtype=object) or array([Timestamp('2020-01-01 00:00:00+0000', tz='UTC')], dtype=object)as its values. It is then copying that block that fails (through make_block_same_class), but the question is how it created such a block in the first place

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safe to assume that reductions are instances of the dtype there, or do we need inference? Naively, it seems like that would fail of .mean() on integer dtypes, since that would return a float.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safe to assume that reductions are instances of the dtype there, or do we need inference?

I think this is creating a Series that is effectively a view on some portion of our original Series, so it should be OK. Definitely merits more eyeballs.

Do we have an idea how big the gains from this are? this piece of code is disproportionately difficult to maintain.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is creating a Series that is effectively a view on some portion of our original Series, so it should be OK. Definitely merits more eyeballs.

Yep, I think you're correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche can you try the suggestion in https://github.com/pandas-dev/pandas/pull/31614/files#r374228865 to pass dtype=self.dtype to that constructor?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's indeed only creating a subset of the original data, so the dtype should always be correct, and so avoids dtype inference (the calculation of the function only happens afterwards, and thus also a potential dtype change like int->float for mean)

@jbrockmendel
Copy link
Member

LGTM pending green

@jorisvandenbossche
Copy link
Member Author

Seems all green (pending travis), thanks for the suggestion @jbrockmendel !

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, test comment

@@ -703,6 +703,12 @@ def apply_list(row):
)
tm.assert_series_equal(result, expected)

def test_apply_noreduction_tzaware_object(self):
# https://github.com/pandas-dev/pandas/issues/31505
df = pd.DataFrame({"foo": [pd.Timestamp("2020", tz="UTC")]}, dtype="object")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would also test this w/o the col.copy(), e.g. df.apply(lambda col: col)

@jorisvandenbossche jorisvandenbossche merged commit 881d0b7 into pandas-dev:master Feb 5, 2020
@jorisvandenbossche jorisvandenbossche deleted the apply-regr-31505 branch February 5, 2020 08:15
@lumberbot-app
Copy link

lumberbot-app bot commented Feb 5, 2020

Owee, I'm MrMeeseeks, Look at me.

There seem to be a conflict, please backport manually. Here are approximate instructions:

  1. Checkout backport branch and update it.
$ git checkout 1.0.x
$ git pull
  1. Cherry pick the first parent branch of the this PR on top of the older branch:
$ git cherry-pick -m1 881d0b7f88b328f30a57b88a2b0f0f65478d2dc1
  1. You will likely have some merge/cherry-pick conflict here, fix them and commit:
$ git commit -am 'Backport PR #31614: REGR: fix non-reduction apply with tz-aware objects'
  1. Push to a named branch :
git push YOURFORK 1.0.x:auto-backport-of-pr-31614-on-1.0.x
  1. Create a PR against branch 1.0.x, I would have named this PR:

"Backport PR #31614 on branch 1.0.x"

And apply the correct labels and milestones.

Congratulation you did some good work ! Hopefully your backport PR will be tested by the continuous integration and merged soon!

If these instruction are inaccurate, feel free to suggest an improvement.

@jorisvandenbossche
Copy link
Member Author

jorisvandenbossche commented Feb 5, 2020

Hmm, apparently my latest merge of master messed up the whatsnew. Will fix that in a PR to master, and directly in the backport of this PR

EDIT -> #31686

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Apply Apply, Aggregate, Transform, Map Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

REGR: copy() within apply() raises ValueError: cannot create a DatetimeTZBlock without a tz, as of 1.0.0
4 participants