Skip to content

BUG: groupby.sum with timedelta64 and NaT #44658

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Nov 29, 2021

Conversation

jbrockmendel
Copy link
Member

@jbrockmendel jbrockmendel commented Nov 28, 2021

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is for backport can you add a whatsnew for 1.3.5

@jreback jreback added this to the 1.3.5 milestone Nov 28, 2021
@jreback jreback added Groupby Timedelta Timedelta data type Bug labels Nov 28, 2021
@jreback jreback merged commit 3352625 into pandas-dev:master Nov 29, 2021
@jreback
Copy link
Contributor

jreback commented Nov 29, 2021

@meeseeksdev backport 1.3.x

@lumberbot-app

This comment has been minimized.

@lumberbot-app

This comment has been minimized.

@jreback
Copy link
Contributor

jreback commented Nov 29, 2021

cc @jbrockmendel as suspected, needs a manual backport.

@jbrockmendel
Copy link
Member Author

having a hard time with the cherrypick. flake8 complains about the cython syntax at the git cherry-pick --continue step. this happens frequently and ive gotten used to ignoring it with --no-verify. but with git cherrypick --continue there is no --no-verify option.

@jbrockmendel
Copy link
Member Author

cc @simonjayhawkins any idea how to handle the cherrypick problem?

@simonjayhawkins
Copy link
Member

cc @simonjayhawkins any idea how to handle the cherrypick problem?

didn't have any.

PS. Thanks for the PR.

@simonjayhawkins
Copy link
Member

At first glance this appears to cause a slowdown of up to 2x for several benchmarks, but including the cython 0.29.21 results it appears the reported regressions on the 1.3.x tracker could be false positives?

https://simonjayhawkins.github.io/fantastic-dollop/#series_methods.Mode.time_mode?branch=1.3.x&p-N=100000&p-dtype='int'&commits=d3a92ff9

@jbrockmendel jbrockmendel deleted the bug-42659 branch December 4, 2021 05:55
@jbrockmendel
Copy link
Member Author

At first glance this appears to cause a slowdown of up to 2x for several benchmarks

Looking at the diff, the only non-invasive optimization I see is doing <float64_t>NPY_NAT just once at the module level. Hard to imagine that making a big dent.

but including the cython 0.29.21 results it appears the reported regressions on the 1.3.x tracker could be false positives?

yah those look pretty darn noisy. is "wait a couple days for more data" a viable option?

@jreback
Copy link
Contributor

jreback commented Dec 4, 2021

this is not worth holding up 1.3.x

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Groupby Timedelta Timedelta data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG:dataframe.groupby('some_column').timedelta.sum() results wrong when timedelta contains NaT (pandas=1.3.0)
3 participants