Skip to content

REGR: Avoid overflow with groupby sum #48059

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Aug 12, 2022

Conversation

phofl
Copy link
Member

@phofl phofl commented Aug 12, 2022

This is the alternative to #48044.

I would very much prefer this one, even though it is a bit slower.

cc @mroeschke cc @jorisvandenbossche

@phofl phofl added Groupby Regression Functionality that used to work in a prior pandas version labels Aug 12, 2022
@phofl phofl added this to the 1.5 milestone Aug 12, 2022
@mroeschke
Copy link
Member

I would be okay with this version of the change

@jorisvandenbossche
Copy link
Member

even though it is a bit slower.

Do you have a rough estimate of how much slower?

I certainly think this version is good enough (in the end, before we were casting as well, to float64), but if it is a significant difference, we can still table it as a future possible improvement if we find a better way to avoid the many compilation combinations

@phofl
Copy link
Member Author

phofl commented Aug 12, 2022

Around 5% of the .sum() runtime is taken up by the astype call. So no big deal (1_000_000 values)

Memory footprint is something that would make it worth, if we can avoid the duplications.

But for now, this should be sufficient since we were castint to float anyway

@phofl
Copy link
Member Author

phofl commented Aug 12, 2022

Will open an issue, after this is merged.

@phofl
Copy link
Member Author

phofl commented Aug 12, 2022

Thanks joris

If ok, I would merge when ci is greenish

@jorisvandenbossche
Copy link
Member

Go ahead!

@phofl
Copy link
Member Author

phofl commented Aug 12, 2022

Failures unrelated

@phofl phofl merged commit 35c8a7e into pandas-dev:main Aug 12, 2022
@phofl phofl deleted the reg_groupby_sum_overflow_cast branch August 12, 2022 19:09
YYYasin19 pushed a commit to YYYasin19/pandas that referenced this pull request Aug 23, 2022
* REGR: Avoid overflow with groupby sum

* Add comment
noatamir pushed a commit to noatamir/pandas that referenced this pull request Nov 9, 2022
* REGR: Avoid overflow with groupby sum

* Add comment
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Groupby Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants