Wrong result for float32 Series when using bottleneck #25307
Comments
What exactly are you suggesting pandas do here? Your example shows that a Series matches numpy behavior and that the "problem" is a documented limitation of the third-party tool you are using, so I'm unclear on what pandas should be doing (though I'm admittedly not terribly familiar with bottleneck).
I'm not sure what you mean by "Series matches numpy behavior". In the example above,
Update: it's not an overflow; it is a numerical error caused by naive summation in bottleneck. I think the best option in this case is for pandas to avoid using bottleneck for
Oh, I think I misinterpreted the situation slightly. The wrong result is actually caused by numerical instability in bottleneck's implementation. That's still not good, but it is a different problem than an overflow. The hint about overflow in the documentation is not related. Still, I think it might make sense not to use bottleneck for 32-bit floats.
Is there still information missing here? Feel free to close the ticket if you are sure that this is not a pandas bug. I simply think it puts pandas users in danger of wrong results, so something should be done. Let me know if I can help somehow.
Has xarray decided what to do for 32-bit?
Kind of. xarray's master is no longer affected; it was fixed in pydata/xarray@0b9ab2d1. I'm not 100% sure whether fixing this was the intention of the commit: it is titled "Refactor nanops" and the diff is more than 1000 lines, so I'm not sure this was a deliberate decision. The discussion is here, the ticket is still open.
Closing as I think this is more of an issue with bottleneck |
Minimal example
Requires bottleneck, numpy and pandas to be installed:
Problem description
The mean() of a large float32 Series is wrong when bottleneck is used. Uninstalling bottleneck or using float64 is a valid workaround. xarray is or has been affected too, see pydata/xarray#1346.

Bottleneck's documentation explicitly mentions that no error is raised in case of an overflow, so I'm not sure whether this should still be considered a bug in bottleneck.

Anyhow, since it seems quite severe, I want to raise attention here too.

Update: This is not an overflow, it's a numerical error (which is very large because bottleneck does not use pairwise summation).
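The workarounds mentioned above could look roughly like the sketch below. The data is made up for illustration, and it uses the pandas option compute.use_bottleneck as an alternative to uninstalling bottleneck; whether that option is a sufficient substitute in a given pandas version is an assumption, not something confirmed in this thread.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 100,000 float32 values that are all (approximately) 0.1.
s = pd.Series(np.full(100_000, 0.1, dtype=np.float32))

# Workaround 1: upcast to float64 before reducing.
mean_f64 = s.astype("float64").mean()

# Workaround 2 (assumed equivalent to uninstalling bottleneck):
# tell pandas not to dispatch reductions to bottleneck inside this block.
with pd.option_context("compute.use_bottleneck", False):
    mean_no_bn = s.mean()

print(mean_f64, mean_no_bn)
```

Both values should be close to 0.1 regardless of whether bottleneck is installed.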
Bottleneck's implementation of mean().
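To illustrate why naive summation matters for float32, here is a minimal standalone sketch. It is not bottleneck's or numpy's actual code: float32 arithmetic is emulated in pure Python by rounding through struct after every addition, and the function names, the block size and the test data are all chosen for illustration only.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def naive_sum_f32(values):
    """Left-to-right accumulation, rounding to float32 after every add."""
    total = 0.0
    for v in values:
        total = to_f32(total + v)
    return total


def pairwise_sum_f32(values, lo=0, hi=None, block=8):
    """Recursive pairwise summation, rounding to float32 after every add.

    Small blocks are summed naively; larger ranges are split in half and
    the two partial sums are added, so rounding errors grow far more slowly.
    """
    if hi is None:
        hi = len(values)
    if hi - lo <= block:
        return naive_sum_f32(values[lo:hi])
    mid = (lo + hi) // 2
    left = pairwise_sum_f32(values, lo, mid, block)
    right = pairwise_sum_f32(values, mid, hi, block)
    return to_f32(left + right)


# Hypothetical data: one million copies of the float32 value nearest to 0.1.
n = 1_000_000
x = to_f32(0.1)
values = [x] * n

exact_mean = x  # the mean of identical values is that value
naive_mean = to_f32(naive_sum_f32(values) / n)
pairwise_mean = to_f32(pairwise_sum_f32(values) / n)

print("exact   :", exact_mean)
print("naive   :", naive_mean)
print("pairwise:", pairwise_mean)
```

With the naive loop, the running total eventually becomes so large that each added 0.1 is rounded to a coarser grid, and the error accumulates in one direction. The pairwise variant only ever adds partial sums of similar magnitude, so its mean stays very close to the exact value.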
Related issues
Expected Output
Output of pd.show_versions()