Float64Index is very slow in some conditions. #13166
This might have been true on older versions of pandas (maybe < 0.16.0, I don't recall the exact version), when these were object-based, but they have since moved to true float-based (typed) hashtables.
Closing, but please show your versions. Note that you have to time these with a single iteration, because the first call builds the hash table, which is then cached. Repeated timing only tells you the lookup time, but you want the build time as well.
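The build-versus-lookup distinction can be illustrated with a small stdlib sketch. `LazyIndex` is a hypothetical stand-in, not a pandas class; it is assumed here to mimic the caching described above by building its hash table on the first `get_loc` call and reusing it afterwards. Timing one call on a fresh object captures the build, while repeated calls on the same object measure only the lookup.

```python
import timeit


class LazyIndex:
    """Hypothetical index that builds its hash table lazily, on first lookup."""

    def __init__(self, values):
        self.values = list(values)
        self._table = None  # not built until the first get_loc call

    def get_loc(self, key):
        if self._table is None:
            # One-time O(n) build: map each value to its position
            self._table = {v: i for i, v in enumerate(self.values)}
        return self._table[key]


values = [i * 0.1 for i in range(200_000)]
key = values[-1]

# A single iteration on a fresh object includes the table build
fresh = LazyIndex(values)
build_and_lookup = timeit.timeit(lambda: fresh.get_loc(key), number=1)

# Later calls reuse the cached table, so this is lookup time only
lookup_only = timeit.timeit(lambda: fresh.get_loc(key), number=1000) / 1000

print(f"first call (build + lookup): {build_and_lookup:.6f} s")
print(f"cached lookup:               {lookup_only:.6f} s")
```

With `%timeit` and its default many iterations, only the second number would be visible, which hides exactly the cost this issue is about.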
I am using pandas 0.18.1 with Python 3.5 (64-bit), and I confirmed this problem on both Linux and Windows 7.
@jreback It seems that the problem occurs with Python 3.5 but not with Python 2.7. Can you reopen this issue? You can confirm this on https://try.jupyter.org/.
Hmm, interesting. So do you want to try profiling the Cython code?
I think this is where the problem is, but I don't know why:
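The snippet quoted in this comment did not survive, so the following is only a sketch under an assumption: that the float hashtable fed CPython's float hash straight into an open-addressing table that picks its first bucket with a power-of-two mask (`hash & mask`), as khash does. Under Python 3, a float such as `i / 1024` (for `0 <= i < 1024`) hashes to `i * 2**51` on 64-bit builds, so all of its low bits are zero and every such key lands in the same bucket, whereas integral floats hash to themselves. `bucket_usage` is a hypothetical helper.

```python
def bucket_usage(values, table_bits=20):
    """Count distinct first-probe buckets when hashes are reduced by a
    power-of-two mask (a hypothetical model of an open-addressing table)."""
    mask = (1 << table_bits) - 1
    return len({hash(v) & mask for v in values})


fractions = [i / 1024 for i in range(1024)]  # 0.0, 1/1024, ..., 1023/1024
integers = [float(i) for i in range(1024)]   # 0.0, 1.0, ..., 1023.0

# On 64-bit CPython, hash(i / 1024) == i * 2**51, so the masked bits are all zero
print(bucket_usage(fractions))  # 1
# Integral floats hash to their own value, so they spread across buckets
print(bucket_usage(integers))   # 1024
```

If every key shares one first-probe bucket, each insert and lookup has to walk a long probe sequence, which would turn the hash table build into roughly quadratic work.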
I changed the line to the following code, which views the double value as an int64 and uses the same formula as
The %time result is almost the same, and it's even 2x faster for
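The code in this comment and the name of the formula it borrowed are cut off above. As a hedged sketch, assume the change reinterprets the 8 bytes of the double as a 64-bit integer and then applies khash's int64 mixing macro, `(key >> 33) ^ key ^ (key << 11)` truncated to 32 bits. The helpers `float_bits`, `int64_style_hash` and `bucket_usage` are hypothetical, and the bucket count uses a simple power-of-two mask model of the table.

```python
import struct


def float_bits(x):
    """Reinterpret a double's 8 bytes as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def int64_style_hash(x):
    """Assumed mixing step, modeled on khash's kh_int64_hash_func:
    (key >> 33) ^ key ^ (key << 11), truncated to 32 bits."""
    key = float_bits(x)
    return ((key >> 33) ^ key ^ (key << 11)) & 0xFFFFFFFF


def bucket_usage(values, hash_func, table_bits=20):
    """Distinct first-probe buckets under a power-of-two mask."""
    mask = (1 << table_bits) - 1
    return len({hash_func(v) & mask for v in values})


fractions = [i / 1024 for i in range(1024)]

print(bucket_usage(fractions, hash))              # 1 with CPython's float hash
print(bucket_usage(fractions, int64_style_hash))  # many more distinct buckets
```

The `>> 33` term pulls the exponent and high mantissa bits, which do vary between these keys, down into the low bits the mask keeps. One caveat for a real table: `-0.0 == 0.0` but their bit patterns differ (as do NaNs with different payloads), so keys would have to be normalized before hashing their raw bits.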
Does that break any tests? Can you run the asv suite as well (and add a benchmark for this)?
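An asv benchmark for this case could look roughly like the sketch below. The class name and parameter values are hypothetical. It relies only on asv's conventions that `setup` runs before timing, that methods prefixed with `time_` are timed, and that `params`/`param_names` produce one run per parameter. A fresh index is built inside the timed method so that the hash-table build is measured along with the lookup, following the single-iteration advice earlier in the thread.

```python
import numpy as np
import pandas as pd


class Float64IndexGetLoc:
    # asv runs each time_* method once per entry in params
    params = ["integral", "dyadic_fractions"]
    param_names = ["values"]

    def setup(self, values):
        n = 100_000
        data = np.arange(n, dtype="float64")
        if values == "dyadic_fractions":
            # Multiples of 1/1024, whose Python 3 hashes share their low bits
            data = data / 1024
        # Shuffle so the index is not monotonic (assumed to keep pandas
        # on its hash-table path rather than a sorted search)
        self.data = np.random.default_rng(0).permutation(data)
        self.key = self.data[n // 2]

    def time_get_loc(self, values):
        # Build a fresh index on each call so the hash-table build is timed too
        pd.Index(self.data).get_loc(self.key)
```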
https://github.com/python/cpython/blob/master/Python/pyhash.c#L85 is the existing PyHash_double. It's probably generating "better" hashes than your change, but at the end of the day I don't see why that's preferable.
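For context on the linked hash function: CPython documents that numeric hashes are computed modulo a prime `P` (`sys.hash_info.modulus`, which is `2**61 - 1` on 64-bit builds), so a positive float equal to `m / n` hashes to `m * pow(n, -1, P) % P`. The check below is a sketch: `reference_hash` is a hypothetical helper, and `pow(..., -1, P)` needs Python 3.8+. It reproduces `hash()` for a few positive floats and shows why dyadic fractions get hashes with many trailing zero bits, since the inverse of `2**e` modulo `2**61 - 1` is `2**(61 - e)`.

```python
import sys

P = sys.hash_info.modulus  # 2**61 - 1 on 64-bit CPython, 2**31 - 1 on 32-bit


def reference_hash(x):
    """Recompute hash(x) for a finite, positive float from its exact ratio."""
    m, n = x.as_integer_ratio()  # x == m / n, with n a power of two
    return m * pow(n, -1, P) % P


for x in [0.5, 0.1, 3.75, 1e-3, 12345.678]:
    assert reference_hash(x) == hash(x), x

# 1 / 2**e is congruent to 2**(bits - e) modulo 2**bits - 1, so small
# dyadic fractions hash to large values whose low bits are all zero
print(hash(1 / 1024) == 2 ** (P.bit_length() - 10))  # True
```

This matches the comment's point: the hash is well defined and consistent across numeric types, but its low bits are not well mixed for this kind of key.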
@ruoyu0088 see also #13335 |
The following code is very slow:
After debugging it, I found that Float64Engine.get_loc() is slow. Here is a demo:
Outputs: