
PERF: Float64Index to use Float hashtable backend #6471


Closed
jreback opened this issue Feb 25, 2014 · 11 comments · Fixed by #6879
Labels: Compat (pandas objects compatibility with NumPy or Python functions), Performance (memory or execution speed performance)
Comments

@jreback
Contributor

jreback commented Feb 25, 2014

https://groups.google.com/forum/m/#!topic/pydata/zUxl7rOHVNY

Currently Float64Index uses the Object back-end for ease of implementation.
See here: https://github.com/pydata/pandas/blob/master/pandas/core/index.py#L1881

  • change the backend to use Float64Engine - the Cython code needs fixing in order to make it work.
  • needs a vbench for perf indications
  • needs pickle validation that it is backward compatible (it should be w/o changes)
jreback added this to the 0.14.0 milestone Feb 25, 2014
@cpcloud
Member

cpcloud commented Feb 25, 2014

@jreback can I take this, or do you already have something worked up?

@jreback
Contributor Author

jreback commented Feb 25, 2014

go for it!

It's very close... I just didn't have time to debug it. Most of the code was already there, originally from @wesm.

@cpcloud
Member

cpcloud commented Feb 25, 2014

Cool, I'll take a look.

@jreback
Contributor Author

jreback commented Mar 19, 2014

@cpcloud any progress on this?

@cpcloud
Member

cpcloud commented Mar 20, 2014

Yep ... should be able to put up a WIP PR this weekend

@jreback
Contributor Author

jreback commented Mar 20, 2014

Great, thanks.

cpcloud self-assigned this Mar 22, 2014
@cpcloud
Member

cpcloud commented Mar 22, 2014

nan isn't really a singleton, probably because of its internal representation as some large integer. Small integers are singletons in Python, whereas larger ones are new instances of int every time, so their id changes. The first set membership test checks the nan object whose reference I put into the array, whereas flt holds a new instance of nan. What's weird is that hash(nan) == 0 and yet the membership check fails. I thought the order of lookup for sets was hash -> eq (falling back to comparing id if there is no __eq__ method), but since nan is an instance of float it obviously has that method. @jreback Am I missing something here?
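
For reference, a minimal standard-library sketch of the membership behaviour in question. The names nan_a, nan_b and values are illustrative and are not taken from the original snippet. Python containers test membership by identity first and by == only after that, so the very same nan object is found while a different nan object is not, because nan never compares equal to anything, itself included.

```python
import math

nan_a = float("nan")   # one NaN object
nan_b = float("nan")   # a second, distinct NaN object

values = {1.0, 2.5, nan_a}

# Same object: the identity check succeeds before == is ever consulted.
same_object_found = nan_a in values

# Different object: identity fails, and nan == nan is False, so no match.
other_object_found = nan_b in values

# IEEE 754 semantics: a NaN is not equal to itself.
nan_equals_itself = nan_a == nan_a
```

So an equal hash alone is not enough for a match: the lookup also needs either the same object or a true == comparison, and a fresh nan provides neither.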

@jreback
Contributor Author

jreback commented Mar 22, 2014

Nope, that's why you can look up object-dtyped floats/nan pretty easily,
but actual NaNs that are floats need special logic.
In essence you have to segregate the NaNs, keep a reference to their locations, and not hash them.
So keep the float hash table for the non-NaN values, then have getitem check for NaN before you test the hash.
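
A rough pure-Python sketch of that idea, assuming a plain dict stands in for the float hash table. FloatLookup, get_loc and _nan_locs are hypothetical names for illustration only; this is not the pandas Float64Engine or its Cython code.

```python
import math


class FloatLookup:
    """Hypothetical illustration: NaNs are tracked by position, never hashed."""

    def __init__(self, values):
        self._table = {}      # non-NaN value -> first position where it occurs
        self._nan_locs = []   # positions of every NaN, kept out of the table
        for pos, val in enumerate(values):
            if math.isnan(val):
                self._nan_locs.append(pos)
            else:
                self._table.setdefault(val, pos)

    def get_loc(self, key):
        # Check for NaN before touching the hash table.
        if math.isnan(key):
            if not self._nan_locs:
                raise KeyError(key)
            return self._nan_locs[0]
        return self._table[key]   # raises KeyError if the value is absent


lookup = FloatLookup([1.5, float("nan"), 3.0])
first_nan_loc = lookup.get_loc(float("nan"))   # a brand-new NaN object still resolves
```

Because the NaN branch relies on math.isnan rather than on hashing or equality, any nan object used as a key resolves to the stored NaN positions.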

@jreback
Contributor Author

jreback commented Apr 9, 2014

@cpcloud coming along?

@cpcloud
Member

cpcloud commented Apr 12, 2014

Yep coming along. Couple of errors left to go. Mostly just renaming and keeping those refs to nan locs around.

@jreback
Contributor Author

jreback commented Apr 12, 2014

Great.
