ENH: Float64Index now uses Float64Hashtable as a backend #6879

Merged
merged 1 commit into from
Apr 14, 2014

Conversation

cpcloud
Member

@cpcloud cpcloud commented Apr 13, 2014

closes #6471

@jreback
Contributor

jreback commented Apr 13, 2014

think we need a vbench so can see perf impact of various operations

getting
slicing
construction

of fairly long index

@cpcloud
Member Author

cpcloud commented Apr 13, 2014

yep i'm in the middle of it now :)

@cpcloud cpcloud self-assigned this Apr 13, 2014
@cpcloud cpcloud added this to the 0.14.0 milestone Apr 13, 2014
@cpcloud
Member Author

cpcloud commented Apr 13, 2014

@jreback do u mean slicing/getting with a frame/series or the index itself?

@cpcloud
Member Author

cpcloud commented Apr 13, 2014

probably both i think

@jreback
Contributor

jreback commented Apr 13, 2014

I think u can test just with the index

iirc that was the issue: perf with the object based index was somewhat slow (not sure exactly what was slow - maybe there are more details on the issue)

@jreback
Contributor

jreback commented Apr 13, 2014

actually the op said index ops like

2 * index were slow
(which if they are object would be true)

it's an odd operation but then again maybe for a float index it's not (though u could simply convert to a series)

maybe give a battery of tests and see what sticks out
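
A minimal sketch of why `2 * index` is slow when the values are held as objects, using plain NumPy arrays as a stand-in for the index values (an assumption; this is not code from the PR). Object-dtype arithmetic goes through one Python-level multiply per element, while float64 arithmetic runs as a single vectorized loop. The array size and repeat counts are arbitrary.

```python
import timeit

import numpy as np

# Stand-in for the index values: the same numbers stored two ways.
values = np.arange(100_000, dtype="float64")
as_float = values                  # contiguous float64 buffer
as_object = values.astype(object)  # array of boxed Python floats

# Both give the same numbers, but the object result stays object dtype.
float_result = 2 * as_float
object_result = 2 * as_object

# Best-of-three timings; absolute numbers depend on the machine.
float_time = min(timeit.repeat(lambda: 2 * as_float, number=5, repeat=3))
object_time = min(timeit.repeat(lambda: 2 * as_object, number=5, repeat=3))
print(f"float64: {float_time:.5f}s  object: {object_time:.5f}s")
```

On most machines the object version should come out noticeably slower, which matches the complaint in the original issue.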

@cpcloud
Member Author

cpcloud commented Apr 13, 2014

just timed that particular operation with a 1,000,000 element float64index ... same speed as raw numpy float64 .... comparing against master ...

i'll add some arith ops in the bench

@cpcloud
Member Author

cpcloud commented Apr 13, 2014

i don't understand why it says failed .... it's passing on travis ..
https://travis-ci.org/cpcloud/pandas/builds/22912880
maybe i'm being impatient

@jreback
Contributor

jreback commented Apr 13, 2014

gr8

for sure will be faster for ops

slicing I suspect might be the same though

@jreback
Contributor

jreback commented Apr 13, 2014

https://travis-ci.org/pydata/pandas

some options thing failed

the green is on the master run not local branch

@cpcloud
Member Author

cpcloud commented Apr 13, 2014

ah ok ... i'll stop asking questions about travis now, i promise

@cpcloud
Member Author

cpcloud commented Apr 14, 2014

master:

---------------------------------------------------------
Test name                                    |    #0    |
---------------------------------------------------------
index_float64_slice_indexer_even             |   0.0060 |
index_float64_slice_indexer_basic            |   0.0021 |
index_float64_boolean_indexer                |   8.5363 |
index_float64_get                            |   0.0017 |
index_float64_construct                      |  50.8270 |
index_float64_div                            |  32.8536 |
index_float64_mul                            |  31.4170 |
index_float64_boolean_series_indexer         |   8.8483 |
---------------------------------------------------------
Test name                                    |    #0    |
---------------------------------------------------------

this PR:

---------------------------------------------------------
Test name                                    |    #0    |
---------------------------------------------------------
index_float64_slice_indexer_even             |   0.0060 |
index_float64_slice_indexer_basic            |   0.0017 |
index_float64_boolean_indexer                |   3.5873 |
index_float64_get                            |   0.0020 |
index_float64_construct                      |  37.9570 |
index_float64_div                            |   1.4310 |
index_float64_mul                            |   1.2623 |
index_float64_boolean_series_indexer         |   3.5966 |
---------------------------------------------------------
Test name                                    |    #0    |
---------------------------------------------------------
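
For anyone who wants to try similar measurements without vbench, here is a rough sketch using `timeit`. The operation bodies are guesses based on the test names in the tables above (the actual vbench definitions are not shown in this thread), and the index length and repeat counts are arbitrary. It builds the index with `pd.Index(values)` so that it also runs on recent pandas releases, where the separate `Float64Index` class no longer exists.

```python
import timeit

import numpy as np
import pandas as pd

n = 100_000
values = np.arange(n, dtype="float64")
index = pd.Index(values)   # float64-valued index
mask = values % 2 == 0     # boolean mask selecting every other element

# Hypothetical stand-ins for the vbench cases named in the tables.
ops = {
    "construct": lambda: pd.Index(values),
    "get": lambda: index[1],
    "slice_indexer_basic": lambda: index[:-1],
    "slice_indexer_even": lambda: index[::2],
    "boolean_indexer": lambda: index[mask],
    "mul": lambda: index * 2,
    "div": lambda: index / 2,
}

for name, fn in ops.items():
    # Best of three runs, reported per call in milliseconds.
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"index_float64_{name:<22} | {best * 1e3:9.4f} ms")
```

Running this on two checkouts (before and after the change) gives a comparison in the same spirit as the tables, though not the same numbers.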

@cpcloud
Member Author

cpcloud commented Apr 14, 2014

everything's either the same or faster

@jreback
Contributor

jreback commented Apr 14, 2014

well a big +1 on that then!

excellent

so maybe add a doc note (in indexing.rst/Float64index section) that as of 0.14 it's now backed by a float index type

otherwise looks good 2 me

@jreback
Contributor

jreback commented Apr 14, 2014

@jorisvandenbossche @TomAugspurger

comments?

@jreback
Contributor

jreback commented Apr 14, 2014

@cpcloud I posted to the mailing list as well

@cpcloud
Member Author

cpcloud commented Apr 14, 2014

cool thx!

@cpcloud
Member Author

cpcloud commented Apr 14, 2014

here's a comparison that's a little easier on the eyes:

-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
index_float64_mul                            |   1.3734 |  36.9690 |   0.0371 |
index_float64_div                            |   1.4606 |  37.8457 |   0.0386 |
index_float64_boolean_series_indexer         |   4.0197 |  12.3593 |   0.3252 |
index_float64_boolean_indexer                |   3.8797 |  11.3870 |   0.3407 |
index_float64_construct                      |  45.4140 |  64.4014 |   0.7052 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
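
The ratio column appears to be head time divided by base time, so smaller is better. A small sketch that recomputes it from the rounded numbers in the table and also shows the equivalent speedup factor (base divided by head); the variable names are made up for illustration. Because the inputs are already rounded to four decimals, a recomputed ratio can differ from the printed one in the last digit.

```python
# (head_ms, base_ms, printed_ratio) copied from the comparison table.
results = {
    "index_float64_mul": (1.3734, 36.9690, 0.0371),
    "index_float64_div": (1.4606, 37.8457, 0.0386),
    "index_float64_boolean_series_indexer": (4.0197, 12.3593, 0.3252),
    "index_float64_boolean_indexer": (3.8797, 11.3870, 0.3407),
    "index_float64_construct": (45.4140, 64.4014, 0.7052),
}

recomputed = {}
for name, (head_ms, base_ms, printed_ratio) in results.items():
    ratio = head_ms / base_ms    # < 1 means this PR is faster than master
    speedup = base_ms / head_ms  # how many times faster
    recomputed[name] = ratio
    print(f"{name:<40} ratio={ratio:.4f} speedup={speedup:.1f}x")
```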

@@ -1360,6 +1360,16 @@ Of course if you need integer based selection, then use ``iloc``

dfir.iloc[0:5]

``Float64Index`` Backend
Contributor

I would just make this a note in the float64index section. don't really need a new section

Member Author

ok

@cpcloud cpcloud modified the milestones: 1.0, 0.14.0 Apr 14, 2014
@shoyer
Member

shoyer commented Apr 14, 2014

This looks great to me -- we have a workaround for xray, but it's pretty hacky and this is much nicer. Thanks @cpcloud!

@jreback
Contributor

jreback commented Apr 14, 2014

merge when ready

cpcloud added a commit that referenced this pull request Apr 14, 2014
ENH: Float64Index now uses Float64Hashtable as a backend
@cpcloud cpcloud merged commit 8e36ff4 into pandas-dev:master Apr 14, 2014
@cpcloud cpcloud deleted the float64-index-enh branch April 14, 2014 11:18
@cpcloud cpcloud modified the milestones: 0.14.0, 1.0 Apr 14, 2014
@cpcloud
Member Author

cpcloud commented Apr 14, 2014

@jreback this was marked for 1.0, i changed it to 0.14.0. is that ok?

@jreback
Contributor

jreback commented Apr 14, 2014

yep

@TomAugspurger
Contributor

Great stuff!

Labels
Compat: pandas objects compatibility with Numpy or Python functions
Performance: Memory or execution speed performance
Development

Successfully merging this pull request may close these issues.

PERF: Float64Index to use Float hashtable backend
4 participants