BENCH: Index.take benchmark is measuring wrong thing #18000

jorisvandenbossche · 2017-10-27T12:28:20Z

The take benchmark indexing.IndexingMethods.time_take_intindex is benchmarking take on a boolean list, which take interprets as the integer values [0, 1, 0, 1, ...]. That is a bit silly to benchmark.
We should add some actual benchmark on integers instead.


sangramga · 2017-10-30T13:31:22Z

@jorisvandenbossche I would like to make this issue my first contribution. I have looked at the source code and am ready to take it up.
However, I am unable to see why and how a boolean list self.indexer is interpreted as integer [0, 1, 0, 1, ...] values.
Is it possible to resolve this issue by creating a new indexer list like self.int_indexer for benchmarking? What do you have in mind?

jorisvandenbossche · 2017-10-30T13:38:50Z

> Is it possible to resolve this issue by creating a new indexer list like self.int_indexer for benchmarking? What do you have in mind?

Yes, although I think you can just replace the existing self.indexer with a new list of ints instead of bools (we don't need to keep the old bench of using booleans).
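Following that suggestion, a minimal ASV-style sketch of an integer-indexer take benchmark might look like this. The class name TakeIntIndex, the Series size, and the float64 values are assumptions for illustration, not the code in pandas' asv_bench. ASV runs setup() and then times methods whose names start with time_.

```python
import numpy as np
import pandas as pd


class TakeIntIndex:
    """Hypothetical ASV-style benchmark for Series.take with integer positions."""

    def setup(self):
        n = 100_000
        # Values equal their positions, which makes results easy to check.
        self.s = pd.Series(np.arange(n, dtype=np.float64))
        # Random integer positions in [0, n), so take() performs real
        # positional lookups instead of fetching only elements 0 and 1.
        self.indexer = np.random.randint(0, n, size=n)

    def time_take(self):
        self.s.take(self.indexer)
```

Because the indexer spans the whole Series, the timing reflects scattered lookups over many positions rather than repeated access to the first two rows.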

> However, I am unable to see why and how a boolean list self.indexer is interpreted as integer [0, 1, 0, 1, ...] values.

If you try to run the code (or similar), you will see:

In [84]: s = pd.Series(range(5))

In [85]: s.take([True, False, True, False, True])
Out[85]: 
1    1
0    0
1    1
0    0
1    1
dtype: int64

In [86]: s.loc[[True, False, True, False, True]]
Out[86]: 
0    0
2    2
4    4
dtype: int64

So take interprets True and False as the integer positions 1 and 0, not as a boolean mask (as loc does).
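The same distinction can be reproduced with plain NumPy. This sketch casts the booleans to integers explicitly instead of relying on any particular pandas or NumPy version's take behavior. It mirrors what the session above shows: a positional lookup of 1, 0, 1, 0, 1 versus a boolean mask that keeps rows 0, 2, and 4.

```python
import numpy as np

values = np.arange(5)
mask = [True, False, True, False, True]

# take(): booleans end up as the integer positions 1 and 0.
positions = np.asarray(mask, dtype=np.intp)
taken = values.take(positions)

# loc-style selection: the same list is used as a boolean mask.
masked = values[np.asarray(mask, dtype=bool)]

print(taken)   # [1 0 1 0 1]
print(masked)  # [0 2 4]
```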

sangramga · 2017-11-16T16:01:21Z

@jorisvandenbossche I made the following changes in indexing.IndexingMethods.time_take_intindex

Replaced the boolean list indexer with a randomized integer list:
self.indexer = list(np.random.randint(0, 100000, size=100000))
Does this resolve the issue?
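The line above wraps the random array in list(...), so take receives a Python list of NumPy scalars. Per its title, the merged PR #36656 used an integer ndarray instead. This sketch builds both forms from one seeded draw and checks that they hold the same in-bounds integer positions. The choice therefore affects only what take has to convert, not which rows it selects. The helper make_indexers and the seed are hypothetical.

```python
import numpy as np


def make_indexers(n, seed=0):
    """Hypothetical helper: the same random positions as an ndarray and as a list."""
    rng = np.random.RandomState(seed)  # seeded for reproducibility (an assumption)
    as_array = rng.randint(0, n, size=n)  # integer ndarray, values in [0, n)
    as_list = list(as_array)  # the list(...) form used in the comment
    return as_array, as_list


arr, lst = make_indexers(100_000)
# Sanity checks: integer dtype, in bounds, and identical positions in both forms.
assert np.issubdtype(arr.dtype, np.integer)
assert 0 <= arr.min() and arr.max() < 100_000
assert np.array_equal(arr, np.asarray(lst))
```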

Also, where should I add tests? Should I add or modify any tests?

hardikpnsp · 2020-09-25T03:23:10Z

It seems like this issue has been open for quite a while. I looked into the take benchmark, and it is still using boolean values as the indexer. Since nobody seems to be working on it, I will quickly fix that.

hardikpnsp · 2020-09-25T03:23:15Z

take

jorisvandenbossche added the Benchmark (Performance (ASV) benchmarks), Difficulty Novice, and good first issue labels Oct 27, 2017

jreback added the good first issue label and removed the good first issue and Difficulty Novice labels Dec 15, 2017

jbrockmendel removed the Effort Low label Oct 21, 2019

github-actions bot assigned hardikpnsp Sep 25, 2020

hardikpnsp mentioned this issue Sep 26, 2020

ASV: used integer ndarray as indexer for Series.take indexing benchmark #36656

Merged


jreback added this to the 1.2 milestone Oct 2, 2020

jreback closed this as completed in #36656 Oct 2, 2020
