BLD: add benchmarks for all single-axis indexers #6450

Closed
wants to merge 5 commits into from

Conversation

@immerrr commented Feb 23, 2014

This PR adds benchmarks for all available single-axis container indexing patterns discussed in #6328.

It is currently a work in progress, but ideas and comments are welcome. Multi-axis indexers are left for the future to avoid overcomplicating things from the start.

Part of the functionality implemented in this PR is caching tm.makeCustomIndex results to avoid spending several seconds in the setup of each benchmark. Implementing this caching within the pandas.util.testing module may deserve a separate PR.
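The caching idea can be sketched roughly as follows (a minimal illustration, not the PR's actual code; the toy builder stands in for pandas.util.testing.makeCustomIndex):

```python
from functools import lru_cache

# Minimal sketch of the caching idea (assumed, not the PR's actual
# code): building a large index dominates each benchmark's setup
# time, so memoize the result by its construction arguments. The toy
# builder below stands in for pandas.util.testing.makeCustomIndex.
@lru_cache(maxsize=None)
def cached_index(nentries, idx_type='s'):
    # the real code would call tm.makeCustomIndex(nentries, ...)
    return tuple('%s_%d' % (idx_type, i) for i in range(nentries))

first = cached_index(100000)
second = cached_index(100000)
assert first is second  # the second setup reuses the cached object
```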

TODO:

  • axis != 0 benchmarks
  • container shape specification
  • benchmarking obj.__getitem__
  • benchmarking obj.xs
  • multiindex axes
  • non-unique axes
  • non-string index benchmarks

@jreback commented Feb 23, 2014

can you put a sample run of this up?

@immerrr commented Feb 24, 2014

Here you are

@immerrr commented Feb 25, 2014

Implemented container shape specification (so that Panel & Panel4D can also be benchmarked) and updated the gist with results.

There are some suspicious results, though, e.g.

indexing__series_arr_pos_1000000s            | 206.7204 |

The figure is suspicious because when I run the benchmark separately, as indexing.indexing__series_arr_pos_1000000s.run(), it takes 20ms, not 200ms.

@jreback commented Feb 25, 2014

I would change this script to something like generate_vbenches.py, which then actually creates indexing2.py by generating the code. The reason is that if you then want to repeat a specific benchmark, it's easy to copy/paste. (Just put a warning at the top of indexing2.py saying to run python generate_vbenches.py instead of editing indexing2.py; you can hardcode it to overwrite the output file.)
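The generator idea might look something like this (a hypothetical sketch; the template and names are illustrative, not the actual vb_suite API):

```python
# Hypothetical sketch of generate_vbenches.py (names and template
# are illustrative): expand benchmark cases into plain source text
# so any single case is easy to copy/paste out of the output file.
HEADER = "# AUTOGENERATED -- run generate_vbenches.py instead of editing\n"
TEMPLATE = "{name} = Benchmark({stmt!r}, common_setup, name={name!r})\n"

def generate_source(cases):
    parts = [HEADER]
    for name, stmt in cases:
        parts.append(TEMPLATE.format(name=name, stmt=stmt))
    return "".join(parts)

src = generate_source([
    ("series_loc_label", "s.loc[label]"),
    ("series_iloc_pos", "s.iloc[pos]"),
])
# writing indexing2.py would then be:
# open('indexing2.py', 'w').write(src)
```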

@immerrr commented Feb 26, 2014

That might add a lot of complexity to the vbench suite (which is already complex beyond my liking). I'm not sure I understand what is wrong with reproducing the benchmarks with something like:

In [1]: cd vb_suite/
/home/immerrr/sources/pandas/vb_suite

In [2]: import indexing

In [3]: indexing.
indexing.Benchmark                                         indexing.indexing__panel4d_slice_pos_10000sx20sx20sx20s
indexing.MAX_ENTRIES                                       indexing.indexing__panel_arr_label_100000sx20sx20s
indexing.SECTION                                           indexing.indexing__panel_arr_pos_100000sx20sx20s
indexing.bm_df_getitem                                     indexing.indexing__panel_basic_label_100000sx20sx20s
indexing.bm_df_getitem2                                    indexing.indexing__panel_basic_pos_100000sx20sx20s
indexing.bm_df_getitem3                                    indexing.indexing__panel_iloc_mask_100000sx20sx20s
indexing.bm_getitem                                        indexing.indexing__panel_loc_mask_100000sx20sx20s
indexing.common_setup                                      indexing.indexing__panel_slice_label_100000sx20sx20s
indexing.datetime                                          indexing.indexing__panel_slice_pos_100000sx20sx20s
indexing.frame_iloc_big                                    indexing.indexing__series_arr_label_1000000s
indexing.frame_iloc_dups                                   indexing.indexing__series_arr_pos_1000000s
indexing.frame_loc_dups                                    indexing.indexing__series_basic_label_1000000s
indexing.frame_xs_mi_ix                                    indexing.indexing__series_basic_pos_1000000s
indexing.generate_index_benchmarks                         indexing.indexing__series_iloc_mask_1000000s
indexing.get_index                                         indexing.indexing__series_loc_mask_1000000s
indexing.indexing__dataframe_arr_label_100000sx100s        indexing.indexing__series_slice_label_1000000s
indexing.indexing__dataframe_arr_label_100sx100000s        indexing.indexing__series_slice_pos_1000000s
indexing.indexing__dataframe_arr_pos_100000sx100s          indexing.indexing_dataframe_boolean
indexing.indexing__dataframe_arr_pos_100sx100000s          indexing.indexing_dataframe_boolean_no_ne
indexing.indexing__dataframe_basic_label_100000sx100s      indexing.indexing_dataframe_boolean_rows
indexing.indexing__dataframe_basic_label_100sx100000s      indexing.indexing_dataframe_boolean_rows_object
indexing.indexing__dataframe_basic_pos_100000sx100s        indexing.indexing_dataframe_boolean_st
indexing.indexing__dataframe_basic_pos_100sx100000s        indexing.indexing_frame_get_value
indexing.indexing__dataframe_iloc_mask_100000sx100s        indexing.indexing_frame_get_value_ix
indexing.indexing__dataframe_iloc_mask_100sx100000s        indexing.indexing_panel_subset
indexing.indexing__dataframe_loc_mask_100000sx100s         indexing.mask
indexing.indexing__dataframe_loc_mask_100sx100000s         indexing.py
indexing.indexing__dataframe_slice_label_100000sx100s      indexing.pyc
indexing.indexing__dataframe_slice_label_100sx100000s      indexing.series_getitem_label_slice
indexing.indexing__dataframe_slice_pos_100000sx100s        indexing.series_getitem_pos_slice
indexing.indexing__dataframe_slice_pos_100sx100000s        indexing.series_mask
indexing.indexing__panel4d_arr_label_10000sx20sx20sx20s    indexing.series_xs_mi_ix
indexing.indexing__panel4d_arr_pos_10000sx20sx20sx20s      indexing.setup
indexing.indexing__panel4d_basic_label_10000sx20sx20sx20s  indexing.setup_template
indexing.indexing__panel4d_basic_pos_10000sx20sx20sx20s    indexing.sort_level_one
indexing.indexing__panel4d_iloc_mask_10000sx20sx20sx20s    indexing.sort_level_zero
indexing.indexing__panel4d_loc_mask_10000sx20sx20sx20s     indexing.statement
indexing.indexing__panel4d_slice_label_10000sx20sx20sx20s  indexing.tm

In [3]: indexing.indexing__data
indexing.indexing__dataframe_arr_label_100000sx100s    indexing.indexing__dataframe_iloc_mask_100000sx100s
indexing.indexing__dataframe_arr_label_100sx100000s    indexing.indexing__dataframe_iloc_mask_100sx100000s
indexing.indexing__dataframe_arr_pos_100000sx100s      indexing.indexing__dataframe_loc_mask_100000sx100s
indexing.indexing__dataframe_arr_pos_100sx100000s      indexing.indexing__dataframe_loc_mask_100sx100000s
indexing.indexing__dataframe_basic_label_100000sx100s  indexing.indexing__dataframe_slice_label_100000sx100s
indexing.indexing__dataframe_basic_label_100sx100000s  indexing.indexing__dataframe_slice_label_100sx100000s
indexing.indexing__dataframe_basic_pos_100000sx100s    indexing.indexing__dataframe_slice_pos_100000sx100s
indexing.indexing__dataframe_basic_pos_100sx100000s    indexing.indexing__dataframe_slice_pos_100sx100000s

In [3]: indexing.indexing__dataframe_loc
indexing.indexing__dataframe_loc_mask_100000sx100s  indexing.indexing__dataframe_loc_mask_100sx100000s

In [3]: indexing.indexing__dataframe_loc_mask_100
indexing.indexing__dataframe_loc_mask_100000sx100s  indexing.indexing__dataframe_loc_mask_100sx100000s

In [3]: indexing.indexing__dataframe_loc_mask_100sx100000s.run()
Out[3]: 
{'loops': 1000,
 'repeat': 3,
 'succeeded': True,
 'timing': 0.18867802619934082,
 'units': 'ms'}

@jreback commented Feb 26, 2014

ok

I would drop Panel4D - it is identical to Panel for most operations that matter.

Also, these are all 0th-axis benchmarks; it would be nice to have the ndim-1 axis as well.

@immerrr commented Feb 27, 2014

Split the benchmark into no-axis (obj[indexer]), axis=0 (obj[indexer, :]) and axis=ndim-1 (obj[:, indexer]) cases.
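For a DataFrame, the three cases look like this (a small illustration of the indexing patterns, not the benchmark code itself):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=list('abc'), columns=list('wxyz'))

r_noaxis = df.loc[['a', 'c']]    # no-axis: indexer on the default axis
r_axis0 = df.loc[['a', 'c'], :]  # axis=0 spelled out explicitly
r_last = df.loc[:, ['w', 'z']]   # axis=ndim-1 (columns for a 2-D object)

assert r_noaxis.equals(r_axis0)  # the first two are equivalent
```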

Even with Panel4D benchmarks removed, it's now 112 tests (unique string axes, Series, 2 shapes of DataFrame, 2 shapes of Panel). Adding non-unique indices will double that, as will adding setitem benchmarks, let alone different index types and multi-indexing.

It's not that I'm against adding this many benchmarks, but it would be nice to draw a line somewhere in the internal API between axis label lookup operations and numpy-like indexing/slicing of block managers. That would allow testing/benchmarking them separately and avoid a further combinatorial increase in test count.

Also, the results are still weird...

@jreback commented Feb 27, 2014

how much extra time is the total run taking?

in theory this should also condition on index type (e.g. FloatIndex, DatetimeIndex)

@jreback commented Feb 27, 2014

maybe these need to be split into fast / full versions (e.g. fast runs by default, full on-demand)

@immerrr commented Feb 27, 2014

As it is right now, it's not too long (under 1min).

But, like I said, it looks like something's wrong. E.g. given that the panel used in the benchmark contains about 1M * 20 * 20 == 400M doubles, the benchmark should consume 3.2G+ of RAM at some point, but the high-water mark of the process never exceeds several hundred megabytes on my desktop. It may be some kernel paging magic that happens because the allocated memory is never written to, but I need to double-check that before drawing any conclusions about run time.
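The back-of-the-envelope estimate above works out as follows (plain arithmetic, float64 at 8 bytes per element):

```python
# Sanity check of the estimate above: a float64 panel with a 1M-long
# major axis and two 20-long minor axes holds 400M doubles.
n_elems = 1_000_000 * 20 * 20    # 400_000_000 elements
n_bytes = n_elems * 8            # float64 is 8 bytes each
assert n_bytes == 3_200_000_000  # i.e. 3.2 GB
```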

@jreback commented Feb 27, 2014

ok... that's too much anyhow... you can probably make the index smaller... say 100k

@immerrr commented Feb 27, 2014

Ok, fixed the bug and the results now look much better.

After downscaling, the test run takes 9 minutes on my Intel(R) Core(TM) i5-3330 CPU @ 3.00GHz.

@jreback commented Feb 27, 2014

what takes 9 min, the total perf run (including compile time)?

@immerrr commented Feb 27, 2014

maybe these need to be split into fast / full versions (e.g. fast runs by default, full on-demand)

With #6479 in mind (it may result in adding as many benchmarks as this PR), I thought test_perf.py could use a parameter to run benchmarks from a given file only, so that long-running exhaustive benchmarks can be kept separate and normal runs don't become unnecessarily long.
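Such a parameter could be as simple as a module filter in the runner (a hypothetical sketch; test_perf.py's real options differ):

```python
import argparse

# Hypothetical sketch of a per-file selector for the benchmark
# runner (not test_perf.py's actual interface): run only benchmarks
# defined in the named vb_suite modules.
parser = argparse.ArgumentParser()
parser.add_argument('--modules', nargs='*', default=None,
                    help='run benchmarks only from these vb_suite files')
args = parser.parse_args(['--modules', 'indexing'])

all_modules = ['eval', 'groupby', 'indexing', 'join_merge']
selected = [m for m in all_modules
            if args.modules is None or m in args.modules]
assert selected == ['indexing']
```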

@immerrr commented Feb 27, 2014

what takes 9 min, the total perf run (including compile time)?

The following command did:

$ time ./test_perf.sh -H -N3 -S -u2 -r indexing__

<snip>

real    8m37.888s
user    8m15.132s
sys 0m21.824s

So the 9 minutes don't include compilation, only the execution of those 112 benchmarks.

@jreback commented Feb 27, 2014

With #6479 in mind (it may result in adding as many benchmarks as this PR), I thought test_perf.py could use a parameter to run benchmarks from a given file only, so that long-running exhaustive benchmarks can be kept separate and normal runs don't become unnecessarily long.

yes... we don't normally want to include all of these

FYI, how long does it take if you don't pass -r (i.e. what does the total suite take), for comparison?

@immerrr commented Feb 28, 2014

FYI, how long does it take if you don't pass -r (i.e. what does the total suite take), for comparison?

It's about 20 minutes with the same configuration (-H -N3 -S -u2).

@jreback commented Feb 28, 2014

hmm, so adding these 112 benches basically doubles the run time

I would reduce the index size to 100k - see what happens

@immerrr commented Feb 28, 2014

I would reduce the index size to 100k - see what happens

The figures under discussion are for 100k sizes already, actually :) That happened after immerrr@7f0574f.

@jreback commented Feb 28, 2014

ahh, you are repeating 3x, ok then

prob ok for now, but let's think about creating a fast/slow argument anyhow (default being fast)

@immerrr commented Feb 28, 2014

Bad news: the trick with caching makeCustomIndex results doesn't work during comparative benchmarks. It's becoming a habit with pandas to dig deeper to make things work...

@immerrr commented Feb 28, 2014

Bad news: the trick with caching makeCustomIndex results doesn't work during comparative benchmarks.

Ok, fixed that.

I've benchmarked #6440 with this new suite, here's the result.

@@ -78,7 +78,7 @@
name='eval_frame_and_one_thread',
start_date=datetime(2013, 7, 26))

-setup = common_setup
+# setup = common_setup
Contributor:

what did this change?

Contributor Author:

ouch, that was a temporary fix.

Without it, eval_frame_and_python and eval_frame_and_one_thread have the same setup and thus share a checksum, which results in a non-unique index error when benchmarks.db is regenerated from scratch. Same goes for L105.
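A rough sketch of why the collision happens (an assumed model of vbench's keying, not its actual code): if the database key is derived from the setup and statement strings alone, two benchmarks that share both strings collide.

```python
import hashlib

# Assumed model (not vbench's actual implementation): the db key is
# a hash of the setup and statement source strings, so identical
# inputs yield an identical, conflicting key.
def bench_checksum(setup, stmt):
    return hashlib.md5((setup + stmt).encode('utf-8')).hexdigest()

k1 = bench_checksum("common_setup_src", "df.eval('a & b')")
k2 = bench_checksum("common_setup_src", "df.eval('a & b')")
assert k1 == k2  # same setup + statement -> same key -> db conflict
```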

Contributor:

hmm...maybe just 'copy' common_setup then?

Member:

Oh sorry didn't see this.

Member:

Is the checksum only dependent on that string? I don't know how it's computed, but that seems strange.

Contributor Author:

It's dependent on setup and benchmarked code. For the benchmarks under investigation, setup differed from common_setup only by the single-thread setup code; if one removed that, the single- and multi-thread benches became identical and caused a db conflict.

@jreback commented Mar 9, 2014

how's this coming?

@immerrr commented Mar 11, 2014

I've added benchmarks for non-string axes. Not everything works yet, but there are interesting outliers in the updated results:

| indexing__panel_arr_label_10ix10ix100000dt_ax2   |      3 | 108.405881 |
| indexing__dataframe_arr_label_10ix100000mi_ax1   |      3 | 110.820320 |
| indexing__panel_arr_label_100000mix10ix10i       |      3 | 119.547314 |
| indexing__panel_arr_pos_100000mix10ix10i         |      3 | 119.803111 |
| indexing__panel_arr_pos_100000mix10ix10i_ax0     |      3 | 120.717234 |
| indexing__panel_arr_label_10ix10ix100000mi_ax2   |      3 | 137.265232 |
| indexing__panel_arr_label_100000mix10ix10i_ax0   |      3 | 155.488438 |
| indexing__panel_arr_pos_100000dtx10ix10i_ax0     |      3 | 266.872327 |
| indexing__panel_arr_pos_100000dtx10ix10i         |      3 | 266.904354 |
| indexing__panel_arr_pos_10ix10ix100000dt_ax2     |      3 | 288.369656 |
| indexing__panel_basic_label_100000dtx10ix10i_ax2 |      3 | 405.912108 |
| indexing__panel_basic_pos_10ix10ix100000dt_ax0   |      3 | 406.374137 |
| indexing__panel_basic_label_10ix10ix100000dt_ax0 |      3 | 409.108559 |
| indexing__panel_basic_pos_100000dtx10ix10i_ax2   |      3 | 412.231657 |

Observe how timings of datetime-axis (dt) panel indexing are 2-3x slower than those of multiindex-axis (mi) ones. The result needs additional verification, but is still interesting.

@immerrr commented Mar 11, 2014

Quick follow-up: what's even funnier is that the slowest benchmarks are "basic" indexing, i.e. the indexer is a single value and the operation involves dimension reduction.
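The "basic" case with dimension reduction looks like this on a DataFrame (a small present-day illustration, not the benchmark code):

```python
import numpy as np
import pandas as pd

# "Basic" indexing: a single scalar label, where the result drops
# one dimension (DataFrame row -> Series).
df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=pd.date_range('2014-01-01', periods=3),
                  columns=list('wxyz'))

row = df.loc[pd.Timestamp('2014-01-02')]
assert isinstance(row, pd.Series)  # reduced from 2-D to 1-D
```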

@jreback commented Mar 11, 2014

thanks... obviously this could use some more optimization :)

@jreback commented Mar 11, 2014

can you put a bullet and an explanation about how to use vbench on https://github.com/pydata/pandas/wiki in the Developers section? thanks

@jreback commented Apr 9, 2014

@immerrr what's the status of this? what is the time for running the vbenches, new vs old?

@immerrr commented Apr 10, 2014

No update, really; #6745 has been keeping me busy, and I'm trying to make it available before the 0.14 release.

AFAIR, I was going to keep this benchmark out of the "default" run configuration and add a cmdline parameter to make sure it's easy to enable. There were some issues I wanted to address in the benchmark runner itself, but I need to delve back into the scope of this PR.

@jreback commented Apr 10, 2014

no problem. ok, I'll move this issue to 0.15, but feel free to update if it's ready for 0.14 (since it's for vbenches, the version doesn't make much difference)

@jreback commented Jan 18, 2015

@immerrr if you'd like to update this, I think it would be nice. Please reopen if you do.

@jreback closed this Jan 18, 2015

Labels: Indexing, Performance