BLD: add benchmarks for all single-axis indexers #6450

Closed
wants to merge 5 commits into from

Conversation

@immerrr commented Feb 23, 2014

This PR adds benchmarks for all available single-axis container indexing patterns discussed in #6328.

It is currently a work in progress, but ideas and comments are welcome. Multi-axis indexers are left for the future to avoid overcomplicating things from the start.

Part of the functionality implemented in this PR is caching tm.makeCustomIndex results to avoid spending several seconds in the setup of each benchmark. Implementing this caching within the pandas.util.testing module may deserve a separate PR.
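The caching idea can be sketched roughly as follows (a minimal illustration, not the PR's actual code; the toy builder stands in for pandas.util.testing.makeCustomIndex):

```python
from functools import lru_cache

# Minimal sketch of the caching idea (assumed, not the PR's actual
# code): building a large index dominates each benchmark's setup
# time, so memoize the result by its construction arguments. The toy
# builder below stands in for pandas.util.testing.makeCustomIndex.
@lru_cache(maxsize=None)
def cached_index(nentries, idx_type='s'):
    # the real code would call tm.makeCustomIndex(nentries, ...)
    return tuple('%s_%d' % (idx_type, i) for i in range(nentries))

first = cached_index(100000)
second = cached_index(100000)
assert first is second  # the second setup reuses the cached object
```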

TODO:

  • axis != 0 benchmarks
  • container shape specification
  • benchmarking obj.__getitem__
  • benchmarking obj.xs
  • multiindex axes
  • non-unique axes
  • non-string index benchmarks

@jreback commented Feb 23, 2014

can you put a sample run of this up?

@immerrr commented Feb 24, 2014

Here you are

@immerrr commented Feb 25, 2014

Implemented container shape specification (so that Panel & Panel4D can also be benchmarked) and updated the gist with results.

There are some suspicious results, though, e.g.

indexing__series_arr_pos_1000000s            | 206.7204 |

The figure is suspicious because when I run the benchmark separately, as indexing.indexing__series_arr_pos_1000000s.run(), it takes 20ms, not 200ms.

@jreback commented Feb 25, 2014

I would change this script to something like generate_vbenches.py, which then actually creates indexing2.py by generating the code. The reason is that if you then want to repeat a specific benchmark, it's easy to copy/paste. (Just put a warning at the top of indexing2.py saying to run python generate_vbenches.py instead of editing indexing2.py; you can hardcode it to overwrite the output file.)
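The generator idea might look something like this (a hypothetical sketch; the template and names are illustrative, not the actual vb_suite API):

```python
# Hypothetical sketch of generate_vbenches.py (names and template
# are illustrative): expand benchmark cases into plain source text
# so any single case is easy to copy/paste out of the output file.
HEADER = "# AUTOGENERATED -- run generate_vbenches.py instead of editing\n"
TEMPLATE = "{name} = Benchmark({stmt!r}, common_setup, name={name!r})\n"

def generate_source(cases):
    parts = [HEADER]
    for name, stmt in cases:
        parts.append(TEMPLATE.format(name=name, stmt=stmt))
    return "".join(parts)

src = generate_source([
    ("series_loc_label", "s.loc[label]"),
    ("series_iloc_pos", "s.iloc[pos]"),
])
# writing indexing2.py would then be:
# open('indexing2.py', 'w').write(src)
```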

@immerrr commented Feb 26, 2014

That might add a lot of complexity to the vbench suite (which is already complex beyond my liking). I'm not sure I understand what is wrong with reproducing the benchmarks with something like:

In [1]: cd vb_suite/
/home/immerrr/sources/pandas/vb_suite

In [2]: import indexing

In [3]: indexing.
indexing.Benchmark                                         indexing.indexing__panel4d_slice_pos_10000sx20sx20sx20s
indexing.MAX_ENTRIES                                       indexing.indexing__panel_arr_label_100000sx20sx20s
indexing.SECTION                                           indexing.indexing__panel_arr_pos_100000sx20sx20s
indexing.bm_df_getitem                                     indexing.indexing__panel_basic_label_100000sx20sx20s
indexing.bm_df_getitem2                                    indexing.indexing__panel_basic_pos_100000sx20sx20s
indexing.bm_df_getitem3                                    indexing.indexing__panel_iloc_mask_100000sx20sx20s
indexing.bm_getitem                                        indexing.indexing__panel_loc_mask_100000sx20sx20s
indexing.common_setup                                      indexing.indexing__panel_slice_label_100000sx20sx20s
indexing.datetime                                          indexing.indexing__panel_slice_pos_100000sx20sx20s
indexing.frame_iloc_big                                    indexing.indexing__series_arr_label_1000000s
indexing.frame_iloc_dups                                   indexing.indexing__series_arr_pos_1000000s
indexing.frame_loc_dups                                    indexing.indexing__series_basic_label_1000000s
indexing.frame_xs_mi_ix                                    indexing.indexing__series_basic_pos_1000000s
indexing.generate_index_benchmarks                         indexing.indexing__series_iloc_mask_1000000s
indexing.get_index                                         indexing.indexing__series_loc_mask_1000000s
indexing.indexing__dataframe_arr_label_100000sx100s        indexing.indexing__series_slice_label_1000000s
indexing.indexing__dataframe_arr_label_100sx100000s        indexing.indexing__series_slice_pos_1000000s
indexing.indexing__dataframe_arr_pos_100000sx100s          indexing.indexing_dataframe_boolean
indexing.indexing__dataframe_arr_pos_100sx100000s          indexing.indexing_dataframe_boolean_no_ne
indexing.indexing__dataframe_basic_label_100000sx100s      indexing.indexing_dataframe_boolean_rows
indexing.indexing__dataframe_basic_label_100sx100000s      indexing.indexing_dataframe_boolean_rows_object
indexing.indexing__dataframe_basic_pos_100000sx100s        indexing.indexing_dataframe_boolean_st
indexing.indexing__dataframe_basic_pos_100sx100000s        indexing.indexing_frame_get_value
indexing.indexing__dataframe_iloc_mask_100000sx100s        indexing.indexing_frame_get_value_ix
indexing.indexing__dataframe_iloc_mask_100sx100000s        indexing.indexing_panel_subset
indexing.indexing__dataframe_loc_mask_100000sx100s         indexing.mask
indexing.indexing__dataframe_loc_mask_100sx100000s         indexing.py
indexing.indexing__dataframe_slice_label_100000sx100s      indexing.pyc
indexing.indexing__dataframe_slice_label_100sx100000s      indexing.series_getitem_label_slice
indexing.indexing__dataframe_slice_pos_100000sx100s        indexing.series_getitem_pos_slice
indexing.indexing__dataframe_slice_pos_100sx100000s        indexing.series_mask
indexing.indexing__panel4d_arr_label_10000sx20sx20sx20s    indexing.series_xs_mi_ix
indexing.indexing__panel4d_arr_pos_10000sx20sx20sx20s      indexing.setup
indexing.indexing__panel4d_basic_label_10000sx20sx20sx20s  indexing.setup_template
indexing.indexing__panel4d_basic_pos_10000sx20sx20sx20s    indexing.sort_level_one
indexing.indexing__panel4d_iloc_mask_10000sx20sx20sx20s    indexing.sort_level_zero
indexing.indexing__panel4d_loc_mask_10000sx20sx20sx20s     indexing.statement
indexing.indexing__panel4d_slice_label_10000sx20sx20sx20s  indexing.tm

In [3]: indexing.indexing__data
indexing.indexing__dataframe_arr_label_100000sx100s    indexing.indexing__dataframe_iloc_mask_100000sx100s
indexing.indexing__dataframe_arr_label_100sx100000s    indexing.indexing__dataframe_iloc_mask_100sx100000s
indexing.indexing__dataframe_arr_pos_100000sx100s      indexing.indexing__dataframe_loc_mask_100000sx100s
indexing.indexing__dataframe_arr_pos_100sx100000s      indexing.indexing__dataframe_loc_mask_100sx100000s
indexing.indexing__dataframe_basic_label_100000sx100s  indexing.indexing__dataframe_slice_label_100000sx100s
indexing.indexing__dataframe_basic_label_100sx100000s  indexing.indexing__dataframe_slice_label_100sx100000s
indexing.indexing__dataframe_basic_pos_100000sx100s    indexing.indexing__dataframe_slice_pos_100000sx100s
indexing.indexing__dataframe_basic_pos_100sx100000s    indexing.indexing__dataframe_slice_pos_100sx100000s

In [3]: indexing.indexing__dataframe_loc
indexing.indexing__dataframe_loc_mask_100000sx100s  indexing.indexing__dataframe_loc_mask_100sx100000s

In [3]: indexing.indexing__dataframe_loc_mask_100
indexing.indexing__dataframe_loc_mask_100000sx100s  indexing.indexing__dataframe_loc_mask_100sx100000s

In [3]: indexing.indexing__dataframe_loc_mask_100sx100000s.run()
Out[3]: 
{'loops': 1000,
 'repeat': 3,
 'succeeded': True,
 'timing': 0.18867802619934082,
 'units': 'ms'}

@jreback commented Feb 26, 2014

ok

I would drop Panel4D - it is identical to Panel for most operations that matter.

Also, these are all 0th-axis benchmarks; it would be nice to have the ndim-1 axis as well.

@immerrr commented Feb 27, 2014

Split the benchmark into no-axis (obj[indexer]), axis=0 (obj[indexer, :]) and axis=ndim-1 (obj[:, indexer]) cases.
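For a DataFrame, the three cases look like this (a small illustration of the indexing patterns, not the benchmark code itself):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=list('abc'), columns=list('wxyz'))

r_noaxis = df.loc[['a', 'c']]    # no-axis: indexer on the default axis
r_axis0 = df.loc[['a', 'c'], :]  # axis=0 spelled out explicitly
r_last = df.loc[:, ['w', 'z']]   # axis=ndim-1 (columns for a 2-D object)

assert r_noaxis.equals(r_axis0)  # the first two are equivalent
```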

Even with Panel4D benchmarks removed, it's now 112 tests (unique string axes, Series, 2 shapes of DataFrame, 2 shapes of Panel). Adding non-unique indices will double that, as will adding setitem benchmarks, let alone different index types and multi-indexing.

It's not that I'm against adding this many benchmarks, but it would be nice to draw a line somewhere in the internal API between axis label lookup operations and numpy-like indexing/slicing of block managers. That would allow testing/benchmarking them separately and avoid a further combinatorial increase in test count.

Also, the results are still weird...

@jreback commented Feb 27, 2014

how much extra time is the total run taking?

in theory this should also condition on index type (e.g. FloatIndex, DatetimeIndex)

@jreback commented Feb 27, 2014

maybe these need to be split into fast / full versions (e.g. fast runs by default, full on-demand)

@immerrr commented Feb 27, 2014

As it is right now, it's not too long (under 1min).

But, like I said, it looks like something's wrong. E.g. given that the panel used in the benchmark contains about 1M * 20 * 20 == 400M doubles, the benchmark should consume 3.2G+ of RAM at some point, but the high-water mark of the process never exceeds several hundred megabytes on my desktop. It may be some kernel paging magic that happens because the allocated memory is never written to, but I need to double-check that before drawing any conclusions about run time.
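The back-of-the-envelope estimate above works out as follows (plain arithmetic, float64 at 8 bytes per element):

```python
# Sanity check of the estimate above: a float64 panel with a 1M-long
# major axis and two 20-long minor axes holds 400M doubles.
n_elems = 1_000_000 * 20 * 20    # 400_000_000 elements
n_bytes = n_elems * 8            # float64 is 8 bytes each
assert n_bytes == 3_200_000_000  # i.e. 3.2 GB
```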

@jreback commented Feb 27, 2014

ok... that's too much anyhow... you can probably make the index smaller... say 100k

@immerrr commented Feb 27, 2014

Ok, fixed the bug and the results now look much better.

After downscaling, the test run takes 9 minutes on my Intel(R) Core(TM) i5-3330 CPU @ 3.00GHz.

@jreback commented Feb 27, 2014

what takes 9 min, the total perf run (including compile time)?

@immerrr commented Feb 27, 2014

maybe these need to be split into fast / full versions (e.g. fast runs by default, full on-demand)

With #6479 in mind (it may result in adding as many benchmarks as this PR), I thought test_perf.py could use a parameter to run benchmarks from a given file only, so that long-running exhaustive benchmarks can be kept separate and normal runs don't become unnecessarily long.
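Such a parameter could be as simple as a module filter in the runner (a hypothetical sketch; test_perf.py's real options differ):

```python
import argparse

# Hypothetical sketch of a per-file selector for the benchmark
# runner (not test_perf.py's actual interface): run only benchmarks
# defined in the named vb_suite modules.
parser = argparse.ArgumentParser()
parser.add_argument('--modules', nargs='*', default=None,
                    help='run benchmarks only from these vb_suite files')
args = parser.parse_args(['--modules', 'indexing'])

all_modules = ['eval', 'groupby', 'indexing', 'join_merge']
selected = [m for m in all_modules
            if args.modules is None or m in args.modules]
assert selected == ['indexing']
```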

@immerrr commented Feb 27, 2014

what takes 9 min, the total perf run (including compile time)?

The following command did:

$ time ./test_perf.sh -H -N3 -S -u2 -r indexing__

<snip>

real    8m37.888s
user    8m15.132s
sys 0m21.824s

So the 9 minutes don't include compilation, only the execution of those 112 benchmarks.

@jreback commented Feb 27, 2014

With #6479 in mind (it may result in adding as many benchmarks as this PR), I thought test_perf.py could use a parameter to run benchmarks from a given file only, so that long-running exhaustive benchmarks can be kept separate and normal runs don't become unnecessarily long.

yes... we don't normally want to include all of these

FYI, how long does it take if you don't pass -r (i.e. what does the total suite take), for comparison?

@immerrr commented Feb 28, 2014

FYI, how long does it take if you don't pass -r (i.e. what does the total suite take), for comparison?

It's about 20 minutes with the same configuration (-H -N3 -S -u2).

@jreback commented Feb 28, 2014

hmm, so adding these 112 benches basically doubles the run time

I would reduce the index size to 100k - see what happens

@immerrr commented Feb 28, 2014

I would reduce the index size to 100k - see what happens

The figures under discussion are for 100k sizes already, actually :) That happened after immerrr@7f0574f.

@jreback commented Feb 28, 2014

ahh, you are repeating 3x, ok then

prob ok for now, but let's think about creating a fast/slow argument anyhow (default being fast)

@immerrr commented Feb 28, 2014

Bad news: the trick with caching makeCustomIndex results doesn't work during comparative benchmarks. It's becoming a habit with pandas to dig deeper to make things work...

@immerrr commented Feb 28, 2014

Bad news: the trick with caching makeCustomIndex results doesn't work during comparative benchmarks.

Ok, fixed that.

I've benchmarked #6440 with this new suite, here's the result.

@@ -78,7 +78,7 @@
name='eval_frame_and_one_thread',
start_date=datetime(2013, 7, 26))

-setup = common_setup
+# setup = common_setup
Contributor:

what did this change?

Contributor Author:

ouch, that was a temporary fix.

Without it, eval_frame_and_python and eval_frame_and_one_thread have the same setup and thus share a checksum, which results in a non-unique index error when benchmarks.db is regenerated from scratch. Same goes for L105.
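A rough sketch of why the collision happens (an assumed model of vbench's keying, not its actual code): if the database key is derived from the setup and statement strings alone, two benchmarks that share both strings collide.

```python
import hashlib

# Assumed model (not vbench's actual implementation): the db key is
# a hash of the setup and statement source strings, so identical
# inputs yield an identical, conflicting key.
def bench_checksum(setup, stmt):
    return hashlib.md5((setup + stmt).encode('utf-8')).hexdigest()

k1 = bench_checksum("common_setup_src", "df.eval('a & b')")
k2 = bench_checksum("common_setup_src", "df.eval('a & b')")
assert k1 == k2  # same setup + statement -> same key -> db conflict
```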

Contributor:

hmm...maybe just 'copy' common_setup then?

Member:

Oh sorry didn't see this.

Member:

Is the checksum only dependent on that string? I don't know how it's computed, but that seems strange.

Contributor Author:

It's dependent on setup and benchmarked code. For the benchmarks under investigation, setup differed from common_setup only by the single-thread setup code; if one removed that, the single- and multi-thread benches became identical and caused a db conflict.

@jreback commented Mar 9, 2014

how's this coming?

@immerrr commented Mar 11, 2014

I've added benchmarks for non-string axes. Not everything works yet, but there are interesting outliers in the updated results:

| indexing__panel_arr_label_10ix10ix100000dt_ax2   |      3 | 108.405881 |
| indexing__dataframe_arr_label_10ix100000mi_ax1   |      3 | 110.820320 |
| indexing__panel_arr_label_100000mix10ix10i       |      3 | 119.547314 |
| indexing__panel_arr_pos_100000mix10ix10i         |      3 | 119.803111 |
| indexing__panel_arr_pos_100000mix10ix10i_ax0     |      3 | 120.717234 |
| indexing__panel_arr_label_10ix10ix100000mi_ax2   |      3 | 137.265232 |
| indexing__panel_arr_label_100000mix10ix10i_ax0   |      3 | 155.488438 |
| indexing__panel_arr_pos_100000dtx10ix10i_ax0     |      3 | 266.872327 |
| indexing__panel_arr_pos_100000dtx10ix10i         |      3 | 266.904354 |
| indexing__panel_arr_pos_10ix10ix100000dt_ax2     |      3 | 288.369656 |
| indexing__panel_basic_label_100000dtx10ix10i_ax2 |      3 | 405.912108 |
| indexing__panel_basic_pos_10ix10ix100000dt_ax0   |      3 | 406.374137 |
| indexing__panel_basic_label_10ix10ix100000dt_ax0 |      3 | 409.108559 |
| indexing__panel_basic_pos_100000dtx10ix10i_ax2   |      3 | 412.231657 |

Observe how timings of datetime-axis (dt) panel indexing are 2-3x slower than those of multiindex-axis (mi) ones. The result needs additional verification, but is still interesting.

@immerrr commented Mar 11, 2014

Quick follow-up: what's even funnier is that the slowest benchmarks are "basic" indexing, i.e. the indexer is a single value and the operation involves dimension reduction.
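The "basic" case with dimension reduction looks like this on a DataFrame (a small present-day illustration, not the benchmark code):

```python
import numpy as np
import pandas as pd

# "Basic" indexing: a single scalar label, where the result drops
# one dimension (DataFrame row -> Series).
df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=pd.date_range('2014-01-01', periods=3),
                  columns=list('wxyz'))

row = df.loc[pd.Timestamp('2014-01-02')]
assert isinstance(row, pd.Series)  # reduced from 2-D to 1-D
```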

@jreback commented Mar 11, 2014

thanks... obviously this could use some more optimization :)

@jreback commented Mar 11, 2014

can you put a bullet and an explanation about how to use vbench on https://github.com/pydata/pandas/wiki in the Developers section? thanks

@jreback commented Apr 9, 2014

@immerrr what's the status of this? what is the time for running the vbenches, new vs old?

@immerrr commented Apr 10, 2014

No update, really; #6745 has been keeping me busy, and I'm trying to make it available before the 0.14 release.

AFAIR, I was going to keep this benchmark out of the "default" run configuration and add a cmdline parameter to make sure it's easy to enable. There were some issues I wanted to address in the benchmark runner itself, but I need to delve back into the scope of this PR.

@jreback commented Apr 10, 2014

no problem. ok, I'll move this issue to 0.15, but feel free to update if it's ready for 0.14 (since it's for vbenches, the version doesn't make much difference)

@jreback commented Jan 18, 2015

@immerrr if you'd like to update this, I think it would be nice. Please reopen if you do.

@jreback closed this Jan 18, 2015

Labels: Indexing, Performance