BLD: add benchmarks for all single-axis indexers #6450
Conversation
can you put a sample run of this up?
Here you are
Implemented container shape specification (so that Panel & Panel4D can be also benchmarked) and updated the gist with results. There are some suspicious results, though, e.g.
The figure is suspicious because when I run it separately like
I would change this script to something like
That might add a lot of complexity to the vbench suite (which is already complex beyond my liking). I'm not sure I understand what is wrong with reproducing the benchmarks by something like:
In [1]: cd vb_suite/
/home/immerrr/sources/pandas/vb_suite
In [2]: import indexing
In [3]: indexing.
indexing.Benchmark indexing.indexing__panel4d_slice_pos_10000sx20sx20sx20s
indexing.MAX_ENTRIES indexing.indexing__panel_arr_label_100000sx20sx20s
indexing.SECTION indexing.indexing__panel_arr_pos_100000sx20sx20s
indexing.bm_df_getitem indexing.indexing__panel_basic_label_100000sx20sx20s
indexing.bm_df_getitem2 indexing.indexing__panel_basic_pos_100000sx20sx20s
indexing.bm_df_getitem3 indexing.indexing__panel_iloc_mask_100000sx20sx20s
indexing.bm_getitem indexing.indexing__panel_loc_mask_100000sx20sx20s
indexing.common_setup indexing.indexing__panel_slice_label_100000sx20sx20s
indexing.datetime indexing.indexing__panel_slice_pos_100000sx20sx20s
indexing.frame_iloc_big indexing.indexing__series_arr_label_1000000s
indexing.frame_iloc_dups indexing.indexing__series_arr_pos_1000000s
indexing.frame_loc_dups indexing.indexing__series_basic_label_1000000s
indexing.frame_xs_mi_ix indexing.indexing__series_basic_pos_1000000s
indexing.generate_index_benchmarks indexing.indexing__series_iloc_mask_1000000s
indexing.get_index indexing.indexing__series_loc_mask_1000000s
indexing.indexing__dataframe_arr_label_100000sx100s indexing.indexing__series_slice_label_1000000s
indexing.indexing__dataframe_arr_label_100sx100000s indexing.indexing__series_slice_pos_1000000s
indexing.indexing__dataframe_arr_pos_100000sx100s indexing.indexing_dataframe_boolean
indexing.indexing__dataframe_arr_pos_100sx100000s indexing.indexing_dataframe_boolean_no_ne
indexing.indexing__dataframe_basic_label_100000sx100s indexing.indexing_dataframe_boolean_rows
indexing.indexing__dataframe_basic_label_100sx100000s indexing.indexing_dataframe_boolean_rows_object
indexing.indexing__dataframe_basic_pos_100000sx100s indexing.indexing_dataframe_boolean_st
indexing.indexing__dataframe_basic_pos_100sx100000s indexing.indexing_frame_get_value
indexing.indexing__dataframe_iloc_mask_100000sx100s indexing.indexing_frame_get_value_ix
indexing.indexing__dataframe_iloc_mask_100sx100000s indexing.indexing_panel_subset
indexing.indexing__dataframe_loc_mask_100000sx100s indexing.mask
indexing.indexing__dataframe_loc_mask_100sx100000s indexing.py
indexing.indexing__dataframe_slice_label_100000sx100s indexing.pyc
indexing.indexing__dataframe_slice_label_100sx100000s indexing.series_getitem_label_slice
indexing.indexing__dataframe_slice_pos_100000sx100s indexing.series_getitem_pos_slice
indexing.indexing__dataframe_slice_pos_100sx100000s indexing.series_mask
indexing.indexing__panel4d_arr_label_10000sx20sx20sx20s indexing.series_xs_mi_ix
indexing.indexing__panel4d_arr_pos_10000sx20sx20sx20s indexing.setup
indexing.indexing__panel4d_basic_label_10000sx20sx20sx20s indexing.setup_template
indexing.indexing__panel4d_basic_pos_10000sx20sx20sx20s indexing.sort_level_one
indexing.indexing__panel4d_iloc_mask_10000sx20sx20sx20s indexing.sort_level_zero
indexing.indexing__panel4d_loc_mask_10000sx20sx20sx20s indexing.statement
indexing.indexing__panel4d_slice_label_10000sx20sx20sx20s indexing.tm
In [3]: indexing.indexing__data
indexing.indexing__dataframe_arr_label_100000sx100s indexing.indexing__dataframe_iloc_mask_100000sx100s
indexing.indexing__dataframe_arr_label_100sx100000s indexing.indexing__dataframe_iloc_mask_100sx100000s
indexing.indexing__dataframe_arr_pos_100000sx100s indexing.indexing__dataframe_loc_mask_100000sx100s
indexing.indexing__dataframe_arr_pos_100sx100000s indexing.indexing__dataframe_loc_mask_100sx100000s
indexing.indexing__dataframe_basic_label_100000sx100s indexing.indexing__dataframe_slice_label_100000sx100s
indexing.indexing__dataframe_basic_label_100sx100000s indexing.indexing__dataframe_slice_label_100sx100000s
indexing.indexing__dataframe_basic_pos_100000sx100s indexing.indexing__dataframe_slice_pos_100000sx100s
indexing.indexing__dataframe_basic_pos_100sx100000s indexing.indexing__dataframe_slice_pos_100sx100000s
In [3]: indexing.indexing__dataframe_loc
indexing.indexing__dataframe_loc_mask_100000sx100s indexing.indexing__dataframe_loc_mask_100sx100000s
In [3]: indexing.indexing__dataframe_loc_mask_100
indexing.indexing__dataframe_loc_mask_100000sx100s indexing.indexing__dataframe_loc_mask_100sx100000s
In [3]: indexing.indexing__dataframe_loc_mask_100sx100000s.run()
Out[3]:
{'loops': 1000,
'repeat': 3,
'succeeded': True,
'timing': 0.18867802619934082,
'units': 'ms'}
ok, I would drop Panel4D - they are identical to Panel for most operations that matter. So these are all 0th axis; nice to have the ndim-1 axis as well
Split the benchmark into no-axis ( Even with Panel4D benchmarks removed, it's now 112 tests (unique string axes, Series, 2 shapes of DataFrame, 2 shapes of Panel). Adding non-unique indices will double that, and so will adding setitem benchmarks, let alone different index types and multi-indexing. It's not that I'm against adding this many benchmarks, but it would be nice to draw a line somewhere in the internal API between axis label lookup operations and numpy-like indexing/slicing of block managers. This would allow testing/benchmarking them separately and avoid further combinatorial test count increase. Also, the results are still weird...
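The combinatorial growth described above can be sketched with a small enumeration. The container, indexer, and flavor names below are illustrative assumptions, not the PR's exact benchmark parameters:

```python
# Illustrative only: how benchmark counts multiply combinatorially.
# Each new dimension (index dtype, uniqueness, setitem) multiplies the total.
from itertools import product

containers = ["series", "dataframe_tall", "dataframe_wide", "panel_a", "panel_b"]
indexers = ["basic", "slice", "arr", "mask"]
flavors = ["label", "pos"]  # .loc-style vs .iloc-style lookups

benches = [f"{c}_{i}_{f}" for c, i, f in product(containers, indexers, flavors)]
print(len(benches))  # 5 * 4 * 2 == 40
```

Doubling any one axis of variation (e.g. unique vs non-unique indices) doubles the whole product, which is why the count climbs so quickly.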
how much extra time is the total run taking? in theory should also condition on index type (e.g. FloatIndex, DatetimeIndex)
maybe need to be split into fast / full versions (e.g. fast runs by default, full on-demand)
As it is right now, it's not too long (under 1min). But, like I said, it looks like something's wrong. E.g. given that the panel used in benchmark contains about 1M * 20 * 20 == 400M doubles, the benchmark should consume 3.2G+ of RAM at some point, but high-water mark of the process never exceeds several hundred megabytes on my desktop. It may be some kernel paging magic that happens because the allocated memory is never used, but I need to double check that before drawing any conclusions about run time.
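The memory estimate above follows from the axis sizes and the 8-byte width of a float64:

```python
# Rough memory footprint of the benchmark panel discussed above.
n_items, n_major, n_minor = 1_000_000, 20, 20  # axis sizes from the comment
n_doubles = n_items * n_major * n_minor
bytes_total = n_doubles * 8  # float64 is 8 bytes

print(n_doubles)           # 400000000 doubles
print(bytes_total / 1e9)   # 3.2 (decimal GB, i.e. the "3.2G+" figure)
```

If the process never touches most of those pages, the OS may defer physical allocation, which would explain the low high-water mark.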
ok... that's too much anyhow... you can prob make the index smaller... say 100k
Ok, fixed the bug and the results now look much better. After downscaling, the test run takes 9mins on my
what takes 9min, the total perf run? (including compile time?)
Having in mind #6479, which may result in adding as many benchmarks as this PR, I thought maybe test_perf.py could use a parameter like "run benchmarks from this file only", so that all long-running exhaustive benchmarks can be separated from one another and avoid making long runs unnecessarily long.
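The suggested runner option could look something like the sketch below. The flag name `--only-modules` is hypothetical, not an actual test_perf.py option:

```python
# Hedged sketch of the proposed test_perf.py parameter: restrict a run to
# benchmarks defined in specific vb_suite modules. Flag name is an assumption.
import argparse

parser = argparse.ArgumentParser(description="perf runner sketch")
parser.add_argument("--only-modules", nargs="*", default=None,
                    help="run only benchmarks from these vb_suite modules")

# Example: run only the indexing and eval benchmark files.
args = parser.parse_args(["--only-modules", "indexing", "eval"])
print(args.only_modules)  # ['indexing', 'eval']
```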
The following command did:
So, 9 minutes don't include compilation, only executing those 112 benchmarks. |
yes... don't normally want to include all of these. FYI, what does it take if you don't put the -r (e.g. the total suite takes what), for comparison?
It's about 20mins with the same configuration (
hmm, so adding these 112 benches makes it basically double in time. I would reduce the index size to 100k - see what happens
The discussed figures are for 100k sizes, actually :) That happened after immerrr@7f0574f.
ahh, you are repeating 3x; ok then, prob ok for now
Bad news: trick with caching of
@@ -78,7 +78,7 @@
         name='eval_frame_and_one_thread',
         start_date=datetime(2013, 7, 26))

-setup = common_setup
+# setup = common_setup
what did this change?
ouch, that was a temporary fix. Without it, eval_frame_and_python and eval_frame_and_one_thread have the same setup and thus share a checksum, which results in a non-unique index error when benchmarks.db is being regenerated from scratch. Same goes for L105.
hmm...maybe just 'copy' common_setup then?
@cpcloud ?
Oh sorry didn't see this.
Is the checksum only dependent on that string? I don't know how it's computed, but that seems strange.
It's dependent on setup and benchmarked code. For the benchmarks under investigation, setup differed from common_setup by single-thread setup code, and if one removed it, the single/multi-thread benches became identical and caused a db conflict.
how's this coming?
I've added benchmarks for non-string axes. Not everything works yet, but there are interesting outliers in updated results:
Observe how timings of datetime-axis (dt) panel indexing are 2-3x slower than those of multiindex-axis (mi) ones. The result needs additional verification, but is still interesting.
Quick follow-up: what's even funnier is that the slowest benchmarks are "basic" indexing, i.e. the index is a single value and the operation involves dimension reduction.
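The "basic" vs slice distinction mentioned above can be seen on a DataFrame (used here instead of the now-removed Panel): selecting a single label reduces dimensionality and returns a new lower-dimensional object, while a slice returns the same type.

```python
# Dimension-reducing ("basic") indexing vs slice indexing.
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["x", "y", "z"])

reduced = df.loc["x"]      # basic indexing: returns a Series (ndim - 1)
sliced = df.loc["x":"x"]   # slice indexing: returns a DataFrame (same ndim)

print(type(reduced).__name__)  # Series
print(type(sliced).__name__)   # DataFrame
```

Building that reduced object (new index, possible dtype coercion across columns) is extra work a same-type slice avoids, which is one plausible reason the "basic" benchmarks are slowest.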
thanks... obviously could have some more optimization :)
can you put a bullet and explanation about how-to-vbench on https://github.com/pydata/pandas/wiki in the Developers section? thanks
@immerrr what's the status of this? what is the time for running the vbenches with new vs old?
No update, really, #6745 has been keeping me busy, I'm trying to make it available before 0.14 release. AFAIR, I was going to keep this benchmark out of the "default" run configuration and add a cmdline parameter to make sure it's easy to enable. There were some issues I wanted to address in the benchmark runner itself, but I need to delve back into the scope of this PR.
no problem. ok, I'll move this issue to 0.15, but feel free to update if ready for 0.14 (since it's for vbenches, the version doesn't make much difference)
@immerrr if you'd like to update, I think this would be nice. pls reopen if you do that.
This PR will add benchmarks for all available single-axis container indexing patterns discussed in #6328.
It is currently in a WIP state, but ideas and comments are welcome. Multi-axis indexers are left for the future to avoid overcomplicating things from the beginning.
Part of the functionality implemented in this PR is caching tm.makeCustomIndex results to avoid spending several seconds in the setup of each benchmark. It may deserve a separate PR to implement this caching within the pandas.util.testing module.

TODO:
- obj.__getitem__
- obj.xs
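The index-caching idea above amounts to memoizing an expensive index build across benchmark setups. A minimal sketch, assuming a `make_custom_index` stand-in for tm.makeCustomIndex (the real function and its signature differ):

```python
# Hedged sketch: memoize expensive index construction so each benchmark's
# setup pays the build cost only once. `make_custom_index` is hypothetical.
from functools import lru_cache

@lru_cache(maxsize=None)
def make_custom_index(nentries: int, nlevels: int) -> tuple:
    # stand-in for an expensive index build; returns a hashable tuple
    return tuple(f"lvl{lvl}_{i}" for lvl in range(nlevels)
                 for i in range(nentries))

first = make_custom_index(1000, 2)
second = make_custom_index(1000, 2)
print(first is second)  # True: the cached object is returned, not rebuilt
```

One caveat for the real thing: callers mutating a cached pandas Index in place would corrupt it for every later benchmark, so a production version would need to return copies or immutable objects.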