PERF: Benchmark merge with non-int64 and tolerance (#28922) #28974

Merged (4 commits) on Oct 22, 2019

Conversation

Contributor Author

jjlkant commented Oct 14, 2019

This PR currently implements a parameterized tolerance benchmark using tolerance=None and tolerance=5000. I ran the ASV benchmarks with several tolerance values (5, 100, 5000, 10000) and saw no noticeable change in the results, so I opted for the arbitrary value of 5000.
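For readers unfamiliar with ASV, the parameterization described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class name MergeAsofSketch, the frame sizes, and the column names are assumptions. Only the tolerance values (None and 5000), the three directions, and the ASV convention of params, param_names, setup and time_* methods are taken from the discussion and from ASV itself.

```python
import numpy as np
import pandas as pd


class MergeAsofSketch:
    # ASV calls setup() and each time_* method once per parameter combination.
    params = [["backward", "forward", "nearest"], [None, 5000]]
    param_names = ["direction", "tolerance"]

    def setup(self, direction, tolerance):
        # Sizes and column names are illustrative assumptions.
        rng = np.random.default_rng(0)
        n_left, n_right = 2_000, 1_000
        # merge_asof requires both frames to be sorted on the "on" key.
        self.left = pd.DataFrame(
            {
                "time": np.sort(rng.integers(0, 100_000, n_left)),
                "value1": rng.standard_normal(n_left),
            }
        )
        self.right = pd.DataFrame(
            {
                "time": np.sort(rng.integers(0, 100_000, n_right)),
                "value2": rng.standard_normal(n_right),
            }
        )

    def time_on_int(self, direction, tolerance):
        # Only the body of this method is timed by ASV.
        pd.merge_asof(
            self.left,
            self.right,
            on="time",
            direction=direction,
            tolerance=tolerance,
        )
```

With two parameter lists of sizes 3 and 2, ASV would report a 3x2 grid per benchmark, which matches the shape of the tables in the results.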

Results of these benchmarks:
tolerance=5:

· Discovering benchmarks
· Running 6 total benchmarks (1 commits * 1 environments * 6 benchmarks)
[ 0.00%] ·· Benchmarking existing-pyc__users_jjlkant_anaconda3_envs_pandas-dev_python.exe
[ 8.33%] ··· Running (join_merge.MergeAsof.time_by_int--).
[ 16.67%] ··· Running (join_merge.MergeAsof.time_by_object--).
[ 25.00%] ··· Running (join_merge.MergeAsof.time_multiby--).
[ 33.33%] ··· Running (join_merge.MergeAsof.time_on_int--)..
[ 50.00%] ··· Running (join_merge.MergeAsof.time_on_uint64--).
[ 58.33%] ··· join_merge.MergeAsof.time_by_int ok
[ 58.33%] ··· =========== ========== ==========
-- tolerance
----------- ---------------------
direction None 5
=========== ========== ==========
backward 96.4±9ms 100±20ms
forward 192±10ms 170±10ms
nearest 219±10ms 266±30ms
=========== ========== ==========

[ 66.67%] ··· join_merge.MergeAsof.time_by_object ok
[ 66.67%] ··· =========== ========== ==========
-- tolerance
----------- ---------------------
direction None 5
=========== ========== ==========
backward 95.8±5ms 104±10ms
forward 177±30ms 201±50ms
nearest 288±30ms 280±30ms
=========== ========== ==========

[ 75.00%] ··· join_merge.MergeAsof.time_multiby ok
[ 75.00%] ··· =========== ========== ==========
-- tolerance
----------- ---------------------
direction None 5
=========== ========== ==========
backward 706±30ms 663±60ms
forward 769±20ms 735±5ms
nearest 822±10ms 823±10ms
=========== ========== ==========

[ 83.33%] ··· join_merge.MergeAsof.time_on_int ok
[ 83.33%] ··· =========== ========== ===========
-- tolerance
----------- ----------------------
direction None 5
=========== ========== ===========
backward 29.3±3ms 29.5±1ms
forward 35.4±5ms 34.5±6ms
nearest 47.8±7ms 46.0±10ms
=========== ========== ===========

[ 91.67%] ··· join_merge.MergeAsof.time_on_int32 ok
[ 91.67%] ··· =========== ========== ==========
-- tolerance
----------- ---------------------
direction None 5
=========== ========== ==========
backward 35.7±6ms 35.0±4ms
forward 41.2±9ms 35.3±3ms
nearest 42.9±6ms 51.3±5ms
=========== ========== ==========

[100.00%] ··· join_merge.MergeAsof.time_on_uint64 ok
[100.00%] ··· =========== =========== ==========
-- tolerance
----------- ----------------------
direction None 5
=========== =========== ==========
backward 35.5±7ms 36.5±5ms
forward 45.0±3ms 40.4±8ms
nearest 44.1±10ms 51.9±7ms
=========== =========== ==========

tolerance=100 and tolerance=10000:

· Discovering benchmarks
· Running 6 total benchmarks (1 commits * 1 environments * 6 benchmarks)
[ 0.00%] ·· Benchmarking existing-pyc__users_jjlkant_anaconda3_envs_pandas-dev_python.exe
[ 8.33%] ··· Running (join_merge.MergeAsof.time_by_int--).
[ 16.67%] ··· Running (join_merge.MergeAsof.time_by_object--).
[ 25.00%] ··· Running (join_merge.MergeAsof.time_multiby--).
[ 33.33%] ··· Running (join_merge.MergeAsof.time_on_int--).
[ 41.67%] ··· Running (join_merge.MergeAsof.time_on_int32--).
[ 50.00%] ··· Running (join_merge.MergeAsof.time_on_uint64--).
[ 58.33%] ··· join_merge.MergeAsof.time_by_int ok
[ 58.33%] ··· =========== =========== ========== ==========
-- tolerance
----------- ---------------------------------
direction None 100 10000
=========== =========== ========== ==========
backward 99.9±30ms 108±10ms 102±6ms
forward 163±30ms 171±10ms 173±20ms
nearest 238±10ms 236±20ms 233±10ms
=========== =========== ========== ==========

[ 66.67%] ··· join_merge.MergeAsof.time_by_object ok
[ 66.67%] ··· =========== ========== ========== ==========
-- tolerance
----------- --------------------------------
direction None 100 10000
=========== ========== ========== ==========
backward 105±10ms 94.8±7ms 95.8±4ms
forward 188±8ms 172±9ms 191±30ms
nearest 238±10ms 246±8ms 242±20ms
=========== ========== ========== ==========

[ 75.00%] ··· join_merge.MergeAsof.time_multiby ok
[ 75.00%] ··· =========== ========== ========== ==========
-- tolerance
----------- --------------------------------
direction None 100 10000
=========== ========== ========== ==========
backward 653±20ms 661±20ms 662±10ms
forward 751±10ms 763±20ms 775±8ms
nearest 849±30ms 851±50ms 861±30ms
=========== ========== ========== ==========

[ 83.33%] ··· join_merge.MergeAsof.time_on_int ok
[ 83.33%] ··· =========== ========== ========== ==========
-- tolerance
----------- --------------------------------
direction None 100 10000
=========== ========== ========== ==========
backward 32.0±4ms 31.5±5ms 30.3±2ms
forward 34.1±4ms 35.1±6ms 32.9±1ms
nearest 41.7±6ms 43.4±6ms 43.3±3ms
=========== ========== ========== ==========

[ 91.67%] ··· join_merge.MergeAsof.time_on_int32 ok
[ 91.67%] ··· =========== ========== ========== ==========
-- tolerance
----------- --------------------------------
direction None 100 10000
=========== ========== ========== ==========
backward 35.1±3ms 36.2±3ms 40.2±4ms
forward 34.5±4ms 36.4±1ms 34.8±3ms
nearest 39.5±3ms 39.5±4ms 49.3±8ms
=========== ========== ========== ==========

[100.00%] ··· join_merge.MergeAsof.time_on_uint64 ok
[100.00%] ··· =========== ============ ========== ===========
-- tolerance
----------- -----------------------------------
direction None 100 10000
=========== ============ ========== ===========
backward 30.4±0.8ms 32.3±5ms 28.6±4ms
forward 31.7±4ms 33.0±2ms 34.8±4ms
nearest 39.8±4ms 44.6±4ms 48.1±10ms
=========== ============ ========== ===========

tolerance=5000:

· Discovering benchmarks
· Running 6 total benchmarks (1 commits * 1 environments * 6 benchmarks)
[ 0.00%] ·· Benchmarking existing-pyc__users_jjlkant_anaconda3_envs_pandas-dev_python.exe
[ 8.33%] ··· Running (join_merge.MergeAsof.time_by_int--)..
[ 25.00%] ··· Running (join_merge.MergeAsof.time_multiby--).
[ 33.33%] ··· Running (join_merge.MergeAsof.time_on_int--)..
[ 50.00%] ··· Running (join_merge.MergeAsof.time_on_uint64--).
[ 58.33%] ··· join_merge.MergeAsof.time_by_int ok
[ 58.33%] ··· =========== ========== ==========
-- tolerance
----------- ---------------------
direction None 5000
=========== ========== ==========
backward 104±20ms 111±20ms
forward 217±50ms 207±60ms
nearest 287±60ms 290±50ms
=========== ========== ==========

[ 66.67%] ··· join_merge.MergeAsof.time_by_object ok
[ 66.67%] ··· =========== ========== ==========
-- tolerance
----------- ---------------------
direction None 5000
=========== ========== ==========
backward 124±20ms 142±30ms
forward 209±20ms 203±10ms
nearest 274±10ms 276±10ms
=========== ========== ==========

[ 75.00%] ··· join_merge.MergeAsof.time_multiby ok
[ 75.00%] ··· =========== ========== ==========
-- tolerance
----------- ---------------------
direction None 5000
=========== ========== ==========
backward 667±20ms 687±40ms
forward 799±40ms 775±30ms
nearest 836±10ms 845±30ms
=========== ========== ==========

[ 83.33%] ··· join_merge.MergeAsof.time_on_int ok
[ 83.33%] ··· =========== ========== ==========
-- tolerance
----------- ---------------------
direction None 5000
=========== ========== ==========
backward 32.3±5ms 31.1±2ms
forward 30.9±3ms 30.6±2ms
nearest 38.7±2ms 40.0±3ms
=========== ========== ==========

[ 91.67%] ··· join_merge.MergeAsof.time_on_int32 ok
[ 91.67%] ··· =========== ========== ==========
-- tolerance
----------- ---------------------
direction None 5000
=========== ========== ==========
backward 30.8±4ms 31.7±5ms
forward 45.6±8ms 40.2±5ms
nearest 43.4±2ms 47.6±7ms
=========== ========== ==========

[100.00%] ··· join_merge.MergeAsof.time_on_uint64 ok
[100.00%] ··· =========== ========== ==========
-- tolerance
----------- ---------------------
direction None 5000
=========== ========== ==========
backward 36.8±5ms 43.5±5ms
forward 34.7±5ms 40.1±5ms
nearest 39.3±2ms 43.9±7ms
=========== ========== ==========
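One rough way to read "no noticeable change" from tables like these is to treat each mean±spread entry as an interval and check whether the None and tolerance columns overlap. The sketch below does this for the time_by_int rows of the tolerance=5000 run. The overlap rule is an illustrative heuristic assumed here, not how ASV itself judges significance, and the helper name intervals_overlap is hypothetical.

```python
# (mean_ms, spread_ms) pairs copied from the time_by_int table, tolerance=5000 run.
# Each entry is (tolerance=None, tolerance=5000).
results = {
    "backward": ((104, 20), (111, 20)),
    "forward": ((217, 50), (207, 60)),
    "nearest": ((287, 60), (290, 50)),
}


def intervals_overlap(a, b):
    """Return True if mean_a±spread_a and mean_b±spread_b share any point."""
    (mean_a, spread_a), (mean_b, spread_b) = a, b
    return abs(mean_a - mean_b) <= spread_a + spread_b


overlaps = {
    direction: intervals_overlap(without_tol, with_tol)
    for direction, (without_tol, with_tol) in results.items()
}

for direction, overlap in overlaps.items():
    print(direction, "overlap" if overlap else "no overlap")
```

Under this heuristic all three directions overlap, which is consistent with the observation that the tolerance value makes little difference.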

        self.df1f = df1[["timeu64", "value1"]]
        self.df2f = df2[["timeu64", "value2"]]

    def time_on_int(self, direction, tolerance):
Member

Instead of separate functions like this, can you just parametrize on dtype?

Contributor Author

Do you want all of the functions to take the dtype parameter, or only the time_on functions?

The other functions could ignore dtype by naming it with the conventional _ in their argument lists, but I don't think that would be the cleanest approach.

Suggested change
-    def time_on_int(self, direction, tolerance):
+    def time_on(self, dtype, direction, tolerance):
+        self.df1a["time"] = self.df1a.time.astype(dtype)
+        self.df2a["time"] = self.df2a.time.astype(dtype)
+        ...
+    def time_by_object(self, _, direction, tolerance):
+        ...
+        )

Member

Hmm OK I see your point. I think OK to leave as is then
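The trade-off discussed in this thread can be sketched as follows. This is a hypothetical variant, not code from the PR: the class name MergeAsofDtypeSketch, the frame sizes, and the column names are assumptions, and the cast is done in setup rather than inside the timed method as in the suggestion above. It shows how adding dtype as a parameter forces every method, including the "by" benchmarks that ignore it, to accept an extra argument and to run once per dtype.

```python
import numpy as np
import pandas as pd


class MergeAsofDtypeSketch:
    # Hypothetical layout: dtype becomes the first ASV parameter for every method.
    params = [
        ["int64", "int32", "uint64"],
        ["backward", "forward", "nearest"],
        [None, 5],
    ]
    param_names = ["dtype", "direction", "tolerance"]

    def setup(self, dtype, direction, tolerance):
        rng = np.random.default_rng(0)
        n_left, n_right = 2_000, 1_000  # illustrative sizes, not the PR's
        self.left = pd.DataFrame(
            {
                "time": np.sort(rng.integers(0, 100, n_left)),
                "key": rng.choice(list("abcde"), n_left),
                "value1": rng.standard_normal(n_left),
            }
        )
        self.right = pd.DataFrame(
            {
                "time": np.sort(rng.integers(0, 100, n_right)),
                "key": rng.choice(list("abcde"), n_right),
                "value2": rng.standard_normal(n_right),
            }
        )
        # Cast here so the timed method measures only the merge itself.
        self.left_on = self.left[["time", "value1"]].astype({"time": dtype})
        self.right_on = self.right[["time", "value2"]].astype({"time": dtype})

    def time_on(self, dtype, direction, tolerance):
        pd.merge_asof(
            self.left_on,
            self.right_on,
            on="time",
            direction=direction,
            tolerance=tolerance,
        )

    def time_by_object(self, _, direction, tolerance):
        # dtype is unused here, yet ASV still runs this once per dtype value.
        pd.merge_asof(
            self.left,
            self.right,
            on="time",
            by="key",
            direction=direction,
            tolerance=tolerance,
        )
```

Keeping separate time_on_int, time_on_int32 and time_on_uint64 methods, as the PR does, avoids those redundant runs at the cost of some repetition.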

Member

WillAyd left a comment

lgtm @jreback

WillAyd added this to the 1.0 milestone Oct 14, 2019
WillAyd added the Benchmark Performance (ASV) benchmarks label Oct 14, 2019
WillAyd merged commit 68632fb into pandas-dev:master Oct 22, 2019
Member

WillAyd commented Oct 22, 2019

Thanks @jjlkant !

Successfully merging this pull request may close these issues.

PERF: asv's for non-int64 merges
2 participants