-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
PERF: Benchmark merge with non-int64 and tolerance (#28922) #28974
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Currently implements a parameterized tolerance benchmark using Results of these benchmarks: · Discovering benchmarks [ 66.67%] ··· join_merge.MergeAsof.time_by_object ok [ 75.00%] ··· join_merge.MergeAsof.time_multiby ok [ 83.33%] ··· join_merge.MergeAsof.time_on_int ok [ 91.67%] ··· join_merge.MergeAsof.time_on_int32 ok [100.00%] ··· join_merge.MergeAsof.time_on_uint64 ok
· Discovering benchmarks [ 66.67%] ··· join_merge.MergeAsof.time_by_object ok [ 75.00%] ··· join_merge.MergeAsof.time_multiby ok [ 83.33%] ··· join_merge.MergeAsof.time_on_int ok [ 91.67%] ··· join_merge.MergeAsof.time_on_int32 ok [100.00%] ··· join_merge.MergeAsof.time_on_uint64 ok
· Discovering benchmarks [ 66.67%] ··· join_merge.MergeAsof.time_by_object ok [ 75.00%] ··· join_merge.MergeAsof.time_multiby ok [ 83.33%] ··· join_merge.MergeAsof.time_on_int ok [ 91.67%] ··· join_merge.MergeAsof.time_on_int32 ok [100.00%] ··· join_merge.MergeAsof.time_on_uint64 ok |
self.df1f = df1[["timeu64", "value1"]] | ||
self.df2f = df2[["timeu64", "value2"]] | ||
|
||
def time_on_int(self, direction, tolerance): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of separate functions like this can you also just parametrze on dtype
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want all of the functions to implement the dtype, or only do this for the time_on
functions?
The option to leave the dtype out for the others would be for example to negate it using the conventional _
in the list of arguments, but I think this wouldn't be the cleanest of cases?
def time_on_int(self, direction, tolerance): | |
def time_on(self, dtype, direction, tolerance): | |
self.df1a["time"] = self.df1a.time.astype(dtype) | |
self.df2a["time"] = self.df2a.time.astype(dtype) | |
... | |
def time_by_object(self, _, direction, tolerance): | |
... | |
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm OK I see your point. I think OK to leave as is then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm @jreback
self.df1f = df1[["timeu64", "value1"]] | ||
self.df2f = df2[["timeu64", "value2"]] | ||
|
||
def time_on_int(self, direction, tolerance): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm OK I see your point. I think OK to leave as is then
Thanks @jjlkant ! |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff