Optimize benchmarks; permit serialization #88
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: lavalamp.
Antoine is OOO. I think this should be ready to go. I haven't tried this on the main repo, but this is definitely not worse than what we have now unless the benchmarks are completely unrepresentative. /assign @jennybuckley @kwiesmueller
Benchmarks in this repo use significantly smaller sets than real-life sets. I doubt this is going to be equally good with real-life-size objects :-/
Benchmark comparison:
I'll test in the main repo, then. :'(
Wouldn't it be better to improve these tests? I'm not sure what the best way is to make them closer to real-life size.
The benchmark code here is running a bunch of test assertions, which dilutes the signal. After making it not do that, the results don't change much, maybe slightly worse. Time seems pretty noisy, but I'm looking at allocations. I started to make the items in the benchmarks bigger, but that is a harder task than expected; plus, I seem to have hit a bug where field names are not handled right, e.g. a letter followed by a number has the number stripped.
I've added a microbenchmark for set.
As you can see, it's a massive improvement, with a few exceptions, mostly intersection.
I'm surprised by the very high variance (15-25%) across consecutive runs. I looked carefully and I don't see any obvious reason for it. I'll investigate a little bit more. (I'm also a little surprised by some of your old/new benchmarks being sometimes better and sometimes worse in a non-linear way.)
@@ -598,7 +598,13 @@ func BenchmarkDeducedSimple(b *testing.B) {
		},
	}

	// Make sure this passes...
	if err := test.Test(typed.DeducedParseableType); err != nil {
You kept the same test below in the loop, is that on purpose? (same for other benches below)
ah that explains why the numbers didn't change much, haha
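For reference, a minimal sketch of the pattern being discussed, using hypothetical stand-ins (`checkCorrectness`, `mergeOperation`) rather than this repo's actual test helpers: run the assertion once before the timer starts, and keep the timed loop limited to the operation being measured.

```go
package bench_test

import "testing"

// Hypothetical stand-ins for the repo's test assertions and the merge
// operation under measurement; not the actual helpers from this PR.
func checkCorrectness() error { return nil }
func mergeOperation() error   { return nil }

func BenchmarkOperationOnly(b *testing.B) {
	// Make sure this passes, but only once and outside the timed region,
	// so the assertions don't dilute the benchmark signal.
	if err := checkCorrectness(); err != nil {
		b.Fatal(err)
	}

	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		// Only the operation being measured runs inside the loop.
		if err := mergeOperation(); err != nil {
			b.Fatal(err)
		}
	}
}
```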
benchmark old ns/op new ns/op delta
BenchmarkDeducedSimple-12 422922 415645 -1.72%
BenchmarkDeducedNested-12 1252984 1179542 -5.86%
BenchmarkDeducedNestedAcrossVersion-12 1395454 1293546 -7.30%
BenchmarkLeafConflictAcrossVersion-12 412367 410950 -0.34%
BenchmarkMultipleApplierRecursiveRealConversion-12 2481784 2413984 -2.73%
benchmark old allocs new allocs delta
BenchmarkDeducedSimple-12 1634 1533 -6.18%
BenchmarkDeducedNested-12 4757 4289 -9.84%
BenchmarkDeducedNestedAcrossVersion-12 5138 4633 -9.83%
BenchmarkLeafConflictAcrossVersion-12 1672 1613 -3.53%
BenchmarkMultipleApplierRecursiveRealConversion-12 8154 7696 -5.62%
benchmark old bytes new bytes delta
BenchmarkDeducedSimple-12 126676 114318 -9.76%
BenchmarkDeducedNested-12 368885 323195 -12.39%
BenchmarkDeducedNestedAcrossVersion-12 401554 350212 -12.79%
BenchmarkLeafConflictAcrossVersion-12 134616 121344 -9.86%
BenchmarkMultipleApplierRecursiveRealConversion-12 784880 711578 -9.34%
These numbers are after fixing the issue above (not pushed yet).
randOperand := func() *Set { return operands[rand.Intn(len(operands))] }
b.Run(fmt.Sprintf("union-%v", here.size), func(b *testing.B) {
	b.ResetTimer()
	b.ReportAllocs()
I'm curious if these two are needed.
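If I remember the testing package semantics correctly, each b.Run sub-benchmark gets its own *testing.B with a fresh timer, so b.ResetTimer only matters when setup work happens inside the closure before the loop, and b.ReportAllocs is what enables allocation numbers without passing -benchmem. A sketch of that distinction, where buildInput is a hypothetical setup helper and not part of this repo:

```go
package bench_test

import (
	"fmt"
	"testing"
)

// buildInput is a hypothetical setup helper standing in for building the
// benchmark operands.
func buildInput(size int) []int {
	s := make([]int, size)
	for i := range s {
		s[i] = i
	}
	return s
}

func BenchmarkSubBenchmarks(b *testing.B) {
	for _, size := range []int{10, 1000} {
		b.Run(fmt.Sprintf("union-%v", size), func(b *testing.B) {
			// Setup done inside the closure is the only reason to call
			// ResetTimer here; the sub-benchmark's timer otherwise starts fresh.
			in := buildInput(size)
			b.ResetTimer()

			// ReportAllocs turns on allocation reporting for this benchmark
			// even when the run isn't invoked with -benchmem.
			b.ReportAllocs()

			sum := 0
			for i := 0; i < b.N; i++ {
				for _, v := range in {
					sum += v
				}
			}
			_ = sum
		})
	}
}
```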
I just had this in two consecutive runs:
I tweaked the benchmarks a bit. I did three runs of old and new, which were all very consistent with each other. Here's a random old / new comparison:
Even though intersection does more allocations, it's still somehow faster.
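For context on why the sorted-member representation helps here: once both sets keep their members in sorted order (as the insert code further down in this diff suggests), intersection and union become a single merge pass with one up-front allocation for the result. A rough sketch over plain strings, not the actual PathElement code:

```go
// Merge-style intersection over two sorted slices; the element type and the
// "<" comparison are stand-ins for PathElement and its ordering.
func intersectSorted(a, b []string) []string {
	// One allocation for the result, sized for the smaller input.
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	out := make([]string, 0, n)

	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] < b[j]:
			i++
		case a[i] > b[j]:
			j++
		default: // element appears in both sets; keep one copy
			out = append(out, a[i])
			i++
			j++
		}
	}
	return out
}
```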
I made another change to permit it to cache the order for PathElement.Keys. It was apparently a huge deal.
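I haven't checked what the actual change looks like, but caching a computed sort order for a key list generally looks something like the sketch below; all names here (keyField, pathElement, sortedKeys) are illustrative, not the library's API.

```go
package fieldpath_sketch

import (
	"sort"
	"sync"
)

// keyField and pathElement are illustrative types, not the library's API.
type keyField struct {
	Name  string
	Value string
}

type pathElement struct {
	Keys []keyField

	// once/sorted cache the canonical ordering of Keys the first time a
	// comparison asks for it, so repeated comparisons don't re-sort.
	once   sync.Once
	sorted []keyField
}

func (p *pathElement) sortedKeys() []keyField {
	p.once.Do(func() {
		p.sorted = append([]keyField(nil), p.Keys...)
		sort.Slice(p.sorted, func(i, j int) bool {
			return p.sorted[i].Name < p.sorted[j].Name
		})
	})
	return p.sorted
}
```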
I also looked at a mem & cpu profile while running the benchmarks and I don't think there's anything left that's easy to improve now. The latest change improved the original benchmarks, too:
I'd like to squash and merge this; can someone review it first?
… on all benchmarks; needs more testing of the lexical operator
This is technically a breaking change, but most people won't construct this directly, so it should be low-impact.
I squashed this into some sensible commits. @jennybuckley can you LGTM?
/lgtm
if s.members[loc].Equals(pe) {
	return
}
n := len(s.members) - 1
Mostly similar but maybe better: https://github.com/golang/go/wiki/SliceTricks#insert
Yeah I meant to switch this to the thing they have listed as the "copy" variant; the other variant does an unnecessary allocation.
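For reference, the "copy" variant from that SliceTricks page, adapted to a sorted insert at position loc: it grows the slice by one with append and shifts the tail with copy, avoiding the temporary slice that the append-of-append variant allocates. The element type here is a stand-in, not the members slice's actual element type.

```go
// SliceTricks "copy" insert variant: grow by one, shift the tail, write the
// new element. No temporary slice is allocated for the shifted portion.
func insertSorted(members []string, loc int, pe string) []string {
	members = append(members, "") // grow by one; the placeholder is overwritten below
	copy(members[loc+1:], members[loc:])
	members[loc] = pe
	return members
}
```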
The final benchmarks are a little underwhelming compared to the improvements to the set operations.
I think the string/YAML conversion is probably dominating the performance in those benchmarks.
I managed to run this in the main repo; I'll send an update PR there, I guess, if I can figure out the dependency stuff. Here are the results:
Prior to these changes:
After these changes:
Mostly I was looking at allocations, but this is a modest improvement for everything.
This will permit a straightforward serialization. The current map-with-string-key form can't be deserialized in a single step.
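A rough illustration of the difference, with hypothetical types rather than the library's actual wire format: a set stored as a list of structured elements unmarshals in one step, while a map keyed by stringified path elements decodes into a map[string]bool first and then needs a second pass to parse each key back into a structured element.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pathElement is a hypothetical structured element; it stands in for the real
// PathElement and is not the library's actual serialization format.
type pathElement struct {
	FieldName string `json:"f,omitempty"`
}

// List form: deserializes directly into structured elements in one step.
type setList struct {
	Members []pathElement `json:"members"`
}

func main() {
	data := []byte(`{"members":[{"f":"spec"},{"f":"replicas"}]}`)
	var s setList
	if err := json.Unmarshal(data, &s); err != nil {
		panic(err)
	}
	fmt.Println(s.Members)

	// By contrast, a form like {"f:spec":true,"f:replicas":true} decodes into
	// a map[string]bool in one step, but every key then has to be parsed back
	// into a structured element in a second pass.
}
```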