-
Notifications
You must be signed in to change notification settings - Fork 63
Use PathElementSet and similar to index PathElement #114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use PathElementSet and similar to index PathElement #114
Conversation
More or less replaces #112 |
keyStr := pe.String() | ||
if _, found := observedKeys[keyStr]; found { | ||
errs = append(errs, v.errorf("duplicate entries for key %v", keyStr)...) | ||
if observedKeys.Has(pe) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really faster? i know the benchmarks are better but it makes the constant time operation with something that has to sort a slice and do a binary search
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @lavalamp
It is slightly faster but most importantly it's more correct. Using the string as the index is a little wrong since it's not 100% canonicalized and is not meant to be used for "production use".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allocating a buffer for the string, converting the pathelement to a string, hashing the string, comparing the strings, is quite expensive too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
String() was actually written to make this work, so this should be equally correct.
The set impl is O(log n) but with a low enough constant factor that it was faster than the O(1) thing it replaced. Now we don't have a O(1) thing to compare it with, I assume at some size it would stop being faster.
typed/validate.go
Outdated
@@ -142,7 +142,7 @@ func (v *validatingObjectWalker) doScalar(t schema.Scalar) ValidationErrors { | |||
} | |||
|
|||
func (v *validatingObjectWalker) visitListItems(t schema.List, list *value.List) (errs ValidationErrors) { | |||
observedKeys := map[string]struct{}{} | |||
observedKeys := fieldpath.PathElementSet{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pre-allocating the array to be the right size might save a few allocs. (followup)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clearly, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I finally remembered to do this one, another 20% improvement ;-)
@@ -168,7 +219,7 @@ func (w *mergingWalker) visitListItems(t schema.List, lhs, rhs *value.List) (err | |||
rhsOrder := []fieldpath.PathElement{} | |||
|
|||
// First, collect all RHS children. | |||
observedRHS := map[string]value.Value{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised I missed these uses of strings when I was working on this last quarter...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- We had much bigger things to improve,
- Maybe because of the test data?!
Anyways, /shrug
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is a fairly small improvement compared to some of the things that are coming.
typed/merge.go
Outdated
return nil | ||
} | ||
if s.members[loc].PathElement.Equals(pe) { | ||
return &s.members[loc].Value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very risky, a subsequent insert will make the pointer point to the wrong thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, we could make it return a (copy, bool) to be safer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that'd be better.
typed/merge.go
Outdated
|
||
type pathElementValue struct { | ||
PathElement fieldpath.PathElement | ||
Value value.Value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to say, it might be worth testing whether *value.Value is better for perf (less data to copy around), but below I found an even better reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative is to never move objects in the slice but instead keep a parallel []int containing the sorted order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not over-optimize. Values are going to change significantly anyway.
typed/merge.go
Outdated
"sigs.k8s.io/structured-merge-diff/fieldpath" | ||
"sigs.k8s.io/structured-merge-diff/schema" | ||
"sigs.k8s.io/structured-merge-diff/value" | ||
) | ||
|
||
type pathElementValueMap struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I wrote the set version of this, getting the sort call and less function to do the right thing was tricky. I think unit tests of this particular thing are in order to keep it working :/
Alternatively consider moving this into the same package as the set type where it'd be easier to share code and tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we're doing very similar things everywhere, but I'd also like to avoid having PathElementMap
to interface{}
(which we could use in all three places)...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? I think we need either tests or shared implementation. I don't think interface{} is going to make it slow if that's what you're worried about?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I suspect it has to escape to heap when you cast to interface{}
, but I might be wrong, it's worth trying.
da9d832
to
dae355f
Compare
dae355f
to
6520054
Compare
I think we should probably re-use the code in other places rather than add another implementation of the set, but I don't have a lot of time for that right now so I've decided to add a test instead. |
35ee26f
to
9acd93a
Compare
9acd93a
to
5d9b53d
Compare
OK, I've added the pre-allocation of the right size for the arrays, here are the new numbers, pretty good:
|
/lgtm Sorry, didn't realize this was waiting on me. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: apelisse, lavalamp The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Currently, we index PathElement by turning them into strings, which is neither fast nor accurate.
We can simply use the
PathElementSet
when we need a set, and we can build a map of PathElement to Value when needed using very similar code.Improvement: