Use PathElementSet and similar to index PathElement #114

apelisse · 2019-10-03T19:56:41Z

Currently, we index PathElement by turning them into strings, which is neither fast nor accurate.
We can simply use the PathElementSet when we need a set, and we can build a map of PathElement to Value when needed using very similar code.

Improvement:

benchmark                                              old ns/op     new ns/op     delta
BenchmarkOperations/Pod/Update-12                      187994        162602        -13.51%
BenchmarkOperations/Pod/Apply-12                       525375        452963        -13.78%
BenchmarkOperations/Node/Update-12                     273261        220959        -19.14%
BenchmarkOperations/Node/Apply-12                      760287        667991        -12.14%
BenchmarkOperations/Endpoints/Update-12                32649         32794         +0.44%
BenchmarkOperations/Endpoints/Apply-12                 2072047       1862396       -10.12%
BenchmarkFromUnstructured/Pod-12                       202893        182379        -10.11%
BenchmarkFromUnstructured/Node-12                      331193        308072        -6.98%
BenchmarkFromUnstructured/Endpoints-12                 8057365       7754632       -3.76%

benchmark                                              old allocs     new allocs     delta
BenchmarkOperations/Pod/Update-12                      445            284            -36.18%
BenchmarkOperations/Pod/Apply-12                       1179           771            -34.61%
BenchmarkOperations/Node/Update-12                     601            336            -44.09%
BenchmarkOperations/Node/Apply-12                      1876           1138           -39.34%
BenchmarkOperations/Endpoints/Update-12                74             74             +0.00%
BenchmarkOperations/Endpoints/Apply-12                 3142           1120           -64.35%
BenchmarkFromUnstructured/Pod-12                       477            392            -17.82%
BenchmarkFromUnstructured/Node-12                      1107           900            -18.70%
BenchmarkFromUnstructured/Endpoints-12                 21115          19089          -9.60%

apelisse · 2019-10-03T19:57:34Z

More or less replaces #112

jennybuckley · 2019-10-03T22:56:16Z

typed/validate.go

-		keyStr := pe.String()
-		if _, found := observedKeys[keyStr]; found {
-			errs = append(errs, v.errorf("duplicate entries for key %v", keyStr)...)
+		if observedKeys.Has(pe) {


Is this really faster? i know the benchmarks are better but it makes the constant time operation with something that has to sort a slice and do a binary search

https://github.com/kubernetes-sigs/structured-merge-diff/blob/master/fieldpath/element.go#L243-L255

cc @lavalamp

It is slightly faster but most importantly it's more correct. Using the string as the index is a little wrong since it's not 100% canonicalized and is not meant to be used for "production use".

Allocating a buffer for the string, converting the pathelement to a string, hashing the string, comparing the strings, is quite expensive too

String() was actually written to make this work, so this should be equally correct.

The set impl is O(log n) but with a low enough constant factor that it was faster than the O(1) thing it replaced. Now we don't have a O(1) thing to compare it with, I assume at some size it would stop being faster.

lavalamp · 2019-10-04T17:56:58Z

typed/validate.go

@@ -142,7 +142,7 @@ func (v *validatingObjectWalker) doScalar(t schema.Scalar) ValidationErrors {
 }

 func (v *validatingObjectWalker) visitListItems(t schema.List, list *value.List) (errs ValidationErrors) {
-	observedKeys := map[string]struct{}{}
+	observedKeys := fieldpath.PathElementSet{}


Pre-allocating the array to be the right size might save a few allocs. (followup)

Clearly, thanks!

I finally remembered to do this one, another 20% improvement ;-)

lavalamp · 2019-10-04T17:57:47Z

typed/merge.go

@@ -168,7 +219,7 @@ func (w *mergingWalker) visitListItems(t schema.List, lhs, rhs *value.List) (err
 	rhsOrder := []fieldpath.PathElement{}

 	// First, collect all RHS children.
-	observedRHS := map[string]value.Value{}


I'm surprised I missed these uses of strings when I was working on this last quarter...

We had much bigger things to improve,

Maybe because of the test data?!

Anyways, /shrug

This PR is a fairly small improvement compared to some of the things that are coming.

lavalamp · 2019-10-04T18:00:34Z

typed/merge.go

+		return nil
+	}
+	if s.members[loc].PathElement.Equals(pe) {
+		return &s.members[loc].Value


This is very risky, a subsequent insert will make the pointer point to the wrong thing.

yeah, we could make it return a (copy, bool) to be safer?

Yes, that'd be better.

lavalamp · 2019-10-04T18:01:50Z

typed/merge.go

+
+type pathElementValue struct {
+	PathElement fieldpath.PathElement
+	Value       value.Value


I was going to say, it might be worth testing whether *value.Value is better for perf (less data to copy around), but below I found an even better reason.

An alternative is to never move objects in the slice but instead keep a parallel []int containing the sorted order.

Let's not over-optimize. Values are going to change significantly anyway.

lavalamp · 2019-10-04T18:06:23Z

typed/merge.go

 	"sigs.k8s.io/structured-merge-diff/fieldpath"
 	"sigs.k8s.io/structured-merge-diff/schema"
 	"sigs.k8s.io/structured-merge-diff/value"
 )

+type pathElementValueMap struct {


When I wrote the set version of this, getting the sort call and less function to do the right thing was tricky. I think unit tests of this particular thing are in order to keep it working :/

Alternatively consider moving this into the same package as the set type where it'd be easier to share code and tests?

Yeah, we're doing very similar things everywhere, but I'd also like to avoid having PathElementMap to interface{} (which we could use in all three places)...

Why? I think we need either tests or shared implementation. I don't think interface{} is going to make it slow if that's what you're worried about?

Yeah, I suspect it has to escape to heap when you cast to interface{}, but I might be wrong, it's worth trying.

apelisse · 2019-10-07T23:15:06Z

I think we should probably re-use the code in other places rather than add another implementation of the set, but I don't have a lot of time for that right now so I've decided to add a test instead.
I can create an issue to track that if needed. And I'll add a comment.

apelisse · 2019-10-08T04:23:46Z

OK, I've added the pre-allocation of the right size for the arrays, here are the new numbers, pretty good:

benchmark                                              old ns/op     new ns/op     delta
BenchmarkDeducedSimple-12                              113151        114199        +0.93%
BenchmarkDeducedNested-12                              319775        296626        -7.24%
BenchmarkDeducedNestedAcrossVersion-12                 329968        328015        -0.59%
BenchmarkLeafConflictAcrossVersion-12                  114734        107390        -6.40%
BenchmarkMultipleApplierRecursiveRealConversion-12     1565560       1588022       +1.43%
BenchmarkOperations/Pod/Update-12                      187994        115494        -38.57%
BenchmarkOperations/Pod/Apply-12                       525375        294227        -44.00%
BenchmarkOperations/Node/Update-12                     273261        172191        -36.99%
BenchmarkOperations/Node/Apply-12                      760287        502819        -33.86%
BenchmarkOperations/Endpoints/Update-12                32649         22770         -30.26%
BenchmarkOperations/Endpoints/Apply-12                 2072047       1003676       -51.56%
BenchmarkFromUnstructured/Pod-12                       202893        115718        -42.97%
BenchmarkFromUnstructured/Node-12                      331193        234963        -29.06%
BenchmarkFromUnstructured/Endpoints-12                 8057365       6069791       -24.67%

benchmark                                              old allocs     new allocs     delta
BenchmarkDeducedSimple-12                              552            552            +0.00%
BenchmarkDeducedNested-12                              1534           1498           -2.35%
BenchmarkDeducedNestedAcrossVersion-12                 1602           1566           -2.25%
BenchmarkLeafConflictAcrossVersion-12                  532            532            +0.00%
BenchmarkMultipleApplierRecursiveRealConversion-12     5229           5232           +0.06%
BenchmarkOperations/Pod/Update-12                      445            282            -36.63%
BenchmarkOperations/Pod/Apply-12                       1179           744            -36.90%
BenchmarkOperations/Node/Update-12                     601            342            -43.09%
BenchmarkOperations/Node/Apply-12                      1876           1103           -41.20%
BenchmarkOperations/Endpoints/Update-12                74             72             -2.70%
BenchmarkOperations/Endpoints/Apply-12                 3142           1103           -64.89%
BenchmarkFromUnstructured/Pod-12                       477            370            -22.43%
BenchmarkFromUnstructured/Node-12                      1107           854            -22.85%
BenchmarkFromUnstructured/Endpoints-12                 21115          19076          -9.66%

lavalamp · 2019-10-22T23:04:53Z

/lgtm
/approve

Sorry, didn't realize this was waiting on me.

k8s-ci-robot · 2019-10-22T23:05:01Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: apelisse, lavalamp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [apelisse,lavalamp]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 3, 2019

k8s-ci-robot requested review from jennybuckley and lavalamp October 3, 2019 19:56

k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Oct 3, 2019

jennybuckley reviewed Oct 3, 2019

View reviewed changes

lavalamp reviewed Oct 4, 2019

View reviewed changes

apelisse force-pushed the perf/pathelement-set branch from da9d832 to dae355f Compare October 7, 2019 23:12

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 7, 2019

apelisse force-pushed the perf/pathelement-set branch from dae355f to 6520054 Compare October 7, 2019 23:13

k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 7, 2019

apelisse force-pushed the perf/pathelement-set branch 2 times, most recently from 35ee26f to 9acd93a Compare October 7, 2019 23:20

Antoine Pelisse added 2 commits October 7, 2019 21:15

Use PathElementSet rather than map[PathElement.String()]

c0f1a64

Create map of PathElement to Value and use it instead of Strings

5d9b53d

apelisse force-pushed the perf/pathelement-set branch from 9acd93a to 5d9b53d Compare October 8, 2019 04:21

apelisse mentioned this pull request Oct 9, 2019

Change "value" package altogether to avoid copying everything #117

Closed

k8s-ci-robot assigned lavalamp Oct 22, 2019

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 22, 2019

k8s-ci-robot merged commit e660f95 into kubernetes-sigs:master Oct 22, 2019

Use PathElementSet and similar to index PathElement #114

Use PathElementSet and similar to index PathElement #114

Uh oh!

Conversation

apelisse commented Oct 3, 2019

Uh oh!

apelisse commented Oct 3, 2019

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

apelisse Oct 4, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

apelisse commented Oct 7, 2019

Uh oh!

apelisse commented Oct 8, 2019

Uh oh!

lavalamp commented Oct 22, 2019

Uh oh!

k8s-ci-robot commented Oct 22, 2019

Uh oh!

Uh oh!

apelisse Oct 4, 2019 •

edited

Loading