Skip to content

Merge would fail if an rhs partially specified object omits key fields despite they can be derived from paths #273

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
andrii-korotkov-verkada opened this issue Nov 20, 2024 · 27 comments
Assignees

Comments

@andrii-korotkov-verkada
Copy link

andrii-korotkov-verkada commented Nov 20, 2024

Hello, I hope your day is going well.
I'm encountering the errors like

associative list with keys has an element that omits key field "name" (and doesn't have default value)

The code in question is here.
Here are examples of manifests used:
Here's an example of a part of rhs (derived from Compare result, Removed field):

.spec.template.spec.containers[name="nginx"].terminationMessagePath
.spec.template.spec.containers[name="nginx"].terminationMessagePolicy
.spec.template.spec.containers[name="nginx"].livenessProbe.failureThreshold
.spec.template.spec.containers[name="nginx"].livenessProbe.successThreshold
.spec.template.spec.containers[name="nginx"].livenessProbe.timeoutSeconds
.spec.template.spec.containers[name="nginx"].ports[containerPort=8080,protocol="TCP"]
.spec.template.spec.containers[name="nginx"].ports[containerPort=8080,protocol="TCP"].containerPort
.spec.template.spec.containers[name="nginx"].ports[containerPort=8080,protocol="TCP"].protocol

So it doesn't have

.spec.template.spec.containers[name="nginx"].name

since it's not getting removed.
But trying to merge this kind of rhs seems to depend on having that field explicitly.
If I do post-processing using the code

// appendKeyFields appends the key fields like "name" to the set elements which have some entries keyed by those fields.
// This is needed to merge properly, see https://github.com/argoproj/argo-cd/issues/20792.
func appendKeyFields(set *fieldpath.Set) {
	keyFieldPaths := []fieldpath.Path{}
	set.Iterate(func(path fieldpath.Path) {
		for i := 0; i < len(path); i++ {
			if path[i].Key != nil {
				for _, keyField := range *path[i].Key {
					pathPartCopy := make([]fieldpath.PathElement, i+1)
					copy(pathPartCopy, path[:i+1])
					newPath := append(pathPartCopy, fieldpath.PathElement{FieldName: &keyField.Name})
					keyFieldPaths = append(keyFieldPaths, newPath)
				}
			}
		}
	})
	for _, path := range keyFieldPaths {
		if !set.Has(path) {
			set.Insert(path)
		}
	}
}

it works properly.
However, the values of key fields should already be available as a part of paths, so in theory Merge should just use those and don't complain about missing keys. Maybe it can already do that, just validation step complains.
Here is how we use the library. Here is the relevant ArgoCD issue.
Please, let me know what's the best course of action here and if this is something that can be addressed in Structured Merge Diff. Thank you!

Addendum: full manifests used:

Target:

``` apiVersion: apps/v1 kind: Deployment metadata: labels: app: missing applications.argoproj.io/app-name: nginx something-else: bla name: nginx-deployment namespace: default spec: replicas: 2 selector: matchLabels: app: nginx template: metadata: labels: app: nginx applications.argoproj.io/app-name: nginx spec: containers: - image: 'nginx:1.23.1' imagePullPolicy: Never livenessProbe: exec: command: - cat - non-existent-file initialDelaySeconds: 5 periodSeconds: 180 name: nginx ports: - containerPort: 8081 protocol: UDP - containerPort: 80 protocol: TCP ```

Live:

``` apiVersion: apps/v1 kind: Deployment metadata: annotations: deployment.kubernetes.io/revision: '1' creationTimestamp: '2022-09-18T23:50:25Z' generation: 1 labels: app: missing applications.argoproj.io/app-name: nginx something-else: bla managedFields: - apiVersion: apps/v1 fieldsType: FieldsV1 fieldsV1: 'f:metadata': 'f:labels': 'f:app': {} 'f:applications.argoproj.io/app-name': {} 'f:something-else': {} 'f:spec': 'f:replicas': {} 'f:selector': {} 'f:template': 'f:metadata': 'f:labels': 'f:app': {} 'f:applications.argoproj.io/app-name': {} 'f:spec': 'f:containers': 'k:{"name":"nginx"}': .: {} 'f:image': {} 'f:imagePullPolicy': {} 'f:livenessProbe': 'f:exec': 'f:command': {} 'f:initialDelaySeconds': {} 'f:periodSeconds': {} 'f:name': {} 'f:ports': 'k:{"containerPort":80,"protocol":"TCP"}': .: {} 'f:containerPort': {} 'f:protocol': {} 'f:resources': 'f:requests': 'f:cpu': {} 'f:memory': {} manager: argocd-controller operation: Apply time: '2022-09-18T23:50:25Z' - apiVersion: apps/v1 fieldsType: FieldsV1 fieldsV1: 'f:metadata': 'f:annotations': .: {} 'f:deployment.kubernetes.io/revision': {} 'f:status': 'f:availableReplicas': {} 'f:conditions': .: {} 'k:{"type":"Available"}': .: {} 'f:lastTransitionTime': {} 'f:lastUpdateTime': {} 'f:message': {} 'f:reason': {} 'f:status': {} 'f:type': {} 'k:{"type":"Progressing"}': .: {} 'f:lastTransitionTime': {} 'f:lastUpdateTime': {} 'f:message': {} 'f:reason': {} 'f:status': {} 'f:type': {} 'f:observedGeneration': {} 'f:readyReplicas': {} 'f:replicas': {} 'f:updatedReplicas': {} manager: kube-controller-manager operation: Update subresource: status time: '2022-09-23T18:30:59Z' name: nginx-deployment namespace: default resourceVersion: '7492752' uid: 731f7434-d3d9-47fa-b179-d9368a84f7c9 spec: progressDeadlineSeconds: 600 replicas: 2 revisionHistoryLimit: 10 selector: matchLabels: app: nginx strategy: rollingUpdate: maxSurge: 25% maxUnavailable: 25% type: RollingUpdate template: metadata: creationTimestamp: null labels: app: nginx applications.argoproj.io/app-name: nginx spec: containers: - image: 'nginx:1.23.1' imagePullPolicy: Never livenessProbe: exec: command: - cat - non-existent-file failureThreshold: 3 initialDelaySeconds: 5 periodSeconds: 180 successThreshold: 1 timeoutSeconds: 1 name: nginx ports: - containerPort: 80 protocol: TCP - containerPort: 8080 protocol: TCP - containerPort: 8081 protocol: UDP resources: requests: memory: 512Mi cpu: 500m terminationMessagePath: /dev/termination-log terminationMessagePolicy: File dnsPolicy: ClusterFirst restartPolicy: Always schedulerName: default-scheduler securityContext: {} terminationGracePeriodSeconds: 30 status: availableReplicas: 2 conditions: - lastTransitionTime: '2022-09-18T23:50:25Z' lastUpdateTime: '2022-09-18T23:50:26Z' message: ReplicaSet "nginx-deployment-6d68ff5f86" has successfully progressed. reason: NewReplicaSetAvailable status: 'True' type: Progressing - lastTransitionTime: '2022-09-23T18:30:59Z' lastUpdateTime: '2022-09-23T18:30:59Z' message: Deployment has minimum availability. reason: MinimumReplicasAvailable status: 'True' type: Available observedGeneration: 1 readyReplicas: 2 replicas: 2 updatedReplicas: 2 ```

Predicted live:

``` { "apiVersion": "apps/v1", "kind": "Deployment", "metadata": { "labels": { "app": "missing", "applications.argoproj.io/app-name": "nginx", "something-else": "bla" }, "name": "nginx-deployment", "namespace": "default", "managedFields": [ { "apiVersion": "apps/v1", "fieldsType": "FieldsV1", "fieldsV1": { "f:metadata": { "f:labels": { "f:app": {}, "f:applications.argoproj.io/app-name": {}, "f:something-else": {} } }, "f:spec": { "f:replicas": {}, "f:selector": {}, "f:template": { "f:metadata": { "f:labels": { "f:app": {}, "f:applications.argoproj.io/app-name": {} } }, "f:spec": { "f:containers": { "k:{\"name\":\"nginx\"}": { ".": {}, "f:image": {}, "f:imagePullPolicy": {}, "f:livenessProbe": { "f:exec": { "f:command": {} }, "f:initialDelaySeconds": {}, "f:periodSeconds": {} }, "f:name": {}, "f:ports": { "k:{\"containerPort\":80,\"protocol\":\"TCP\"}": { ".": {}, "f:containerPort": {}, "f:protocol": {} } }, "f:resources": { "f:requests": { "f:cpu": {}, "f:memory": {} } } } } } } } }, "manager": "argocd-controller", "operation": "Apply", "time": "2022-09-18T23:50:25Z" } ] }, "spec": { "replicas": 2, "selector": { "matchLabels": { "app": "nginx" } }, "template": { "metadata": { "labels": { "app": "nginx", "applications.argoproj.io/app-name": "nginx" } }, "spec": { "containers": [ { "image": "nginx:1.23.1", "imagePullPolicy": "Never", "livenessProbe": { "exec": { "command": [ "cat", "non-existent-file" ] }, "initialDelaySeconds": 5, "periodSeconds": 180 }, "name": "nginx", "ports": [ { "containerPort": 8081, "protocol": "UDP" }, { "containerPort": 80, "protocol": "TCP" } ], "resources": { "requests": { "memory": "512Mi", "cpu": "500m" } } } ] } } } } ```
@apelisse
Copy link
Contributor

Can you describe the steps that lead to that situation? It's a bit unclear how you get that.

@andrii-korotkov-verkada
Copy link
Author

andrii-korotkov-verkada commented Nov 23, 2024

Can you describe the steps that lead to that situation? It's a bit unclear how you get that.

Call typedLive.Compare(typedPredictedLive). The result's Removed field won't contain key fields if they aren't removed. The example manifests can be found collapsed in the issue description at the bottom. Now, call liveRmValues := typedLive.ExtractItems(comparison.Removed) and typedPredictedLive, err = typedPredictedLive.Merge(liveRmValues). Since liveRmValues doesn't contain some key fields with which don't have a default value, there's an error merging.

Note, that it's not enough to just add key fields to the Removed, since then the code like typedPredictedLive = typedPredictedLive.RemoveItems(comparison.Removed) would become incorrect, since it'd remove extra key fields which weren't actually removed.

If I understand correctly, the solution would be to update Merge's validation logic to not require key fields paths to be present if they can be derived from other paths.

I.e. if we have .spec.template.spec.containers[name="nginx"].terminationMessagePath, we shouldn't require having .spec.template.spec.containers[name="nginx"].name path to be present to merge .spec.template.spec.containers[name="nginx"].

@liggitt
Copy link
Contributor

liggitt commented Nov 23, 2024

Is this actually a reachable scenario using server-side apply? I don't really think we should open the door to allowing the key field to be absent and non-defaulted from a RHS

@andrii-korotkov-verkada
Copy link
Author

andrii-korotkov-verkada commented Nov 23, 2024

It's happening with server-side diff, see the linked ArgoCD issue (argoproj/argo-cd#20792).
Do you suggest me to go with a workaround I've done and just populate key fields to removed paths before calling merge? Can there be another merge function added or parameter passed to infer the key fields for this kind of scenario?
(Btw, I've updated the append logic to take lhs and rhs and only insert paths for key fields if they are present in lhs).

@andrii-korotkov-verkada
Copy link
Author

@jpbetz, do you have any thoughts on this?

@jpbetz
Copy link
Contributor

jpbetz commented Dec 2, 2024

the values of key fields should already be available as a part of paths

This is the part where I get lost. A associative list in an apply configuration is just a list of objects where the logical map keys are defined to be fields of that list entries. Without the name field, it's not clear to me how a comparison would know how to correlate rhs entries with lhs entries.

Is this maybe a problem with error messaging, or is there a reason that a name field is not being provided in this situation that I don't yet understand?

@andrii-korotkov-verkada
Copy link
Author

The problem is when we compute the comparison and get Removed part, paths like .spec.template.spec.containers[name="nginx"].name may be not present, since they aren't necessarily removed. However, if we then use Removed as rhs, it seems to require .spec.template.spec.containers[name="nginx"].name to be present explicitly, i.e. can't derive the value of name from .spec.template.spec.containers[name="nginx"].terminationMessagePath.

@jpbetz
Copy link
Contributor

jpbetz commented Dec 2, 2024

Ah, so the issue is specifically taking result of Compare(...).Removed and then passing it as an input to subsequent operations?

It seems to me that .spec.template.spec.containers[name="nginx"].name should be present in the Removed part of the comparison result. I'm trying to think of some reason this might be problematic, but I can't think of anything. I'd be receptive to a fix that adds key fields to the output of compare. Would this solve the problem?

cc @yongruilin

@andrii-korotkov-verkada
Copy link
Author

It's tricky. If Removed was used to remove some fields from lhs, we won't want the .spec.template.spec.containers[name="nginx"].name to be present if it's not actually intended to be removed.

@andrii-korotkov-verkada
Copy link
Author

We may need two different versions of Merge or Merge to infer the key fields from other paths.

@jpbetz
Copy link
Contributor

jpbetz commented Dec 2, 2024

It's tricky. If Removed was used to remove some fields from lhs, we won't want the .spec.template.spec.containers[name="nginx"].name to be present if it's not actually intended to be removed.

Right. We have multiple semantics we need to support. @andrii-korotkov-verkada would you be able to define a set of unit tests that cover the expected cases? (we don't need them to pass, we just need to declare our success criteria as clearly as possible). I suspect we can fix this either by changing when/how we include/exclude key fields, or by supplementing Merge (either with options or some other information)..

@andrii-korotkov-verkada
Copy link
Author

Sure, I'll post some test cases.

@andrii-korotkov-verkada
Copy link
Author

andrii-korotkov-verkada commented Dec 2, 2024

I'll show paths for lhs, rhs and merge result.

Test case 1:

lhs:
.spec.template.spec.containers[name="nginx"].name

rhs:
.spec.template.spec.containers[name="nginx"].terminationMessagePath

result:
.spec.template.spec.containers[name="nginx"].name
.spec.template.spec.containers[name="nginx"].terminationMessagePath

Test case 2:

lhs:
.spec.template.spec.containers[name="nginx"].terminationMessagePolicy

rhs:
.spec.template.spec.containers[name="nginx"].terminationMessagePath

Result:
.spec.template.spec.containers[name="nginx"].terminationMessagePolicy
.spec.template.spec.containers[name="nginx"].terminationMessagePath

Test case 3:

lhs:
.spec.template.spec.containers[name="nginx"].terminationMessagePath

rhs:
.spec.template.spec.containers[name="nginx"].name

result:
.spec.template.spec.containers[name="nginx"].name
.spec.template.spec.containers[name="nginx"].terminationMessagePath

Test case 4:

lhs:
.spec.template.spec.containers[name="nginx"].ports[containerPort=8080,protocol="TCP"]

rhs:
.spec.template.spec.containers[name="nginx"].ports[containerPort=8080,protocol="TCP"].containerPort

Result:
.spec.template.spec.containers[name="nginx"].ports[containerPort=8080,protocol="TCP"]
.spec.template.spec.containers[name="nginx"].ports[containerPort=8080,protocol="TCP"].containerPort

Test case 5:

lhs:
.spec.template.spec.containers[name="nginx"].ports[containerPort=8080,protocol="TCP"].containerPort

rhs:
.spec.template.spec.containers[name="nginx"].ports[containerPort=8080,protocol="TCP"].protocol

Result:
.spec.template.spec.containers[name="nginx"].ports[containerPort=8080,protocol="TCP"].containerPort
.spec.template.spec.containers[name="nginx"].ports[containerPort=8080,protocol="TCP"].protocol

Test case 6:

lhs:
.spec.template.spec.containers[name="nginx"].ports[containerPort=8080,protocol="TCP"]
.spec.template.spec.containers[name="nginx"].ports[containerPort=8080,protocol="TCP"].containerPort

rhs:
.spec.template.spec.containers[name="nginx"].ports[containerPort=8080,protocol="TCP"]
.spec.template.spec.containers[name="nginx"].ports[containerPort=8080,protocol="TCP"].protocol

Result:
.spec.template.spec.containers[name="nginx"].ports[containerPort=8080,protocol="TCP"]
.spec.template.spec.containers[name="nginx"].ports[containerPort=8080,protocol="TCP"].containerPort
.spec.template.spec.containers[name="nginx"].ports[containerPort=8080,protocol="TCP"].protocol

Test case 7:

lhs:
.spec.template.spec.containers[name="nginx"].name

rhs:
.spec.template.spec.containers[name="nginx"].name
.spec.template.spec.containers[name="nginx"].terminationMessagePath

Result:
.spec.template.spec.containers[name="nginx"].name
.spec.template.spec.containers[name="nginx"].terminationMessagePath

@leoluz
Copy link

leoluz commented Dec 3, 2024

Just providing a bit more context that may help on taking the decision about the fix. We use the structured-merge-diff library inside Argo CD codebase to support with some of our features. One of the features is what we call ServerSideDiff. Basically we run SSA in dryrun mode and compare the result (predicted live state aka pls) with the desired state to generate the diff that we show to users. The problem is that the pls will come with changes made by mutation webhooks. From Argo CD user's perspective, changes made by mutation webhooks aren't desired. In this case we use the structured-merge-diff library to remove those changes and for that we apply the following logic:

  1. Compare the resource live state with pls
  2. Loop over all existing field managers in pls to identify fields mutated that aren't owned by any manager.
    2.1. For each manager we do: comparison.Removed = comparison.Removed.Difference(managedFieldSet)
  3. Finally we revert the mutation webhook changes with something like:
liveRmValues := typedLive.ExtractItems(comparison.Removed)
// revert removed fields not owned by any manager
typedPredictedLive, err = typedPredictedLive.Merge(liveRmValues)

@andrii-korotkov-verkada is working on a bug where the last Merge() call fails with:
associative list with keys has an element that omits key field "name" (and doesn't have default value)

During his investigation, he found the issue with the key field absent in the comparison.Removed after calling comparison.Removed.Difference().

I don't exclude the fact that maybe we are misusing the library somehow but we didn't find anything too obvious. Any help is greatly appreciated.

@jpbetz
Copy link
Contributor

jpbetz commented Dec 4, 2024

I have concerns with the soundness of trying to undo mutating admission via analysis of managed fields, but setting that aside for a moment..

@yongruilin has volunteered to look into the merge problem encountered here.

@yongruilin
Copy link
Contributor

IMO, first, the comparison.Removed should not include the "name" field since it is not "removed". Then, it is because the liveRmValues is a extracted based on the fieldPath of comparison.Removed, the "name" as a result is absent in the liveRmValues.

I propose to extend the ExtractItems with an option to append the key field values to the extracted result, I can draft a PR to demonstrate.

Thoughts?

@andrii-korotkov-verkada
Copy link
Author

That would work, but I wonder whether we can update Merge to look at key field values as parts of paths.

@yongruilin
Copy link
Contributor

It is the merging between 2 typed.TypedValue. My understanding is that, the extracted liveRmValues doesn't have the key field values, does it? If so, it is valid to have the error associative list with keys has an element that omits key field ....

@andrii-korotkov-verkada
Copy link
Author

From my exploration the paths would be like

.spec.template.spec.containers[name="nginx"].terminationMessagePath

So even if

.spec.template.spec.containers[name="nginx"].name

is not present, we can extract name = "nginx" from other path elements.

@yongruilin
Copy link
Contributor

Yes, the fieldPath.Set used to extract has the key field value name="nginx", but not the extracted result (typedValue). That's why I propose to extend the ExtractItems() function to have an option to append the key fields to the input, so that extracted result would have that key filed .name.

@andrii-korotkov-verkada
Copy link
Author

Ah, I see. Yeah, I guess we can go with an extra option for ExtractItems.

@yongruilin
Copy link
Contributor

/assign

@yongruilin
Copy link
Contributor

Proposed with PR: #275

@leoluz
Copy link

leoluz commented Dec 10, 2024

Off topic:

I have concerns with the soundness of trying to undo mutating admission via analysis of managed fields

@jpbetz I am also concerned with this strategy. This is a beta feature in Argo CD (disabled by default) and we are experimenting with it.

@leoluz
Copy link

leoluz commented Dec 12, 2024

We got the confirmation that that the patch provided by @yongruilin addresses the issue.
Thank you everyone!
This issue can be closed.

@yongruilin
Copy link
Contributor

/close

@k8s-ci-robot
Copy link
Contributor

@yongruilin: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants