This repository was archived by the owner on Apr 17, 2025. It is now read-only.

Resources status should be compatible with kstatus #142

Closed
erikgb opened this issue Feb 2, 2022 · 19 comments
Assignees
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@erikgb
Contributor

erikgb commented Feb 2, 2022

As a user of tools like FluxCD, I would like Flux to know if/when my HNC resources are fully reconciled and operational (ready/healthy). Flux supports health assessment for all/most in-tree resource types, but it turns out that Flux can also support custom resources - if the resource exposes a status compatible with kstatus.

I think this should be fairly easy to implement, and I would be happy to submit a PR to fix it. But we need to decide what to do with the current content of the resource statuses: can we delete something (breaking), or should we avoid breaking changes and keep duplicated status content with the same meaning?

@adrianludwin
Contributor

adrianludwin commented Feb 7, 2022

I think our Condition struct is compatible with the standard (see here), so that's a good start.

As for replacing the existing conditions, the issue I have is that HierarchyConfiguration currently has one Condition that means "this entire namespace is in a bad state" (ActivitiesHalted) and another that means "something's gone wrong but it's not a disaster" (BadConfiguration). I think kstatus would lump these both into Failed, and I don't particularly want to lose the existing distinction. But I'm totally on board with adding a kstatus Stalled condition if either of the above are true. (I'd rather not use the Ready=true/false condition since namespaces aren't exactly ever "ready").

Similarly, HNCConfiguration has three conditions: BadConfiguration, OutOfSync and the poorly named NamespaceCondition (meaning there's a problem in a namespace managed by HNC). Again, I think kstatus would map the first two of these to Stalled. The third one is more informational and doesn't have anything to do with the HNCConfiguration object itself, so it probably doesn't need a kstatus-compliant condition.

How does that sound?
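
For concreteness, a rough sketch of what adding such a Stalled condition could look like, using metav1.Condition and the apimachinery meta helpers. This assumes the status conditions are (or get converted to) []metav1.Condition; the function and reason names are illustrative, not HNC's current API:

```go
package example

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// setStalled mirrors an existing "abnormal" HNC condition (e.g. ActivitiesHalted
// or BadConfiguration) into the single kstatus-recognized Stalled condition,
// without removing the original condition.
func setStalled(conds *[]metav1.Condition, generation int64, stalled bool, reason, msg string) {
	if !stalled {
		meta.RemoveStatusCondition(conds, "Stalled")
		return
	}
	meta.SetStatusCondition(conds, metav1.Condition{
		Type:               "Stalled",
		Status:             metav1.ConditionTrue,
		ObservedGeneration: generation,
		Reason:             reason, // e.g. "ActivitiesHalted" or "BadConfiguration"
		Message:            msg,
	})
}
```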

@erikgb
Contributor Author

erikgb commented Feb 7, 2022

I think it could make sense to keep the existing conditions when the HNC resources are not fully reconciled. AFAIK the most important requirements in kstatus seem to be:

  • Add a new field, status.observedGeneration, so it's possible to detect that the controller is aware of the latest resource state and has started working on it.
  • Any condition (except Ready: true) means that the controller is either still working on making the actual state match the desired state, or has hit an error while trying to do so. We might have to add a Reconciling condition if that is not yet present in HNC. Any resource with status.observedGeneration equal to metadata.generation and no conditions will be considered ready and operational (see the sketch just below).
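
A minimal sketch of a status shape that satisfies those two requirements (field names and markers are illustrative, not HNC's actual types):

```go
package example

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// ExampleStatus shows the minimum kstatus expects from a custom resource:
// an observedGeneration the controller bumps once it has processed the latest
// spec, plus "abnormal-true" conditions that are only present while something
// is still in progress or has gone wrong.
type ExampleStatus struct {
	// ObservedGeneration is the metadata.generation the controller last acted on.
	// +optional
	ObservedGeneration int64 `json:"observedGeneration,omitempty"`

	// Conditions follow the abnormal-true pattern: no conditions, together with
	// observedGeneration == metadata.generation, means fully reconciled.
	// +optional
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}
```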

@adrianludwin
Contributor

I thought kstatus ignored any condition it didn't recognize. The only conditions it recognizes are Ready, Stalled and Reconciling. Is that not correct? (If not, can you point me to the line of code where it looks for those?)

HNC doesn't actually know when anything finishes reconciling except the SubnamespaceAnchor, so we could add it there; adding it anywhere else would be a very large task, not just a minor API reporting convention change.

I'm ok with adding status.observedGeneration of course.

@erikgb
Contributor Author

erikgb commented Feb 7, 2022

The Ready condition is a bit special, and I understand the other conditions/statuses as just recommendations. The most important aspect is noted in the docs:

The conditions defined in the library are designed to adhere to the "abnormal-true" polarity pattern (https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#typical-status-properties) , i.e. that conditions are present and with a value of true whenever something unusual happens. So the absence of any conditions means everything is normal. Normal in this situation simply means that the latest observed generation of the resource manifest by the controller have been fully reconciled with the actual state.
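
In practice that's also what the library computes. Roughly, a consumer like Flux does something along these lines with sigs.k8s.io/cli-utils/pkg/kstatus/status (a sketch, not Flux's actual code):

```go
package example

import (
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"sigs.k8s.io/cli-utils/pkg/kstatus/status"
)

// isReady computes the kstatus aggregate status for an arbitrary object and
// treats "Current" as ready/healthy. With observedGeneration equal to
// metadata.generation and no conditions set, Compute reports Current.
func isReady(obj *unstructured.Unstructured) (bool, error) {
	res, err := status.Compute(obj)
	if err != nil {
		return false, err
	}
	fmt.Printf("%s: %s (%s)\n", obj.GetName(), res.Status, res.Message)
	return res.Status == status.CurrentStatus, nil
}
```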

@adrianludwin
Contributor

Understood, but the actual code (here and here) seems to ignore any condition except one of those three (except for K8s built-in objects).

Anyway, I'd be in favour of adding a Reconciling condition to the Anchor while it's waiting for the subnamespace to be created, and adding a Stalled condition to the HierarchyConfiguration and HNCConfiguration (I'd also be ok with Ready: False, but it seems like this is less preferred). I don't want to add a Reconciling condition to the *Configuration objects since we don't actually have the ability to set that accurately, at least not right now, and possibly not ever (the observedGeneration trick doesn't help when the reconciler's actually working on a completely different object).

@erikgb
Contributor Author

erikgb commented Apr 12, 2022

@adrianludwin I'd like to push this forward. This is what I suggest doing, in the following order (and as separate pull requests):

  1. Introduce status observedGeneration on all HNC custom resources that are watched/reconciled (rough wiring sketched below).
  2. Introduce conditions in SubnamespaceAnchor status, since conditions are missing completely now.
  3. Add new Stalled condition to HNCConfiguration and HierarchyConfiguration in case anything is wrong. This will complement the existing conditions - at least for now.
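
For step 1, the wiring at the end of each reconcile pass could look roughly like this (a sketch only; the helper and its signature are illustrative, not HNC's actual code):

```go
package example

import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/client"
)

// recordObservedGeneration copies metadata.generation into the given
// observedGeneration field and persists the status. It would be called at the
// end of each reconcile pass, once the controller has acted on the latest spec.
// obj is the reconciled resource (e.g. a SubnamespaceAnchor); observedGeneration
// points at the new status field proposed in step 1.
func recordObservedGeneration(ctx context.Context, c client.Client, obj client.Object, observedGeneration *int64) error {
	if *observedGeneration == obj.GetGeneration() {
		return nil // status already reflects the latest spec
	}
	*observedGeneration = obj.GetGeneration()
	return c.Status().Update(ctx, obj)
}
```

A reconciler would call it as recordObservedGeneration(ctx, r.Client, inst, &inst.Status.ObservedGeneration) once its own work has succeeded.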

When this is done, I think the resources are compatible with kstatus, and this issue can be closed.

Then, depending on if/when we can break the current status API, there are two more tasks I would suggest:

  1. Remove the redundant SubnamespaceAnchor status.status, and instead use the semantics suggested by kstatus.
  2. Remove the IMO now redundant existing conditions, and rely on the reason for further details. I might have overlooked something, but there seems to be exactly one condition type for each reason in HNC.

@adrianludwin
Contributor

Yes, each reason has exactly one condition type, so I believe they could be removed. The bigger problem is that some of these conditions are basically warnings, and others are errors. Can kstatus represent that? Or can we add a "Warning" condition that gets ignored by kstatus (see my last comment from Feb 7)?

Are you ok with never being able to add a Reconciling condition to HierarchyConfiguration? We have no ability to set or unset that right now, and may never have that ability. E.g. is a namespace finished reconciling when we've guaranteed that the structure is correct, or when all objects have been propagated into it, or when all objects have been propagated out of it? And how do we feasibly implement that?

What does observedGeneration mean in the context of HierarchyConfiguration? An HC may need to be reconciled due to a change in a different namespace, so just because observed generation == actual generation doesn't mean that we're in sync. I mean, we can add it to make kstatus and Flux happy, but it won't give the assurances that I think it's supposed to give.

It might be good to collect all this in a Google doc we can iterate on rather than leaving it in a discussion format in this issue. wdyt?

@adrianludwin
Contributor

... with all that said, adding all this to the existing API sgtm, and then when we next bump the API to v1beta1, we can drop the custom fields.

@erikgb
Contributor Author

erikgb commented Apr 13, 2022

/assign

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 12, 2022
@erikgb
Contributor Author

erikgb commented Jul 12, 2022

Still valid, to support CI/CD tools like Flux and Argo.

@erikgb
Contributor Author

erikgb commented Jul 12, 2022

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 12, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 10, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 9, 2022
@erikgb
Contributor Author

erikgb commented Nov 9, 2022

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Nov 9, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 7, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 9, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot closed this as not planned Apr 8, 2023