This repository was archived by the owner on Apr 17, 2025. It is now read-only.

Resources status should be compatible with kstatus #142

Closed
erikgb opened this issue Feb 2, 2022 · 19 comments
Assignees
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@erikgb
Contributor

erikgb commented Feb 2, 2022

As a user of tools like FluxCD, I would like Flux to know if/when my HNC resources are fully reconciled and operational (ready/healthy). Flux supports health assessment for all/most in-tree resource types, but it turns out that Flux can also support custom resources - if the resource exposes a status compatible with kstatus.

I think this should be fairly easy to implement, and I would be happy to submit a PR to fix it. But we need to decide what to do with the current content of the resource statuses: can we delete something (breaking), or should we avoid breaking changes and keep duplicated status content with the same meaning?

@adrianludwin
Contributor

adrianludwin commented Feb 7, 2022

I think our Condition struct is compatible with the standard (see here), so that's a good start.

As for replacing the existing conditions, the issue I have is that HierarchyConfiguration currently has one Condition that means "this entire namespace is in a bad state" (ActivitiesHalted) and another that means "something's gone wrong but it's not a disaster" (BadConfiguration). I think kstatus would lump these both into Failed, and I don't particularly want to lose the existing distinction. But I'm totally on board with adding a kstatus Stalled condition if either of the above are true. (I'd rather not use the Ready=true/false condition since namespaces aren't exactly ever "ready").

Similarly, HNCConfiguration has three conditions: BadConfiguration, OutOfSync and the poorly named NamespaceCondition (meaning there's a problem in a namespace managed by HNC). Again, I think kstatus would map the first two of these to Stalled. The third one is more informational and doesn't have anything to do with the HNCConfiguration object itself, so it probably doesn't need a kstatus-compliant condition.

How does that sound?
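
For concreteness, a rough sketch of what adding such a Stalled condition could look like, using metav1.Condition and the apimachinery meta helpers. This assumes the status conditions are (or get converted to) []metav1.Condition; the function and reason names are illustrative, not HNC's current API:

```go
package example

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// setStalled mirrors an existing "abnormal" HNC condition (e.g. ActivitiesHalted
// or BadConfiguration) into the single kstatus-recognized Stalled condition,
// without removing the original condition.
func setStalled(conds *[]metav1.Condition, generation int64, stalled bool, reason, msg string) {
	if !stalled {
		meta.RemoveStatusCondition(conds, "Stalled")
		return
	}
	meta.SetStatusCondition(conds, metav1.Condition{
		Type:               "Stalled",
		Status:             metav1.ConditionTrue,
		ObservedGeneration: generation,
		Reason:             reason, // e.g. "ActivitiesHalted" or "BadConfiguration"
		Message:            msg,
	})
}
```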

@erikgb
Contributor Author

erikgb commented Feb 7, 2022

I think it could make sense to keep the existing conditions when the HNC resources are not fully reconciled. AFAIK the most important requirements in kstatus seem to be:

  • Add a new field, status.observedGeneration, so it's possible to detect that the controller is aware of the latest resource state and has started working on it.
  • Any condition (except Ready: true) means that the controller is either still working on making the actual state match the desired state, or has hit an error while trying to do so. We might have to add a Reconciling condition if that is not yet present in HNC. Any resource with status.observedGeneration equal to metadata.generation and no conditions will be considered ready and operational (see the sketch just below).
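
A minimal sketch of a status shape that satisfies those two requirements (field names and markers are illustrative, not HNC's actual types):

```go
package example

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// ExampleStatus shows the minimum kstatus expects from a custom resource:
// an observedGeneration the controller bumps once it has processed the latest
// spec, plus "abnormal-true" conditions that are only present while something
// is still in progress or has gone wrong.
type ExampleStatus struct {
	// ObservedGeneration is the metadata.generation the controller last acted on.
	// +optional
	ObservedGeneration int64 `json:"observedGeneration,omitempty"`

	// Conditions follow the abnormal-true pattern: no conditions, together with
	// observedGeneration == metadata.generation, means fully reconciled.
	// +optional
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}
```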

@adrianludwin
Contributor

I thought kstatus ignored any condition it didn't recognize. The only conditions it recognizes are Ready, Stalled and Reconciling. Is that not correct? (If not, can you point me to the line of code where it looks for those?)

HNC doesn't actually know when anything finishes reconciling except the SubnamespaceAnchor, so we could add it there; adding it anywhere else would be a very large task, not just a minor API reporting convention change.

I'm ok with adding status.observedGeneration of course.

@erikgb
Contributor Author

erikgb commented Feb 7, 2022

The Ready condition is a bit special, and I understand the other conditions/statuses as just recommendations. The most important aspect is noted in the docs:

The conditions defined in the library are designed to adhere to the "abnormal-true" polarity pattern (https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#typical-status-properties) , i.e. that conditions are present and with a value of true whenever something unusual happens. So the absence of any conditions means everything is normal. Normal in this situation simply means that the latest observed generation of the resource manifest by the controller have been fully reconciled with the actual state.
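
In practice that's also what the library computes. Roughly, a consumer like Flux does something along these lines with sigs.k8s.io/cli-utils/pkg/kstatus/status (a sketch, not Flux's actual code):

```go
package example

import (
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"sigs.k8s.io/cli-utils/pkg/kstatus/status"
)

// isReady computes the kstatus aggregate status for an arbitrary object and
// treats "Current" as ready/healthy. With observedGeneration equal to
// metadata.generation and no conditions set, Compute reports Current.
func isReady(obj *unstructured.Unstructured) (bool, error) {
	res, err := status.Compute(obj)
	if err != nil {
		return false, err
	}
	fmt.Printf("%s: %s (%s)\n", obj.GetName(), res.Status, res.Message)
	return res.Status == status.CurrentStatus, nil
}
```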

@adrianludwin
Contributor

Understood, but the actual code (here and here) seems to ignore any condition except one of those three (except for K8s built-in objects).

Anyway, I'd be in favour of adding a Reconciling condition to the Anchor while it's waiting for the subnamespace to be created, and adding a Stalled condition to the HierarchyConfiguration and HNCConfiguration (I'd also be ok with Ready: False, but it seems like this is less preferred). I don't want to add a Reconciling condition to the *Configuration objects since we don't actually have the ability to set that accurately, at least not right now, and possibly not ever (the observedGeneration trick doesn't help when the reconciler's actually working on a completely different object).

@erikgb
Contributor Author

erikgb commented Apr 12, 2022

@adrianludwin I'd like to push this forward. This is what I suggest doing, in the following order (and as separate pull requests):

  1. Introduce status observedGeneration on all HNC custom resources that are watched/reconciled (rough wiring sketched below).
  2. Introduce conditions in SubnamespaceAnchor status, since conditions are missing completely now.
  3. Add new Stalled condition to HNCConfiguration and HierarchyConfiguration in case anything is wrong. This will complement the existing conditions - at least for now.
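
For step 1, the wiring at the end of each reconcile pass could look roughly like this (a sketch only; the helper and its signature are illustrative, not HNC's actual code):

```go
package example

import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/client"
)

// recordObservedGeneration copies metadata.generation into the given
// observedGeneration field and persists the status. It would be called at the
// end of each reconcile pass, once the controller has acted on the latest spec.
// obj is the reconciled resource (e.g. a SubnamespaceAnchor); observedGeneration
// points at the new status field proposed in step 1.
func recordObservedGeneration(ctx context.Context, c client.Client, obj client.Object, observedGeneration *int64) error {
	if *observedGeneration == obj.GetGeneration() {
		return nil // status already reflects the latest spec
	}
	*observedGeneration = obj.GetGeneration()
	return c.Status().Update(ctx, obj)
}
```

A reconciler would call it as recordObservedGeneration(ctx, r.Client, inst, &inst.Status.ObservedGeneration) once its own work has succeeded.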

When this is done, I think the resources are compatible with kstatus, and this issue can be closed.

Then, depending on if/when we can break the current status API, there are two more tasks I would suggest:

  1. Remove the redundant SubnamespaceAnchor status.status, and instead use the semantics suggested by kstatus.
  2. Remove the IMO now redundant existing conditions, and rely on the reason for further details. I might have overlooked something, but there seems to be exactly one condition type for each reason in HNC.

@adrianludwin
Contributor

Yes, each reason has exactly one condition type, so I believe they could be removed. The bigger problem is that some of these conditions are basically warnings, and others are errors. Can kstatus represent that? Or can we add a "Warning" condition that gets ignored by kstatus (see my last comment from Feb 7)?

Are you ok with never being able to add a Reconciling condition to HierarchyConfiguration? We have no ability to set or unset that right now, and may never have that ability. E.g. is a namespace finished reconciling when we've guaranteed that the structure is correct, or when all objects have been propagated into it, or when all objects have been propagated out of it? And how do we feasibly implement that?

What does observedGeneration mean in the context of HierarchyConfiguration? An HC may need to be reconciled due to a change in a different namespace, so just because observed generation == actual generation doesn't mean that we're in sync. I mean, we can add it to make kstatus and Flux happy, but it won't give the assurances that I think it's supposed to give.

It might be good to collect all this in a Google doc we can iterate on rather than leaving it in a discussion format in this issue. wdyt?

@adrianludwin
Contributor

... with all that said, adding all this to the existing API sgtm, and then when we next bump the API to v1beta1, we can drop the custom fields.

@erikgb
Contributor Author

erikgb commented Apr 13, 2022

/assign

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 12, 2022
@erikgb
Contributor Author

erikgb commented Jul 12, 2022

Still valid, to support CI/CD tools like Flux and Argo.

@erikgb
Contributor Author

erikgb commented Jul 12, 2022

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 12, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 10, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 9, 2022
@erikgb
Contributor Author

erikgb commented Nov 9, 2022

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Nov 9, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 7, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 9, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot closed this as not planned Apr 8, 2023