Skip to content

Make retry for clusterctl check for cert-manager API ready configurable #11960

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
viveksyngh opened this issue Mar 12, 2025 · 3 comments · May be fixed by #12055
Open

Make retry for clusterctl check for cert-manager API ready configurable #11960

viveksyngh opened this issue Mar 12, 2025 · 3 comments · May be fixed by #12055
Assignees
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/feature Categorizes issue or PR as related to a new feature. priority/backlog Higher priority than priority/awaiting-more-evidence. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@viveksyngh
Copy link

viveksyngh commented Mar 12, 2025

What would you like to be added (User Story)?

We have some bits in the clusterctl init which checks for cert-manager API to be ready.

if err := cm.waitForAPIReady(ctx, false); err == nil {
log.Info("Skipping installing cert-manager as it is already installed")
return nil
}

The checks uses waitForAPIReady which is called without retry which means it just check for the API to be ready only once and if it is not ready then it installs a new cert-manger instance. We might have cert-manager already installed but may be experiencing some intermittent issue which can be avoided with retry.

// waitForAPIReady will attempt to create the cert-manager 'test assets' (i.e. a basic
// Issuer and Certificate).
// This ensures that the Kubernetes apiserver is ready to serve resources within the
// cert-manager API group.
// If retry is true, the createObj call will be retried if it fails. Otherwise, the
// 'create' operations will only be attempted once.
func (cm *certManagerClient) waitForAPIReady(ctx context.Context, retry bool) error {

Would like to add a flag (default to false) which can enable the retry of this particular check.

Detailed Description

Add a new flag in clusterctl init command to retry the check for cert-manager API readiness. This flag can have a default value to false which will keep the default behaviour but can be overridden.

Anything else you would like to add?

No response

Label(s) to be applied

/kind feature
One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. needs-priority Indicates an issue lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 12, 2025
@viveksyngh viveksyngh changed the title Make retry for clusterctl check for cert-manager API ready configurable Make retry for clusterctl check for cert-manager API ready configurable Mar 12, 2025
@chrischdi
Copy link
Member

/kind feature
/triage accepted
/priority backlog

/help

Short-term: We would like to change waitForAPIReady to use a one second timeout and 250ms interval when retry is set to false to help on flakes (to deal with single failing requests).
PRs are very welcome!

Note: this won't help when cert-manager installation is not completed before running clusterctl.

Long-term: We would maybe like to split "detection if cert-manager is installed" and "detection if cert-manager is properly working".
Example: detection if it is installed could be done by checking existence of the relevant CRDs.
Another alternative would be to have a flag to opt-out of cert-manager installation entirely.

@k8s-ci-robot
Copy link
Contributor

@chrischdi:
This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/kind feature
/triage accepted
/priority backlog

/help

Short-term: We would like to change waitForAPIReady to use a one second timeout and 250ms interval when retry is set to false to help on flakes (to deal with single failing requests).
PRs are very welcome!

Note: this won't help when cert-manager installation is not completed before running clusterctl.

Long-term: We would maybe like to split "detection if cert-manager is installed" and "detection if cert-manager is properly working".
Example: detection if it is installed could be done by checking existence of the relevant CRDs.
Another alternative would be to have a flag to opt-out of cert-manager installation entirely.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. priority/backlog Higher priority than priority/awaiting-more-evidence. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates an issue lacks a `priority/foo` label and requires one. labels Mar 25, 2025
azych added a commit to azych/cluster-api that referenced this issue Apr 3, 2025
This introduces a new clusterctl init flag
"--retry-cert-manager-readiness-check" that allows to retry
the check for an already installed cert-manager, which
by default is only attempted once before a new cert-manager
installation is started.

When enabled, cert-manager readiness check will be retried
for the duration specified in clusterctl config file's
cert-manager.timeout entry or for a default timeout.

See: kubernetes-sigs#11960
@azych
Copy link

azych commented Apr 3, 2025

/assign

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/feature Categorizes issue or PR as related to a new feature. priority/backlog Higher priority than priority/awaiting-more-evidence. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants