KEP-2799: Reduction of Secret-based Service Account Tokens

Release Signoff Checklist

Items marked with (R) are required prior to targeting to a milestone / release.

  • (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
  • (R) KEP approvers have approved the KEP status as implementable
  • (R) Design details are appropriately documented
  • (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
  • (R) Graduation criteria is in place
  • (R) Production readiness review completed
  • (R) Production readiness review approved
  • "Implementation History" section is up-to-date for milestone
  • User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
  • Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

Summary

This KEP proposes actions to reduce the surface area of secret-based service account tokens.

Motivation

Since BoundServiceAccountTokenVolume graduated to GA in 1.22, pods obtain their service account tokens via the TokenRequest API and store them in a projected volume. This change obviates the need to auto-generate secret-based service account tokens, which are less secure than bound tokens.

Goals

  • No auto-generation of secret-based service account tokens.
  • Removal of unused auto-generated secret-based service account tokens.

Non-Goals

Proposal

  • Change the service account control loop in the Token Controller so that it no longer auto-creates secrets for service accounts. At the same time, warn on use of auto-created secret-based service account tokens and encourage users to use the TokenRequest API or manually created secret-based service account tokens.
  • Purge unused auto-generated secret-based service account tokens.

User Stories (Optional)

Notes/Constraints/Caveats

  • A warning mechanism should be implemented to help users migrate.
  • Auto-generated secret-based service account tokens are those requested by the Token Controller.
  • Only clean up auto-generated tokens which:
    • are not referenced by pods
    • have not been used to authenticate for some duration (time duration or number of releases)
  • To check for active use of secret-based tokens, the metric serviceaccount_legacy_tokens_total or the audit annotation authentication.k8s.io/legacy-token can be used.
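
As a rough illustration of consulting the metric, the sketch below sums the samples of serviceaccount_legacy_tokens_total found in a Prometheus text-format scrape. The helper name sum_counter and the sample text are hypothetical, the parser assumes samples carry no trailing timestamp, and how the scrape is fetched from kube-apiserver is left out.

```python
def sum_counter(metrics_text: str, name: str) -> float:
    """Sum all samples of `name` in Prometheus text exposition format.

    Assumes each sample line is `name value` or `name{labels} value`
    with no trailing timestamp.
    """
    total = 0.0
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        metric, _, value = line.rpartition(" ")
        base = metric.split("{", 1)[0]  # drop any label set
        if base == name:
            total += float(value)
    return total


# Illustrative scrape output, not captured from a real cluster.
SAMPLE = """\
# TYPE serviceaccount_legacy_tokens_total counter
serviceaccount_legacy_tokens_total 3
# TYPE serviceaccount_stale_tokens_total counter
serviceaccount_stale_tokens_total 1
"""

print(sum_counter(SAMPLE, "serviceaccount_legacy_tokens_total"))
```

A value that keeps growing between scrapes suggests that some client still authenticates with a secret-based token.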

Risks and Mitigations

  • When the LegacyServiceAccountTokenNoAutoGeneration feature is Beta, consumers that depend directly on waiting for and reading tokens out of auto-generated secrets might stop working. To mitigate:
    1. Emit warnings when auto-generated token secrets are used.
    2. Publish pointers to TokenRequest or the manual secret request flow.
  • When LegacyServiceAccountTokenCleanUp is Beta, usage of auto-generated secret-based tokens might stop working. To mitigate:
    1. While in Alpha, announce that the cleanup starts at Beta.
    2. Emit warnings when auto-generated token secrets are used.
    3. Add pointers to the TokenRequest API and manually created tokens in the validation result.
    4. Mark auto-generated tokens as invalid if they have not been used for longer than the duration configured by --legacy-service-account-token-clean-up-period (one year by default). Allow users to re-activate the invalid auto-generated tokens within a further --legacy-service-account-token-clean-up-period before the tokens are finally deleted.

Design Details

LegacyServiceAccountTokenNoAutoGeneration

The Token Controller stops auto-creating secrets for service accounts. This feature would be enabled as soon as it is implemented, since no new code is added and enabling it ensures that new clusters start in a good state.

LegacyServiceAccountTokenTracking

To facilitate LegacyServiceAccountTokenCleanUp, we implement a simple controller in kube-apiserver that maintains a configmap, kube-apiserver-legacy-service-account-token-tracking in kube-system, which acts as a boolean indicator of whether tracking is enabled in the cluster. It is similar to the existing ClusterAuthenticationTrustController that maintains configmap/extension-apiserver-authentication.

  • When LegacyServiceAccountTokenTracking is enabled in all apiservers,

    • the controller creates/updates the configmap kube-apiserver-legacy-service-account-token-tracking in the kube-system namespace, which stores the current date as since.
    • when a legacy token is used, the apiserver issues a warning, updates the label kubernetes.io/legacy-token-last-used on the secret at date granularity, and records the use in a metric.
  • When LegacyServiceAccountTokenTracking is disabled in any apiserver,

    • the controller periodically ensures that the configmap in the kube-system namespace is deleted.
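
The tracking behavior can be sketched roughly as follows. This is a minimal in-memory model, not the kube-apiserver controller: the function names reconcile and record_use are hypothetical, configmaps and labels are plain dicts, dates are assumed to be serialized as YYYY-MM-DD, and keeping an existing since value unchanged on later reconciles is an assumption about what "creates/updates" means.

```python
from datetime import date

CONFIGMAP = "kube-apiserver-legacy-service-account-token-tracking"
LAST_USED = "kubernetes.io/legacy-token-last-used"


def reconcile(configmaps: dict, tracking_enabled: bool, today: date) -> None:
    """Keep the tracking configmap in step with the feature gate."""
    if tracking_enabled:
        # Assumption: `since` is written once and then left stable.
        configmaps.setdefault(CONFIGMAP, {"since": today.isoformat()})
    else:
        # Tracking is off somewhere: make sure the configmap is gone.
        configmaps.pop(CONFIGMAP, None)


def record_use(secret_labels: dict, today: date) -> bool:
    """Stamp the last-used label; return True only if a write was needed."""
    stamp = today.isoformat()
    if secret_labels.get(LAST_USED) == stamp:
        return False  # already stamped today, so no extra API write
    secret_labels[LAST_USED] = stamp
    return True
```

Because the label is compared at date granularity, repeated use of the same token on one day results in a single label update.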

LegacyServiceAccountTokenCleanUp

The Token Controller starts to remove auto-generated secrets (secrets bi-directionally referenced by the service account) that are unused and not mounted by pods.
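
A minimal sketch of the two eligibility checks, operating on plain dicts shaped like the JSON form of the API objects. The function names are hypothetical, namespace matching is omitted, and only plain secret volumes are inspected; other ways a pod can consume a secret are outside this sketch.

```python
SA_NAME_ANNOTATION = "kubernetes.io/service-account.name"
TOKEN_TYPE = "kubernetes.io/service-account-token"


def is_auto_generated(secret: dict, service_account: dict) -> bool:
    """True if the secret and the service account reference each other."""
    if secret.get("type") != TOKEN_TYPE:
        return False
    annotations = secret["metadata"].get("annotations", {})
    # Direction 1: the secret names the service account in its annotation.
    points_to_sa = (
        annotations.get(SA_NAME_ANNOTATION) == service_account["metadata"]["name"]
    )
    # Direction 2: the service account lists the secret in its `secrets` field.
    listed_by_sa = any(
        ref.get("name") == secret["metadata"]["name"]
        for ref in service_account.get("secrets", [])
    )
    return points_to_sa and listed_by_sa


def is_mounted(secret_name: str, pods: list) -> bool:
    """True if any pod mounts the secret as a secret volume."""
    for pod in pods:
        for volume in pod.get("spec", {}).get("volumes", []):
            if volume.get("secret", {}).get("secretName") == secret_name:
                return True
    return False
```

A manually created token secret carries the annotation but is not listed in the service account's secrets field, so it fails the second direction and is left alone.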

When this feature is Beta and enabled by default, a secret is marked as invalid if and only if a sufficient period of time (one year by default) has passed since it was last used. The period can be configured by cluster admins.

The date on which a given secret was last used is determined as follows:

  1. The value of the label kubernetes.io/legacy-token-last-used, if it exists and is later than the since date stored in the configmap kube-apiserver-legacy-service-account-token-tracking.
  2. Otherwise, it defaults to since.

If kube-apiserver-legacy-service-account-token-tracking is unavailable, no secret would be removed.
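
The last-used rule can be expressed as a small function. This is a sketch, assuming the label value and since are dates in YYYY-MM-DD form; the function name last_used is hypothetical, and returning None stands in for "the configmap is unavailable, so nothing may be removed".

```python
from datetime import date
from typing import Optional


def last_used(label_value: Optional[str], since: Optional[date]) -> Optional[date]:
    """Resolve the effective last-used date of a legacy token secret."""
    if since is None:
        # Tracking configmap unavailable: the caller must not remove anything.
        return None
    if label_value:
        used = date.fromisoformat(label_value)
        if used > since:
            return used  # the label is newer than the tracking start date
    # No label, or a label that predates tracking: fall back to `since`.
    return since
```

Falling back to since means a token is never treated as older than the moment tracking began, so the cleanup period cannot elapse before the cluster has had a chance to observe usage.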

Mark the secrets as invalid and recover:

  1. The label kubernetes.io/legacy-token-invalid-since is added to the secret, with the date as its value.
  2. If users use an invalid token, the Validate() function in "kubernetes/pkg/serviceaccount/legacy.go" detects the usage and returns an error that tells users to re-activate the token by updating the label value or to use TokenRequest instead. At the same time, the token is updated with the new kubernetes.io/legacy-token-last-used date.
  3. If users do not use the invalid token, it is finally deleted once the duration configured through --legacy-service-account-token-clean-up-period (one year by default) has passed since the token was marked as invalid.
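
The two-phase cleanup can be sketched as a decision function over a secret's labels. This is an illustrative model only: the function name next_action and its return values are hypothetical, dates are assumed to be in YYYY-MM-DD form, and re-activation is modeled as the invalid-since label being absent, which is an assumption about the exact mechanics.

```python
from datetime import date, timedelta
from typing import Optional

INVALID_SINCE = "kubernetes.io/legacy-token-invalid-since"
LAST_USED = "kubernetes.io/legacy-token-last-used"
# Stand-in for --legacy-service-account-token-clean-up-period (one year).
CLEAN_UP_PERIOD = timedelta(days=365)


def next_action(
    labels: dict,
    since: Optional[date],
    today: date,
    period: timedelta = CLEAN_UP_PERIOD,
) -> str:
    """Return 'keep', 'mark-invalid', or 'delete' for one secret."""
    if since is None:
        return "keep"  # tracking configmap unavailable: remove nothing
    invalid_since = labels.get(INVALID_SINCE)
    if invalid_since:
        # Phase 2: already invalid, delete after a further full period.
        if today - date.fromisoformat(invalid_since) > period:
            return "delete"
        return "keep"
    # Phase 1: resolve the last-used date, falling back to `since`.
    last = since
    used = labels.get(LAST_USED)
    if used and date.fromisoformat(used) > since:
        last = date.fromisoformat(used)
    if today - last > period:
        return "mark-invalid"
    return "keep"
```

With the default period, a token therefore survives for at least two full periods after its last use before deletion, which gives users a window to notice errors and re-activate it.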

Test Plan

[X] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.

Prerequisite testing updates

None

Unit tests
  • k8s.io/kubernetes/pkg/controller/serviceaccount: 2022-06-13 - 67.5%
Integration tests
  • A previously auto-generated secret-based token that is used within the configurable cleanup duration will continue to work.
  • A previously auto-generated secret-based token that is not used within the configurable cleanup duration will be deleted.
e2e tests
  • Secret-based tokens would not be auto-generated.
  • Still able to explicitly request a secret-based token.
  • The explicitly requested token would not be deleted.

Graduation Criteria

LegacyServiceAccountTokenNoAutoGeneration

| Alpha | Beta | GA   |
| ----- | ---- | ---- |
| -     | 1.24 | 1.26 |

By 1.24, all pods should have been admitted in 1.22+ and should therefore be using bound tokens. Enabling this feature one release ahead would help reduce legacy tokens as a security practice.

Beta -> GA Graduation

  • Approved by PRR and scalability
  • Any known bugs fixed
  • Tests passing

Alpha -> Beta Graduation

  • Approved by PRR and scalability
  • Any known bugs fixed
  • Tests passing
  • Document and communicate the actions available to consumers of auto-generated secret-based tokens: migrate to TokenRequest, or explicitly request secret-based tokens.

LegacyServiceAccountTokenTracking

| Alpha | Beta | GA   |
| ----- | ---- | ---- |
| 1.26  | 1.27 | 1.28 |

Beta -> GA Graduation

  • In use by multiple distributions
    • Google
    • Red Hat
  • Approved by PRR and scalability
  • Any known bugs fixed
  • Tests passing

Alpha -> Beta Graduation

  • Approved by PRR and scalability
  • Any known bugs fixed
  • Tests passing

LegacyServiceAccountTokenCleanUp

| Alpha | Beta | GA   |
| ----- | ---- | ---- |
| 1.28  | 1.29 | 1.30 |

Beta -> GA Graduation

  • In use by multiple distributions
  • Approved by PRR and scalability
  • Any known bugs fixed
  • Tests passing

Alpha -> Beta Graduation

  • Approved by PRR and scalability
  • Any known bugs fixed
  • Tests passing

Upgrade / Downgrade Strategy

The features can be enabled or disabled via their feature gates during an upgrade or downgrade. What changes as a result is described in the "Feature Enablement and Rollback" section.

Version Skew Strategy

This only touches the control plane, so a version skew strategy is not applicable.

Production Readiness Review Questionnaire

Feature Enablement and Rollback

How can this feature be enabled / disabled in a live cluster?
  • Feature gate (also fill in values in kep.yaml)
    • Feature gate name: LegacyServiceAccountTokenNoAutoGeneration
    • Components depending on the feature gate: kube-controller-manager
    • Feature gate name: LegacyServiceAccountTokenTracking
    • Components depending on the feature gate: kube-apiserver
    • Feature gate name: LegacyServiceAccountTokenCleanUp
    • Components depending on the feature gate: kube-controller-manager
Does enabling the feature change any default behavior?
  • LegacyServiceAccountTokenNoAutoGeneration: no legacy tokens are auto-generated.
  • LegacyServiceAccountTokenTracking: legacy tokens would have a new label and a configmap would be created in kube-system.
  • LegacyServiceAccountTokenCleanUp: unused auto-generated legacy tokens will be removed.
Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

yes for all feature gates.

What happens if we reenable the feature if it was previously rolled back?
  • LegacyServiceAccountTokenNoAutoGeneration: the same as enabling the feature. Before the re-enablement, the Token Controller would have created tokens for service accounts while the feature was off.
  • LegacyServiceAccountTokenTracking: during this sequence of operations, only the label kubernetes.io/legacy-token-last-used is persisted, but there is no impact on the functionality of this feature.
  • LegacyServiceAccountTokenCleanUp: the same as enabling the feature.
Are there any tests for feature enablement/disablement?

yes for all feature gates, covered by integration tests.

Rollout, Upgrade and Rollback Planning

How can a rollout fail? Can it impact already running workloads?
  • LegacyServiceAccountTokenNoAutoGeneration: workloads that expect new auto-created secrets and extract tokens from them would fail.
  • LegacyServiceAccountTokenTracking: no impact.
  • LegacyServiceAccountTokenCleanUp: workloads that read auto-generated secrets would fail after those secrets are considered unused by this feature and removed.
What specific metrics should inform a rollback?

serviceaccount_legacy_tokens_total: cumulative stale service account tokens used.

this metric is only informational and cannot deterministically indicate that a rollback is needed. there is no good way for us to detect scrapers of auto-generated secrets.

Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

no, since there is not much difference between an upgrade and upgrade->downgrade->upgrade. see the section "What happens if we reenable the feature if it was previously rolled back?".

Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

no

Monitoring Requirements

How can an operator determine if the feature is in use by workloads?

check if there is a configmap kube-apiserver-legacy-service-account-token-tracking in namespace kube-system.

What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
  • Metrics
    • Metric name: serviceaccount_legacy_tokens_total
    • [Optional] Aggregation method:
    • Components exposing the metric: kube-apiserver

LegacyServiceAccountTokenNoAutoGeneration and LegacyServiceAccountTokenCleanUp might cause a few workloads to fail, but there is no way for us to inject metrics into workloads to detect this.

What are the reasonable SLOs (Service Level Objectives) for the above SLIs?

none. we expect the number recorded in the above metric to go down in the long term.

Are there any missing metrics that would be useful to have to improve observability of this feature?

none.

Dependencies

Does this feature depend on any specific services running in the cluster?

no.

Scalability

Will enabling / using this feature result in any new API calls?

up to one additional write request per day could be made to each auto-generated secret that is still in use.

Will enabling / using this feature result in introducing new API types?

no.

Will enabling / using this feature result in any new calls to the cloud provider?

no.

Will enabling / using this feature result in increasing size or count of the existing API objects?

no. instead, use of the feature reduces the number of API objects.

Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

no.

Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?

no.

Troubleshooting

How does this feature react if the API server and/or etcd is unavailable?
  • kube-apiserver-legacy-service-account-token-tracking configmap cannot be created.
  • unable to remove unused auto-generated secrets.
What are other known failure modes?
  • failure to create kube-apiserver-legacy-service-account-token-tracking configmap
    • Detection: check if kube-apiserver-legacy-service-account-token-tracking exists in kube-system
    • Mitigations: there is no impact on existing systems.
    • Diagnostics: check kube-apiserver log.
    • Testing: TBD.
What steps should be taken if SLOs are not being met to determine the problem?

n/a.

Implementation History

Drawbacks

Alternatives

Infrastructure Needed (Optional)