Skip to content

Latest commit

 

History

History
614 lines (467 loc) · 27.6 KB

File metadata and controls

614 lines (467 loc) · 27.6 KB

KEP-1933: Defend Against Logging Secrets via Static Analysis

Release Signoff Checklist

Items marked with (R) are required prior to targeting to a milestone / release.

  • (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
  • (R) KEP approvers have approved the KEP status as implementable
  • (R) Design details are appropriately documented
  • (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
  • (R) Graduation criteria is in place
  • (R) Production readiness review completed
  • Production readiness review approved
  • "Implementation History" section is up-to-date for milestone
  • User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
  • Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

Summary

Taint propagation analysis can provide insight into how data spreads and is consumed within a program. It can be use used to harden the boundaries for those data which require special handling.

This Kubernetes Enhancement Proposal (KEP) proposes such analysis to be used during testing to prevent various types of sensitive information from leaking via logs.

Motivation

The 2019 Trail of Bits security audit contained several issues centered on the exposure of secrets to logs or execution environment.

  • Trail of Bits issue 6, Issue #81114: Bearer tokens are revealed in logs.
  • Trail of Bits issue 9, Issue #81117 :Environment variables expose sensitive data.
  • Trail of Bits issue 22. Issue #81130: iSCSI volume storage cleartext secrets in logs.

In light of these issues, the audit authors' long-term suggestions regarding secret management includes:

Ensure that sensitive data cannot be trivially stored in logs. Prevent dangerous logging actions with improved code review policies. Redact sensitive information with logging filters. Together, these actions can help to prevent sensitive data from being exposed in the logs.

This KEP represents part of "improved code review policies."

Goals

  • Detect logging of secrets during testing.
  • Block pull requests that would introduce such violations.

Non-Goals

  • Detect all instances of potential secret logging.
  • Replace meaningful review of how secrets are handled.

Proposal

go-flow-levee is a highly configurable taint propagation analysis tool for Go code. Independent use successfully identified a potential risk of secret logging, remedied in PR #90413. This KEP proposes that go-flow-levee be used as part of Prow's testing of pull requests to limit the introduction of new secret logging.

As a taint propagation analysis tool, go-flow-levee examines potential code paths of data sources and emits a report if any reaches a sink. In use within Kubernetes, a source would consist of any field that contains a secret, e.g., iscsi.iscsiDisk.secret, or particular method calls that would return a secret, e.g., os.Getenv("CFSSL_CA_PK_PASSWORD"). A sink consists of any klog logging method.

Taint propagation analysis gives additional consideration is given to how data may be transformed or extracted to other values via propagators, and how source data may be processed for safe consumption via sanitizers. See the go-flow-levee documentation for details.

While configuration of source identification can be done via manually configured regexp, this KEP benefits from a set of standard Kubernetes go lang struct tags indicating which fields are expected to contain secrets. See KEP-1753 for more information on data policy tags.

Notes/Constraints/Caveats

go-flow-levee is in active development and has not yet reached a v1.0 release. It is, however, being developed with specific interest in application to Kubernetes.

Risks and Mitigations

While the test will not initially be blocking, if it produces many false positives, it may train developers to ignore issues. Additionally, because it is not initially blocking, it risks being overlooked as unimportant even when findings are relevant.

Should issues reach a branch, either by the happenstance of a merge, overridden warnings, or analysis flakiness, reported findings may include those out of scope for the change set in a given PR. Such incidents would provide confusion and toil for developers, but could be quickly corrected, suppressed via configuration, or the offending commit reverted.

Changes to test-infra carry with them the potential for inconvenience, should they introduce any instability to wider testing. While diligent review mitigates this, it does not remove the concern completely.

Design Details

In-depth analysis interacts with build artifacts, and as a result is quickly stymied by Bazel's sandboxing of build and test environments. It is similar in this regard to linters, and indeed a standard execution path for analysis in Go is via go vet -vettool=....

This KEP would introduce a execution script to hack/ which handles passing the analyzer and config to our vet process. Within Kubernetes, go vet is executed via make vet. The analyzer can be passed to this in the introduced hack/ script via make vet WHAT="...". Dependency on go-flow-levee will be introduced to hack/tools to avoid polluting the main Kubernetes dependencies.

This KEP will also introduce execution of this script as a Prow presubmit job. Any findings by the analysis will constitute a test failure. Failure will be initially non-blocking, per the graduation criteria below. Analysis findings reporting the position of both source and sink, indicating to a developer which log call must be corrected and/or which argument must be sanitized.

Test Plan

As a testing target, testing will consist of examples of Kubernetes-specific cases that we expect static analysis to detect. No full-scale Kubernetes integration/e2e tests will be necessary.

For analysis based on go-flow-levee, this will consist of sample Kubernetes code based on the same configuration used in presubmit scanning. This test will be examined via the analysistest package to ensure analysis produces expected diagnostics. As part of testing of our testing process, these tests should belong to kubernetes/test-infra.

Graduation Criteria

Alpha (1.20)

  • Analysis is manually triggered to run in Prow against a sampling of PRs.

Alpha -> Beta Graduation

  • Test is validated as running soundly.
  • Analysis configuration consumes fields tags introduced by KEP-1753 for identification of material that should not be logged. Configuration should also identify other sensitive data not bound to fields, if any are identified during KEP-1753's investigation.

Beta

  • Analysis runs as a non-blocking presubmit check, warning developers of any findings in their changes.

Beta -> Stable Graduation

  • Test is validated as running soundly at scale.
  • No false positives, test failures, or other concerning issues are raised for 1-2 weeks.

Stable

  • Analysis runs as a blocking presubmit test.

Upgrade / Downgrade Strategy

Version Skew Strategy

Production Readiness Review Questionnaire

Feature Enablement and Rollback

As part of Prow, enablement is managed by configuration in kubernetes/test-infra. As the test target and tool version are fixed in kubernetes/kubernetes/hack/tools, rollback can be handled by reverting any offending commit to hack/tools.

Rollout, Upgrade and Rollback Planning

As a third-party dependency, analyzer upgrading is handled by upgrading the version targeted by kubernetes/kubernetes/hack/tools/. Tool configuration at kubernetes/kubernetes/hack/testdata/levee can be updated independently, though may be required during tool upgrading.

Monitoring Requirements

As a Prow job, visibility is provided by existing Prow monitoring via dashboards and alerts.

Dependencies

This KEP introduces a third-party dependency on github.com/google/go-flow-levee.

Scalability

For alpha, this section is encouraged: reviewers should consider these questions and attempt to answer them.

For beta, this section is required: reviewers must answer these questions.

For GA, this section is required: approvers should be able to confirm the previous answers based on experience in the field.

  • Will enabling / using this feature result in any new API calls?

The Prow dashboard will serve an additional test dashboard.

  • Will enabling / using this feature result in introducing new API types?

No.

  • Will enabling / using this feature result in any new calls to the cloud provider?

No.

  • Will enabling / using this feature result in increasing size or count of the existing API objects?

With the additional dashboard page, there may be a marginal increase in traffic served.

  • Will enabling / using this feature result in increasing time taken by any operations covered by [existing SLIs/SLOs]?

While adding a new test will increase cycle-time taken to perform taken, parallelization of test tasks will prevent increase in wall-time. At time of writing, analysis of Kubernetes takes ~5 minutes.

  • Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components? Things to keep in mind include: additional in-memory state, additional non-trivial computations, excessive access to disks (including increased log volume), significant amount of data sent and/or received over network, etc. This through this both in small and large cases, again with respect to the [supported limits].

There may be non-negligible increase in testing resource usage. While analysis is much simpler than tests that require a cluster for testing, implementation should benchmark to estimate actual computational costs of analysis.

Troubleshooting

While this KEP does not change cluster behavior, any reported finding should be communicated clearly such that developer correction can proceed as smoothly as possible. During non-blocking release stages, this should include instructions for reporting false-positives if the PR author believes the findings are incorrect. During blocking release stages, this should include instructions for escalating possible false-positives to avoid blocking other PRs and how to contact contributors with /override permissions to approve bypass of analysis.

Assistance in resolving issues identified by the analyzer can be found in the Verification Tests Documentation

Analyzer failures or bugs should be reported to go-flow-levee Issues.

Implementation History

Drawbacks

As a blocking test, there is a risk for developer toil in the event of any false-positive or test flakiness. This can be mitigated by any contributor with /override permissions.

In the unexpected event that Prow-bot merges two PR without first rebasing one to the HEAD of the target branch, it could be possible for an analysis violation to reach a given branch. Like any other failing test that could reach master, all subsequent PRs would be blocked by spurious failure. This could be mitigated if analysis first executes a baseline against the target branch without the changes introduced by a PR. However, such additional testing has not proven necessary given the rarity of both such Prow-bot misbehavior and the sort of PR diffs necessary to introduce a new violation.

As this analysis depends on project-specific considerations of what constitutes a secret or a sink, periodic review is required to ensure configuration is kept up-to-date. This is mitigated somewhat with a consistent use of field tags, as introduced in KEP-1753.

Alternatives

GitHub's CodeQL includes taint analysis and permits general SSA graph queries. While CodeQL may provide similar testing, its own documentation indicates that any findings would not be blocking. Given the intended scope of this KEP as a means to block potential security concerns, blocking on detection is of heightened interest. CodeQL could be used to augment coverage in the future, however.

While other static analysis tools exist for Go, these tend towards more general linters. gosec, for instance, can be used to detect hard-coded tokens or use of cryptographically broken packages, e.g., crypto/md5. However, such linters are insufficient for our use, as they do not allow for project-specific configuration to identify sources, sinks, etc. Additionally, linters like gosec or even go vet can change their specification with new versions, resulting in new findings breaking CI.

There are few other developer analyzers that provide depth greater than a linter. gotcha, the go taint checker analyzer, provides a basic taint propagation analysis. However, it does not provide the ability to specify sanitizers or other fine-tuning.

Infrastructure Needed (Optional)

This KEP will introduce a new Prow test. No additional infrastructure beyond that which already exists should be necessary.