- Release Signoff Checklist
- Summary
- Motivation
- Proposal
- Design Details
- Production Readiness Review Questionnaire
- Implementation History
- Drawbacks
- Alternatives
Items marked with (R) are required prior to targeting to a milestone / release.
- (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
- (R) KEP approvers have approved the KEP status as `implementable`
- (R) Design details are appropriately documented
- (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- e2e Tests for all Beta API Operations (endpoints)
- (R) Ensure GA e2e tests meet requirements for Conformance Tests
- (R) Minimum Two Week Window for GA e2e tests to prove flake free
- (R) Graduation criteria is in place
- (R) all GA Endpoints must be hit by Conformance Tests
- (R) Production readiness review completed
- (R) Production readiness review approved
- "Implementation History" section is up-to-date for milestone
- User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
- Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
This Kubernetes Enhancement Proposal (KEP) proposes enhancing the kubelet to allow tracing of gRPC and HTTP API requests. The kubelet is the integration point between a node's operating system and Kubernetes. As described in the control plane-node communication documentation, a primary communication path from the control plane to the nodes runs from the apiserver to the kubelet process on each node. The kubelet communicates with the container runtime over gRPC, where the kubelet is the gRPC client and the Container Runtime Interface (CRI) implementation is the gRPC server; the CRI server then forwards requests, such as pod creation, to the container runtime installed on the node. Traces gathered from the kubelet will provide critical information for monitoring and troubleshooting interactions at the node level. This KEP proposes using OpenTelemetry libraries and exporting traces in the OpenTelemetry (OTLP) format, in line with the API Server tracing enhancement.
Along with metrics and logs, traces are a useful form of telemetry to aid with debugging incoming requests. The kubelet can make use of distributed tracing to improve the ease of use and enable easier analysis of trace data. Trace data is structured, providing the detail necessary to debug requests across service boundaries. As more core components are instrumented, Kubernetes becomes easier to monitor, manage, and troubleshoot.
Span: The smallest unit of a trace. It has a start and end time. Spans are the building blocks of a trace.
Trace: A collection of Spans which represents work being done by a system. A record of the path of requests through a system.
Trace Context: A reference to a Trace that is designed to be propagated across component boundaries.
- The kubelet generates and exports spans for reconcile loops it initiates and for incoming and outgoing requests to the kubelet's authenticated HTTP servers, as well as for the CRI, CNI, and CSI interfaces. An example of a reconcile loop is the creation of a pod: pod creation involves pulling the image, creating the pod sandbox, and creating the container. With stateful workloads, it may also include attaching and mounting the volumes referenced by the pod. Trace data can be used to diagnose latency within such flows.
- Tracing in other kubernetes controllers besides the kubelet
- Replace existing logging, metrics
- Change metrics or logging (e.g. to support trace-metric correlation)
- Access control to tracing backends
- Add tracing to components outside kubernetes (e.g. etcd client library).
Since this feature is for diagnosing problems with the kubelet, it is targeted at cluster operators, administrators, and cloud vendors that manage kubernetes nodes and core services.
For the following use cases, I can deploy an OpenTelemetry collector agent as a DaemonSet to collect kubelet trace data from each node's host network. From there, OpenTelemetry trace data can be exported to a tracing backend of choice. I can configure the kubelet's optional TracingConfiguration (specified within kubernetes/component-base) with the default URL to make the kubelet send its spans to the collector. Alternatively, I can point the kubelet at an OpenTelemetry collector listening on a different port or URL if I need to.
As a cluster administrator or cloud provider, I would like to collect gRPC and HTTP trace data from the transactions between the API server and the kubelet, and from interactions with a node's container runtime (via the Container Runtime Interface), to debug cluster problems. I can set the SamplingRatePerMillion in the configuration file to a non-zero number to have spans collected for a small fraction of requests. Depending on the symptoms I need to debug, I can search span metadata or filter by specific nodes to find a trace that displays those symptoms. The sampling rate for trace exports can be configured based on my needs, and I can collect each node's kubelet trace data as a distinct tracing service to diagnose node issues (a configuration sketch follows the use-case list below).
- Latency or timeout experienced when:
  - Attaching or exec-ing into running containers
    - APIServer-to-Kubelet and Kubelet-to-CRI trace data can determine which component/operation is the culprit
  - Using the kubelet's port-forwarding functionality
    - APIServer, Kubelet, and CRI trace data can determine which component/operation is experiencing issues
  - Container Start/Stop/Create/Remove
    - Kubelet and CRI trace data determine which component/operation is experiencing issues
  - Attaching and mounting volumes referenced by a pod
    - Trace data from the kubelet's interaction with volume plugins will help to quickly diagnose latency issues
- Attach or exec to running containers
  - Kubelet spans can detect node-level latency; each node produces trace data filterable by node hostname
  - Are there issues with particular regions? Cloud providers? Machine types?
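To make the configuration in these stories concrete, here is a minimal sketch of the values an administrator might set, expressed against the TracingConfiguration struct proposed later in this KEP. The struct is redeclared locally for illustration; field names follow this KEP and may differ from the final component-base API, and the endpoint and sampling values are examples only.

```go
package tracingexample

// TracingConfiguration mirrors the struct proposed in the Design Details
// section below; the real type lives in kubernetes/component-base and may
// evolve separately.
type TracingConfiguration struct {
	URL                    *string
	SamplingRatePerMillion int32
}

// exampleTracingConfig points the kubelet at a node-local collector on the
// default OTLP gRPC port and samples roughly 1% of requests (10,000 spans
// per million). Both values are illustrative.
func exampleTracingConfig() TracingConfiguration {
	url := "0.0.0.0:4317"
	return TracingConfiguration{
		URL:                    &url,
		SamplingRatePerMillion: 10000,
	}
}
```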
The gRPC client in the kubelet's Container Runtime Interface (CRI) Remote Runtime Service and the CRI streaming package will be instrumented to export trace data and propagate trace context. The Go implementation of OpenTelemetry will be used. An OTLP exporter and an OTLP trace provider, along with a context propagator, will be configured.
Also, designated HTTP servers and clients will be wrapped with otelhttp to generate spans for sampled incoming requests and to propagate context with outgoing client requests.
OpenTelemetry-Go provides the propagation package, which can carry custom key-value pairs known as baggage. Baggage data will be propagated across services within contexts.
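As a rough illustration of this wiring (not the kubelet's actual code), the sketch below configures an OTLP gRPC exporter, a tracer provider with a parent-based ratio sampler, W3C TraceContext plus Baggage propagation, and an otelhttp-wrapped handler. Function and variable names such as `initTracing` and `wrapHandler` are hypothetical.

```go
package tracingexample

import (
	"context"
	"net/http"

	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	"go.opentelemetry.io/otel/propagation"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// initTracing sets up an OTLP exporter, tracer provider, and propagator.
// endpoint is the collector address (e.g. 0.0.0.0:4317).
func initTracing(ctx context.Context, endpoint string, samplesPerMillion int32) (*sdktrace.TracerProvider, error) {
	exporter, err := otlptracegrpc.New(ctx,
		otlptracegrpc.WithEndpoint(endpoint),
		otlptracegrpc.WithInsecure(),
	)
	if err != nil {
		return nil, err
	}
	tp := sdktrace.NewTracerProvider(
		// Batch spans in memory and export asynchronously, off the request path.
		sdktrace.WithBatcher(exporter),
		// Sample the configured fraction of new traces; also honor a sampled
		// flag on the incoming (parent) trace context.
		sdktrace.WithSampler(sdktrace.ParentBased(
			sdktrace.TraceIDRatioBased(float64(samplesPerMillion)/1_000_000),
		)),
	)
	otel.SetTracerProvider(tp)
	// Propagate W3C trace context and baggage across process boundaries.
	otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(
		propagation.TraceContext{}, propagation.Baggage{},
	))
	return tp, nil
}

// wrapHandler wraps an HTTP handler with otelhttp so sampled incoming
// requests produce spans and incoming trace context is extracted.
func wrapHandler(h http.Handler) http.Handler {
	return otelhttp.NewHandler(h, "kubelet")
}
```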
With the initial implementation of this proposal, kubelet tracing produced disconnected spans, because context was not wired through kubelet CRI calls. With this PR, context is now plumbed between the kubelet and CRI calls, so spans for CRI calls can be connected. Top-level traces in the kubelet will tie the CRI calls together as nested spans (see the sketch after this list). Nested spans will be created for the following:
- Sync Loops (e.g. syncPod, eviction manager, various gc routines) where the kubelet initiates new work.
- Incoming requests (exec, attach, port-forward, metrics endpoints, podresources)
- Outgoing requests (CNI, CSI, device plugin, k8s API calls)
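The sketch below illustrates the pattern, not the kubelet's actual types: a top-level span started for a sync loop parents the spans of outgoing CRI calls because the same context is passed down. The `criClient` interface, `exampleKubelet` struct, and simplified `syncPod` signature are placeholders.

```go
package tracingexample

import (
	"context"

	"go.opentelemetry.io/otel/trace"
)

// criClient stands in for the kubelet's CRI remote runtime client; the
// single-argument RunPodSandbox signature is simplified for illustration.
type criClient interface {
	RunPodSandbox(ctx context.Context, podName string) (string, error)
}

type exampleKubelet struct {
	runtime criClient
	tracer  trace.Tracer
}

// syncPod shows how a top-level span for a reconcile loop parents the spans
// emitted by outgoing CRI calls: the same ctx is passed down, so the
// instrumented gRPC client injects this trace context into the request and
// its span nests under "syncPod".
func (kl *exampleKubelet) syncPod(ctx context.Context, podName string) error {
	ctx, span := kl.tracer.Start(ctx, "syncPod")
	defer span.End()

	if _, err := kl.runtime.RunPodSandbox(ctx, podName); err != nil {
		span.RecordError(err)
		return err
	}
	return nil
}
```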
Although this proposal focuses on running the OpenTelemetry Collector, note that any OTLP-compatible trace collection pipeline can be used to collect OTLP data from the kubelet. As an example, the OpenTelemetry Collector can be run as a sidecar, a daemonset, a deployment, or a combination in which the daemonset buffers telemetry and forwards it to the deployment for aggregation (e.g. tail-based sampling) and routing to a telemetry backend. To support these various setups, the kubelet should be able to send traffic either to a local (on the control plane network) collector, or to a cluster service (on the cluster network).
Refer to the OpenTelemetry Specification.
```go
// KubeletConfiguration contains the configuration for the Kubelet
type KubeletConfiguration struct {
	metav1.TypeMeta `json:",inline"`

	// ...

	// +optional
	// TracingConfiguration
	TracingConfiguration componentbaseconfig.TracingConfiguration
}
```
NOTE: The TracingConfiguration will be defined in the kubernetes/component-base repository.
```go
// TracingConfiguration specifies configuration for opentelemetry tracing
type TracingConfiguration struct {
	// +optional
	// URL of the collector that's running on the node.
	// Defaults to 0.0.0.0:4317
	URL *string `json:"url,omitempty" protobuf:"bytes,1,opt,name=url"`
	// +optional
	// SamplingRatePerMillion is the number of samples to collect per million spans.
	// Defaults to 0.
	SamplingRatePerMillion int32 `json:"samplingRatePerMillion,omitempty" protobuf:"varint,2,opt,name=samplingRatePerMillion"`
}
```
There is a risk of memory usage incurred by storing spans prior to export. This is mitigated by limiting the number of spans that can be queued for export, and dropping spans if necessary to stay under that limit.
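A minimal sketch of that mitigation, assuming the OpenTelemetry-Go batch span processor is used; the exporter argument and queue limit are illustrative, not the kubelet's actual defaults.

```go
package tracingexample

import (
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// newBoundedProvider bounds in-memory span buffering: the batch processor
// queues at most maxQueued spans and drops new spans once the queue is full,
// so a slow or unreachable collector cannot grow kubelet memory unbounded.
func newBoundedProvider(exporter sdktrace.SpanExporter, maxQueued int) *sdktrace.TracerProvider {
	bsp := sdktrace.NewBatchSpanProcessor(exporter,
		sdktrace.WithMaxQueueSize(maxQueued),
	)
	return sdktrace.NewTracerProvider(sdktrace.WithSpanProcessor(bsp))
}
```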
[x] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.
An integration test will verify that spans exported by the kubelet match what is expected from the request. We will also add an integration test that verifies spans propagated from kubelet to API server match what is expected from the request.
- https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/apis/config/validation/validation_test.go#L503-#L532
- https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/cri/remote/remote_runtime_test.go#L65-#L97
- https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/server/options/tracing_test.go
- https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/component-base/tracing/api/v1/config_test.go
Integration tests verify that spans exported by the kubelet match what is expected from the request. An additional integration test verifies that spans propagated from the kubelet to the API server match what is expected from the request (a simplified sketch of this style of assertion follows the list below).
- component-base tracing/api/v1 integration test https://github.com/kubernetes/kubernetes/blob/master/test/integration/apiserver/tracing/tracing_test.go
- A test with kubelet-tracing & apiserver-tracing enabled to ensure no issues are introduced, regardless of whether a tracing backend is configured.
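For illustration, here is a simplified sketch of the kind of assertion these tests make, using the OpenTelemetry-Go SDK's in-memory span recorder; the span name "syncPod" and the traced code path are placeholders rather than the actual kubelet integration test.

```go
package tracingexample

import (
	"context"
	"testing"

	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	"go.opentelemetry.io/otel/sdk/trace/tracetest"
)

// TestSpansMatchRequest drives a traced code path against an in-memory span
// recorder and asserts on the exported span, standing in for the real
// integration tests.
func TestSpansMatchRequest(t *testing.T) {
	recorder := tracetest.NewSpanRecorder()
	tp := sdktrace.NewTracerProvider(sdktrace.WithSpanProcessor(recorder))

	// The real test would exercise kubelet code; here we just start and end
	// a span directly.
	_, span := tp.Tracer("kubelet").Start(context.Background(), "syncPod")
	span.End()

	spans := recorder.Ended()
	if len(spans) != 1 || spans[0].Name() != "syncPod" {
		t.Fatalf("expected one ended span named syncPod, got %d spans", len(spans))
	}
}
```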
Alpha
- Implement tracing of incoming and outgoing gRPC, HTTP requests in the kubelet
- Integration testing of tracing
- Unit testing of kubelet tracing and tracing configuration
Beta
- OpenTelemetry reaches GA
- Publish examples of how to use the OT Collector with kubernetes
- Allow time for feedback
- Test and document results of upgrade and rollback while feature-gate is enabled.
- Add top level traces to connect spans in sync loops, incoming requests, and outgoing requests.
- Unit/integration test to verify connected traces in kubelet.
- Revisit the format used to export spans.
- Parity with the old text-based Traces
- Connecting traces from container runtimes via the Container Runtime Interface
GA
- Feedback from users collected and incorporated over multiple releases
Tracing will work if the kubelet version supports the feature, and will not export spans if it doesn't. It does not impact the ability to upgrade or roll back kubelet versions.
Version skew isn't applicable because this feature only involves the kubelet.
- Feature gate (also fill in values in `kep.yaml`)
  - Feature gate name: KubeletTracing
  - Components depending on the feature gate: kubelet
- Other
  - Describe the mechanism: KubeletConfiguration TracingConfiguration
- When the `KubeletTracing` feature gate is disabled, the kubelet will:
  - Not generate spans
  - Not initiate an OTLP connection
  - Not propagate context
- When the feature gate is enabled, but no TracingConfiguration is provided, the kubelet will:
  - Not generate spans
  - Not initiate an OTLP connection
  - Propagate context in [passthrough](https://github.com/open-telemetry/opentelemetry-go/tree/main/example/passthrough#passthrough-setup-for-opentelemetry) mode
- When the feature gate is enabled, and a TracingConfiguration with sampling rate 0 (the default) is provided, the kubelet will:
  - Initiate an OTLP connection
  - Generate spans only if the incoming (if applicable) trace context has the sampled flag set
  - Propagate context normally
- When the feature gate is enabled, and a TracingConfiguration with sampling rate > 0 is provided, the kubelet will:
  - Initiate an OTLP connection
  - Generate spans at the specified sampling rate, or if the incoming context has the sampled flag set
  - Propagate context normally
- Will enabling / disabling the feature require downtime of the control plane? No. It will require restarting the kubelet service per node.
- Will enabling / disabling the feature require downtime or reprovisioning of a node? No; restarting the kubelet with the feature gate disabled will disable tracing.
- Describe the mechanism: KubeletConfiguration TracingConfiguration
No. The feature is disabled unless the feature gate is enabled and the TracingConfiguration is populated in Kubelet Configuration. When the feature is enabled, it doesn't change behavior from the users' perspective; it only adds tracing telemetry.
Yes.
It will start generating and exporting traces again.
Because enabling and disabling kubelet tracing is an in-memory switch, explicit enablement/disablement tests would not provide value and will not be added. Manual testing of disabling and re-enabling the feature on nodes, and ensuring the kubelet comes up without error, will be performed and documented.
With an improper TracingConfiguration, spans will not be exported as expected. There is no impact to running workloads, and logs will indicate the problem.
- This KEP is following opentelemetry-go issue #2547:
  > ...using the OTLP trace exporter, it isn't currently possible to monitor (with metrics) whether or not spans are being successfully collected and exported. For example, if my SDK cannot connect to an opentelemetry collector, and isn't able to send traces, I would like to be able to measure how many traces are collected vs how many are not sent. I would like to be able to set up SLOs to measure successful trace delivery from my applications.
- Pod Lifecycle and Kubelet SLOs are the signals that should guide a rollback. In particular, `kubelet_pod_start_duration_seconds_count`, `kubelet_runtime_operations_errors_total`, and `kubelet_pleg_relist_interval_seconds_bucket` would surface issues affecting kubelet performance.
Yes. These were tested on a 1.27 kind cluster by enabling, disabling, and re-enabling the feature-gate on the kubelet.
Use a basic config.yaml:

```yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
```

1. Create the kind cluster: `kind create cluster --name kubelet-tracing --image kindest/node:v1.27.0 --config config.yaml`
2. Exec into the kind "node" container: `docker exec -it <container id for node> sh`
3. Use apt-get to install vim.
4. Edit the kubelet configuration to enable the `KubeletTracing` feature gate: `vim /var/lib/kubelet/config.yaml`
5. Restart the kubelet: `systemctl restart kubelet`
6. Verify that the node status is still being updated (not from within the docker container): `kubectl describe no`
7. Repeat steps 2-6, but disable `KubeletTracing`, and verify that the kubelet works after the restart.
8. Repeat steps 2-6, but re-enable `KubeletTracing`, and verify that the kubelet works after the restart.
Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
No
Operators are expected to have access to and/or control of the OpenTelemetry agent deployment and trace storage backend. KubeletConfiguration will show the FeatureGate and TracingConfiguration.
Workloads do not directly use this feature.
The tracing backend will display the traces with a service "kubelet".
- 99% of expected spans are collected from the kubelet
- No indications in logs of failing to export spans
What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
None. Operators can use the absence of traces, which is an observability signal in its own right.
It would be helpful to have metrics about span generation and export: opentelemetry-go issue #2547
There is progress on defining and implementing OpenTelemetry trace SDK self-observability metrics:
- Proposal for names: open-telemetry/semantic-conventions#1631
- Prototype for OpenTelemetry-Go: open-telemetry/opentelemetry-go#6153
Yes. In the current version of the proposal, users must run the OpenTelemetry Collector as a daemonset and configure a backend trace visualization tool (Jaeger, Zipkin, etc.). There are also a wide variety of vendors and cloud providers which support OTLP.
This will not add any additional API calls.
This will introduce an API type for the configuration. This is only for loading configuration, users cannot create these objects.
Not directly. Cloud providers could choose to send traces to their managed trace backends, but this requires them to set up a telemetry pipeline as described above.
No.
Will enabling / using this feature result in increasing time taken by any operations covered by [existing SLIs/SLOs]?
It will increase kubelet request latency by a negligible amount (<1 microsecond) for encoding and decoding the trace context from headers, and for recording spans in memory. Exporting spans is not in the critical path.
Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
The tracing client library has a small, in-memory cache for outgoing spans.
Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
No.
The Troubleshooting section currently serves the Playbook
role. We may consider
splitting it into a dedicated Playbook
document (potentially with some monitoring
details). For now, we leave it here.
No reaction specific to this feature if API server and/or etcd is unavailable.
- [The kubelet is misconfigured and cannot talk to the collector or the kubelet cannot send traces to the backend]
- Detection: How can it be detected via metrics? Stated another way: how can an operator troubleshoot without logging into a master or worker node? Via kubelet logs, component logs, and collector logs.
- Mitigations: Fix the kubelet configuration, update collector, backend configuration
- Diagnostics: What are the useful log messages and their required logging levels that could help debug the issue? The go-opentelemetry SDK provides logs indicating failure.
- Testing: It isn't particularly useful to test misconfigurations.
If you believe tracing is causing issues (e.g. increasing memory consumption), try disabling it by not providing a configuration file. It can also be helpful to collect CPU or memory profiles if you think tracing might be to blame for higher resource usage.
- 2021-07-20: KEP opened
- 2022-07-22: KEP merged, targeted at Alpha in 1.24
- 2022-03-29: KEP deemed not ready for Alpha in 1.24
- 2022-06-09: KEP targeted at Alpha in 1.25
- 2023-01-09: KEP targeted at Beta in 1.27
- 2025-02-07: KEP targeted at Stable in 1.33
Small overhead of increased kubelet request latency; this will be monitored during the experimental phase.
This KEP suggests that we utilize the OpenTelemetry exporter format in all components. Alternative options include:
- Add configuration for many exporters in-tree by vendoring multiple "supported" exporters. These exporters would be the only compatible backends for tracing in Kubernetes.
  a. This places the Kubernetes community in the position of curating supported tracing backends
- Support both a curated set of in-tree exporters, and the collector exporter