Commit 3deaece

Merge pull request #3979 from aravindhp/2258-update
2258: Update node log query
2 parents 46c0656 + a47bacb commit 3deaece

2 files changed (+66 -135 lines changed)

keps/sig-windows/2258-node-service-log-viewer/README.md renamed to keps/sig-windows/2258-node-log-query/README.md

+63 -132
@@ -1,4 +1,4 @@
-# KEP-2258: Node service log viewer
+# KEP-2258: Node log query

 <!-- toc -->
 - [Release Signoff Checklist](#release-signoff-checklist)
@@ -7,8 +7,8 @@
 - [Goals](#goals)
 - [Non-Goals](#non-goals)
 - [Proposal](#proposal)
-- [Implement client for logs endpoint viewer (OS agnostic)](#implement-client-for-logs-endpoint-viewer-os-agnostic)
-- [Linux distros with systemd / journald](#linux-distros-with-systemd--journald)
+- [Implement client for logs endpoint (OS agnostic)](#implement-client-for-logs-endpoint-os-agnostic)
+- [Linux distributions with systemd / journald](#linux-distributions-with-systemd--journald)
 - [Linux distributions without systemd / journald](#linux-distributions-without-systemd--journald)
 - [Windows](#windows)
 - [User Stories](#user-stories)
@@ -17,7 +17,6 @@
 - [Wider access to all node level service logs](#wider-access-to-all-node-level-service-logs)
 - [Design Details](#design-details)
 - [kubelet](#kubelet)
-- [kubectl](#kubectl)
 - [Test Plan](#test-plan)
 - [Prerequisite testing updates](#prerequisite-testing-updates)
 - [Unit tests](#unit-tests)
@@ -45,15 +44,15 @@
 Items marked with (R) are required *prior to targeting to a milestone / release*.

 - [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
-- [ ] (R) KEP approvers have approved the KEP status as `implementable`
+- [x] (R) KEP approvers have approved the KEP status as `implementable`
 - [x] (R) Design details are appropriately documented
 - [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
 - [x] (R) Graduation criteria is in place
 - [x] (R) Production readiness review completed
-- [ ] (R) Production readiness review approved
+- [x] (R) Production readiness review approved
 - [x] "Implementation History" section is up-to-date for milestone
-- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
-- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
+- [x] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
+- [x] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

 [kubernetes.io]: https://kubernetes.io/
 [kubernetes/enhancements]: https://git.k8s.io/enhancements
@@ -65,10 +64,10 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
 A Kubernetes cluster administrator has to log in to the relavant control-plane
 or worker nodes to view the logs of the API server, kubelet etc. Or they would
 have to implement a client side reader. A simpler and more elegant method would
-be to allow them to use the kubectl CLI to also view these logs similar to
-using it for other interactions with the cluster. Given the sensitive nature of
-the information in node logs, this feature will only be available to cluster
-administrators.
+be to allow them to use a kubelet API or kubectl plugin to also view these logs
+similar to using it for other interactions with the cluster. Given the sensitive
+nature of the information in node logs, this feature will only be available to
+cluster administrators.

 ## Motivation

@@ -77,19 +76,19 @@ a cluster administrator to SSH into the nodes for debugging. While certain
 issues will require being on the node, issues with the kube-proxy or kubelet,
 to name a couple, could be solved by perusing their logs. However this
 too requires the administrator to SSH access into the nodes. Having a way for
-them to view the logs using kubectl will significantly simplify their
-troubleshooting.
+them to view the logs using a kubelet API or kubectl plugin will significantly
+simplify their troubleshooting.


 ### Goals
-Provide a cluster administrator with a streaming view of logs using kubectl
-without them having to implement a client side reader or logging into the node.
-This would work for:
+Provide a cluster administrator with a streaming view of logs using a kubelet
+API without them having to implement a client side reader or logging into the
+node. This would work for:
 - Services on Linux worker and control plane nodes:
 - That have systemd / journald support.
 - That have services that log to `/var/log/`
 - Windows worker nodes (all supported variants) that log to `C:\var\log`,
-System and Application logs, Windows Event Logs and Event Tracing (ETW).
+and Application logs.

 ### Non-Goals
 - Providing support for non-systemd Linux distributions.
@@ -99,14 +98,12 @@ This would work for:

 ## Proposal

-### Implement client for logs endpoint viewer (OS agnostic)
-- Implement a new `kubectl node-logs` to work with node objects.
-- Implement a client for the `/var/log/` kubelet endpoint viewer.
+### Implement client for logs endpoint (OS agnostic)
+- Implement a client for the `/proxy/logs/` kubelet endpoint viewer.

-### Linux distros with systemd / journald
-Supplement the the `/var/log/` endpoint viewer on the kubelet with a thin shim
-over the `journal` directory that shells out to journalctl. Then implement
-`kubectl node-logs` to also work with node objects.
+### Linux distributions with systemd / journald
+Supplement the the `/proxy/logs/` endpoint viewer on the kubelet with a thin shim
+over the `journal` directory that shells out to journalctl.

 ### Linux distributions without systemd / journald
 Running the new "kubectl node-logs" command against services on nodes that do
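The hunk above proposes a client for the kubelet `/proxy/logs/` endpoint, and the user-story examples in the next hunk call it through `kubectl get --raw`. As a rough sketch of what such a client has to do, the Go function below builds that API server proxy path from the parameters in the Design Details options table. The `NodeLogQuery` struct and `nodeLogsPath` function are hypothetical names for illustration. Treating `query` as repeatable for several services is an assumption, not something this diff states.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strconv"
	"time"
)

// NodeLogQuery is a hypothetical client-side view of the options listed in
// the KEP (boot, pattern, query, sinceTime, untilTime, tailLines).
type NodeLogQuery struct {
	Query     []string   // services or files; at least one is required
	Pattern   string     // regular expression filter, optional
	Boot      *int       // boot offset, optional
	SinceTime *time.Time // inclusive lower bound, optional
	UntilTime *time.Time // inclusive upper bound, optional
	TailLines *int       // lines from the end of the log, optional
}

// nodeLogsPath returns the API server path that proxies to the kubelet log
// endpoint on the given node, suitable for `kubectl get --raw`.
func nodeLogsPath(node string, q NodeLogQuery) (string, error) {
	if node == "" {
		return "", errors.New("node name is required")
	}
	if len(q.Query) == 0 {
		return "", errors.New("at least one query value is required")
	}
	v := url.Values{}
	for _, s := range q.Query {
		v.Add("query", s) // assumption: repeated once per service or file
	}
	if q.Pattern != "" {
		v.Set("pattern", q.Pattern)
	}
	if q.Boot != nil {
		v.Set("boot", strconv.Itoa(*q.Boot))
	}
	if q.SinceTime != nil {
		v.Set("sinceTime", q.SinceTime.UTC().Format(time.RFC3339))
	}
	if q.UntilTime != nil {
		v.Set("untilTime", q.UntilTime.UTC().Format(time.RFC3339))
	}
	if q.TailLines != nil {
		v.Set("tailLines", strconv.Itoa(*q.TailLines))
	}
	// Encode sorts the keys and percent-encodes every value.
	return "/api/v1/nodes/" + url.PathEscape(node) + "/proxy/logs/?" + v.Encode(), nil
}

func main() {
	p, err := nodeLogsPath("node-1.example", NodeLogQuery{Query: []string{"kubelet"}, Pattern: "error"})
	if err != nil {
		panic(err)
	}
	fmt.Println(p) // /api/v1/nodes/node-1.example/proxy/logs/?pattern=error&query=kubelet
}
```

`url.Values.Encode` sorts parameters by key, so the sketch emits `?pattern=error&query=kubelet` rather than the `?query=kubelet&pattern=error` order used in the KEP examples. The two are equivalent. A file query such as `/foo.log` comes out as `query=%2Ffoo.log`, which standard query parsing decodes back to `/foo.log`.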
@@ -122,33 +119,30 @@ Reuse the kubelet API for querying the Linux journal for invoking the
 Consider a scenario where pods / containers are refusing to come up on certain
 nodes. As mentioned in the motivation section, troubleshooting this scenario
 involves the cluster administrator to SSH into nodes to scan the logs. Allowing
-them to use `kubectl node-logs` to do the same as they would to debug issues with a
+them to use the kubelet API to do the same as they would to debug issues with a
 pod / container would greatly simply their debug workflow. This also opens up
 opportunities for tooling and simplifying automated log gathering. The feature
 can also be used to debug issues with Kubernetes services especially in Windows
 nodes that run as native Windows services and not as DaemonSets or Deployments.

 Here are some example of how a cluster administrator would use this feature:
 ```
-# Show kubelet and crio journal logs from all masters
-kubectl node-logs --role master -q kubelet -q crio
+# Fetch kubelet logs from a node named node-1.example
+kubectl get --raw "/api/v1/nodes/node-1.example/proxy/logs/?query=kubelet"

-# Show kubelet log file (/var/log/kubelet/kubelet.log) from all Windows worker nodes
-kubectl node-logs --label kubernetes.io/os=windows -q kubelet
+# Fetch kubelet logs from a node named node-1.example that have the word "error"
+kubectl get --raw "/api/v1/nodes/node-1.example/proxy/logs/?query=kubelet&pattern=error"

-# Display docker runtime WinEvent log entries from a specific Windows worker node
-kubectl node-logs <node-name> --query docker
+# Display foo.log from a node name node-1.example
+kubectl get --raw "/api/v1/nodes/node-1.example/proxy/logs/?query=/foo.log"
 ```

 ### Risks and Mitigations

 #### Large log files and events
 If the log that is attempted to be viewed is very large (GBs) there is
-potential for the node performance to be degraded. To mitigate this we can
-document that node logs should always be rotated in clusters that enable this
-feature. We should also take into account nodes that don't take advantage of
-journald's rate limiting options. We can then take real world feedback around
-this for better mitigation when graduating the feature from alpha to beta.
+potential for the node performance to be degraded. To mitigate this we only
+allow returning of messages that can be retrieved within 30 seconds.

 #### Wider access to all node level service logs
 The cluster administrator can now view all logs in /var/log/, systemd/journald
@@ -164,24 +158,33 @@ usage feedback.

 The kubelet already has a `/var/log/` [endpoint viewer](https://github.com/kubernetes/kubernetes/blob/b184272e278571d1e6650605dd4c39be897eaaa2/pkg/kubelet/kubelet.go#L1403)
 that is lacking a client. Given its existence we can supplement that with a
-wafer thin shim over the /journal directory that shells out to journalctl. This
-allows us to extend the endpoint for getting logs from the system journal on
-Linux systems that support systemd. To enable filtering of logs, we can reuse
-the existing filters supported by journalctl. The `kubectl node-logs` will have
-command line options for specifying these filters when interacting with node
-objects.
+wafer thin shim that shells out to journalctl. This allows us to extend the
+endpoint for getting logs from the system journal on Linux systems that support
+systemd. To enable filtering of logs, we can reuse the existing filters
+supported by journalctl.

 On the Windows side viewing of logs from services that use `C:\var\log` will
 be supported by the existing endpoint. For Windows services that log to the
-the System and Application logs, Windows Event Logs and Event Tracing (ETW),
-we can leverage the [Get-WinEvent cmdlet](https://docs.microsoft.com/en-us/powershell/module/microsoft.powershell.diagnostics/get-winevent?view=powershell-7.1)
+the Application logs,we can leverage the
+[Get-WinEvent cmdlet](https://docs.microsoft.com/en-us/powershell/module/microsoft.powershell.diagnostics/get-winevent?view=powershell-7.1)
 that supports getting logs from all these sources. The cmdlet has filtering
 options that can be leveraged to filter the logs in the same manner we do
 with the journal logs.

 Please note that filtering will not be available for logs in `/var/log/` or
 `C:\var\log\`.

+The complete list of options that can be used are:
+
+Option | Description
+------ | -----------
+`boot` | boot show messages from a specific system boot
+`pattern` | pattern filters log entries by the provided PERL-compatible regular expression
+`query` | query specifies services(s) or files from which to return logs (required)
+`sinceTime` | an [RFC3339](https://www.rfc-editor.org/rfc/rfc3339) timestamp from which to show logs (inclusive)
+`untilTime` | an [RFC3339](https://www.rfc-editor.org/rfc/rfc3339) timestamp until which to show logs (inclusive)
+`tailLines` | specify how many lines from the end of the log to retrieve; the default is to fetch the whole log
+
 The feature now enables the cluster administrator to interrogate all services.
 This could be prevented by having a whitelist of allowed services. But this
 comes with severe disadvantages as there could be nodes (especially with
@@ -193,75 +196,12 @@ configured. Here are some examples:


 The `/var/log/` endpoint is enabled using the `enableSystemLogHandler` kubelet
-configuration options. To gain access to this new feature this option needs to
-be enabled. In addition when introducing this feature it will be hidden behind a
-`NodeLogQuery` feature gate in the kubelet that needs to be explicitly enabled. So
-you need to enable both options to get access to this new feature and disabling
-`enableSystemLogHandler` will disable the new feature irrespective of the
-`NodeLogQuery` feature gate.
-
-A reference implementation of this feature is available
-[here](https://github.com/kubernetes/kubernetes/pull/96120).
-
-#### kubectl
-
-`kubectl` has an existing `logs` command that is used to view the logs for a
-container in a pod or a specified resource. The sub-command looks at resource
-types, so can be extended to work with node objects to view the logs of services
-on the nodes. Given that the `logs` command depends on RBAC policies for access
-to appropriate resource type and associated endpoints, it will allow us to
-restrict node logs access to only cluster administrators as long as the cluster
-is setup in that manner. Access to the `node/logs` sub-resource needs to be
-explicitly granted as a user with access to `nodes` will not automatically have
-access to `node/logs`. In the alpha phase the functionality will be behind
-`kubectl alpha node-logs` sub-command. The functionality will be moved to
-`kubectl node-logs` in the beta phase. However the examples will reference the
-final destination i.e. `kubectl node-logs`.
-
-The `logs --query` sub-command for node objects will follow a heuristics approach when
-asked to query for logs from a Windows or Linux service. If asked to get the
-logs from a service `foobar`, it will first assume `foobar` logs to the Linux
-journal / Windows eventing mechanisms (Application, System, and ETW). If unable
-to get logs from these, it will attempt to get logs from `/var/log/foobar.log`,
-`/var/log/foobar/foobar.log`, `/var/log/foobar*INFO` or
-`/var/log/foobar/foobar*INFO` in that order. Alternatively an explicit file
-location can be passed to the `--query` option.
-Here are some examples and explanation of the options that will be added.
-```
-Examples:
-# Show kubelet logs from all masters
-kubectl node-logs --role master -q kubelet
-
-# Show docker logs from Windows nodes
-kubectl node-logs -l kubernetes.io/os=windows -q docker
-
-# Show foo.log from Windows nodes
-kubectl node-logs -l kubernetes.io/os=windows -q /foo/foo.log
-
-Options:
--g, --grep='': Filter log entries by the provided regex pattern. Only applies to node journal logs.
---raw=false: Perform no transformation of the returned data.
---role='': Set a label selector by node role.
--l, --selector='': Selector (label query) to filter on.
---since-time='': Return logs after a specific ISO timestamp.
---tail=-1: Return up to this many lines (not more than 100k) from the end of the log.
---sort=timestamp: Interleave logs by sorting the output. Defaults on when viewing node journal logs.
--q, --query=[]: Return log entries that matches any of the specified service(s).
---until-time='': Return logs before a specific ISO timestamp.
-```
-
-The `--sort=timestamp` feature will introduce log unification across node
-objects by timestamps which can be extended to pod logs. This will allow users
-to see logs across nodes from the same time. Similarly for pods, it will allow
-seeing logs across containers aligned by time.
-
-Given that the feature will be introduced behind a feature gate, by default
-`kubectl node-logs` will return a functionality not available message. When the
-feature is enabled in alpha phase, `kubectl node-logs` will display a
-warning message that the feature is in alpha. When the `--query` option
-is used against Linux nodes that do not support systemd/journald and the service
-does not log to `/var/log`, the same functionality not available message will be
-returned.
+configuration options. To gain access to this new feature, this option and a
+newly introduced `enableSystemLogQuery` needs to be enabled. In addition when
+introducing this feature it will be hidden behind a `NodeLogQuery` feature gate
+in the kubelet that needs to be explicitly enabled. So you need to enable both
+options to get access to this new feature. Disabling `enableSystemLogQuery`
+will disable the new feature irrespective of the `NodeLogQuery` feature gate.

 ### Test Plan

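The revised configuration paragraph in the hunk above makes the feature depend on three switches at once: the existing `enableSystemLogHandler` option, the new `enableSystemLogQuery` option and the `NodeLogQuery` feature gate. The check reduces to a conjunction. It is sketched below with a hypothetical `kubeletLogSettings` struct standing in for the real kubelet configuration type.

```go
package main

import "fmt"

// kubeletLogSettings is a hypothetical stand-in for the two kubelet
// configuration options named in the KEP.
type kubeletLogSettings struct {
	EnableSystemLogHandler bool // existing /var/log/ endpoint switch
	EnableSystemLogQuery   bool // new switch for log queries
}

// nodeLogQueryEnabled reports whether log queries should be served: both
// options and the NodeLogQuery feature gate must be on.
func nodeLogQueryEnabled(cfg kubeletLogSettings, featureGateOn bool) bool {
	return cfg.EnableSystemLogHandler && cfg.EnableSystemLogQuery && featureGateOn
}

func main() {
	// Print the full truth table for the three switches.
	for _, handler := range []bool{false, true} {
		for _, query := range []bool{false, true} {
			for _, gate := range []bool{false, true} {
				cfg := kubeletLogSettings{EnableSystemLogHandler: handler, EnableSystemLogQuery: query}
				fmt.Printf("handler=%t query=%t gate=%t -> %t\n", handler, query, gate, nodeLogQueryEnabled(cfg, gate))
			}
		}
	}
}
```

Turning any one of the three off disables log queries, which matches the paragraph's statement that disabling `enableSystemLogQuery` turns the feature off regardless of the feature gate.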
@@ -274,8 +214,7 @@ to implement this enhancement.
 ##### Unit tests

 Add unit tests to kubelet and kubectl that exercise the new arguments that
-have been added. A reference implementation of the tests can be seen
-[here](https://github.com/kubernetes/kubernetes/pull/96120/commits/253dbad91a3896680da74da32595f02120f56cfa#diff-1d703a87c6d6156adf2d0785ec0174bb365855d4883f5758c05fda1fee8f7f1b)
+have been added.

 Given that a new kubelet package is introduced as part of this feature there is
 no existing test coverage to link to.
@@ -307,29 +246,23 @@ sub-command.

 #### Alpha -> Beta Graduation

-The plan is to graduate the feature to beta in the v1.28 time frame. At that
+The plan is to graduate the feature to beta in the v1.29 time frame. At that
 point we would have collected feedback from cluster administrators and
-developers who have enabled the feature. Based on this feedback and issues
-opened we should consider adding a kubelet side throttle for the viewing the
-logs. In addition we will garner feedback on the heuristic approach and based on
-that we will decide if we need introduce options to explicitly differentiate
-between file vs journal / WinEvent logs.
-
-The kubectl implementation will move from `kubectl alpha node-logs` to
-`kubectl node-logs`.
+developers who have enabled the feature. In addition we will provide a kubectl
+plugin for querying the logs more elegantly instead of using raw API calls.
+
 #### Beta -> GA Graduation

-The plan is to graduate the feature to GA in the v1.29 time frame at which point
+The plan is to graduate the feature to GA in the v1.30 time frame at which point
 any major issues should have been surfaced and addressed during the alpha and
 beta phases.

 ### Upgrade / Downgrade Strategy

 ### Version Skew Strategy

-If a kubectl version that has the new `node-logs` option is used against a node
-that is using a kubelet that does not have the extended `/var/log` endpoint
-viewer, the result should be "feature not supported".
+If the API call is made against a kubelet that does not support the new feature,
+a 404 will be returned.

 ## Production Readiness Review Questionnaire

@@ -408,6 +341,7 @@ resource usage (CPU, RAM, disk, IO, ...) in any components?**
 - Created on Jan 14, 2021
 - Updated on May 5th, 2021
 - Updated on Dec 13th, 2022
+- Updated on May 2nd, 2023

 ## Drawbacks

@@ -417,6 +351,3 @@ Alternatively we could use a client side reader on the nodes to redirect the
 logs. The Windows side would require privileged container support. However this
 would not help scenarios where containers are not launching successfully on the
 nodes.
-
-For the kubectl changes an alternative to introducing `kubectl node-logs` would be to
-introduce a plugin.

keps/sig-windows/2258-node-service-log-viewer/kep.yaml renamed to keps/sig-windows/2258-node-log-query/kep.yaml

+3 -3
@@ -1,4 +1,4 @@
-title: Node service log viewer
+title: Node log query
 kep-number: 2258
 authors:
 - "@aravindhp"
@@ -11,12 +11,12 @@ participating-sigs:
 status: implementable
 reviewers:
 - "@marosset"
-- "@immuzz"
+- "@liggit"
 - "@thockin"
 approvers:
 - "@marosset"
 creation-date: 2021-01-14
-last-updated: 2022-06-06
+last-updated: 2023-05-02
 # The target maturity stage in the current dev cycle for this KEP.
 stage: alpha
