"startPattern" is fragile and wrong on newer kernels #48
Comments
I'm not sure about this. At least, I usually see kern.log containing multiple reboots. :) In a Kubernetes cluster, we use logrotate to rotate all logs under /var/log. Is it smart enough to ignore kern.log? Anyhow, if each reboot,
Even if it's been copy-trunc rotated, it still only applies to the current boot, so it doesn't matter.

But that salt template is not applied to kern.log, only this list of files: https://github.com/kubernetes/kubernetes/blob/8fd414537b5143ab039cb910590237cabf4af783/cluster/saltbase/salt/logrotate/init.sls#L5

GCI however doesn't use that salt stuff and has its own logrotate chunk which does catch kern.log here: https://github.com/kubernetes/kubernetes/blob/dfe801de1021a005a7c43742c9357ef05dac0f0d/cluster/gce/gci/configure-helper.sh#L113-L136

I don't see why that changes anything though so long as it's true that rsyslog is also doing the typical rotation for kern.log.
Yeah, that's right. I missed that.
I'm still not sure about this. I just rebooted one of my VMs, here is the kernel log: https://gist.github.com/Random-Liu/e77348945d4482c5f3fda034a3da7f90
Ah, well, that's unfortunate. I thought that was the default config on Ubuntu and Debian, but it looks like I'm wrong. That being said, rather than using a Perhaps the better heuristic here is to notice the "time since boot" at the beginning going backwards.
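A minimal sketch of that "time going backwards" heuristic, assuming rsyslog-style kern.log lines that carry the kernel's bracketed time-since-boot prefix (for example `[    0.000000]`). The function name, the `tolerance` parameter, and the sample lines are hypothetical and are not taken from NPD.

```python
import re

# Matches the kernel's "[seconds.micros]" time-since-boot prefix.
KTIME = re.compile(r"\[\s*(\d+\.\d+)\]")


def current_boot_start(lines, tolerance=1.0):
    """Return the index of the first line of the most recent boot.

    A boot boundary is assumed wherever the time-since-boot value drops
    by more than `tolerance` seconds relative to the previous stamped line.
    The tolerance is an assumption to absorb small reorderings.
    """
    start = 0
    prev = None
    for i, line in enumerate(lines):
        m = KTIME.search(line)
        if m is None:
            continue  # a line without a kernel timestamp tells us nothing
        t = float(m.group(1))
        if prev is not None and t < prev - tolerance:
            start = i  # time went backwards: a new boot began here
        prev = t
    return start


# Made-up sample lines covering two boots.
sample = [
    "Nov 14 11:00:00 node kernel: [    0.000000] first boot",
    "Nov 14 11:02:00 node kernel: [  120.500000] some message",
    "Nov 14 12:00:00 node kernel: [    0.000000] second boot",
    "Nov 14 12:00:03 node kernel: [    3.250000] another message",
]
latest = sample[current_boot_start(sample):]
```

This needs no pattern tied to a specific kernel message, but it does rely on the timestamp prefix being present in the log, which depends on kernel and syslog configuration.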
When NPD was introduced, we were still running on ContainerVM. At that time, we only had kern.log, and it was hard to identify a reboot without defining a

Initially I was using the kernel timestamp

Let me see whether I can find some relation between the boot id and kern.log. I think using the boot id is the best solution, and journald is so convenient. Haha
* Remove `unregister_netdevice` rule to fix kubernetes#47. * Change `KernelPanic` to `KernelOops` because we can't handle kernel panic currently. * Use system boot time instead of "StartPattern" to fix kubernetes#48.
* Change `unregister_netdevice` to be an event to fix kubernetes#47. * Change `KernelPanic` to `KernelOops` because we can't handle kernel panic currently. * Use system boot time instead of "StartPattern" to fix kubernetes#48.
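For illustration only, here is one way "use system boot time" could be sketched: derive the boot time from the first field of `/proc/uptime` and keep only log entries stamped at or after it. This is not taken from the referenced commits; the function names and the `slack` parameter are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_uptime(text):
    """Parse the contents of /proc/uptime; the first field is seconds since boot."""
    return float(text.split()[0])


def boot_time(now, uptime_seconds):
    """Estimate the wall-clock boot time from the current time and the uptime."""
    return now - timedelta(seconds=uptime_seconds)


def from_current_boot(entry_time, boot, slack=timedelta(seconds=1)):
    """True if a log entry's timestamp is not older than the boot time.

    `slack` is an assumed allowance for clock and rounding differences.
    """
    return entry_time >= boot - slack


# Made-up values: "now" is fixed so the example is deterministic.
now = datetime(2016, 11, 14, 12, 0, 0, tzinfo=timezone.utc)
boot = boot_time(now, parse_uptime("3600.00 7100.12\n"))
```

On a Linux node the uptime text would come from reading `/proc/uptime`, and `now` from `datetime.now(timezone.utc)`. The comparison is only as good as the wall-clock timestamps in the log, so a clock adjustment after boot could misclassify entries.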
Broken out from here
Currently, the config has a default of
"startPattern": "Initializing cgroup subsys cpuset",
This pattern is meant to detect a node's boot process. Prior to the 4.5 kernel, this message was typically printed during boot of a node. After 4.5 however, due to this change, it is quite unlikely for that message to appear.
Furthermore, there's rarely a reason to detect whether a message is for the current boot in such a fragile way.
With the `kern.log` reader, every message is for the current boot because kern.log is usually handled where each `kern.log` file corresponds to one boot (e.g. `kern.log` is this boot, `kern.log.1` is the boot before, `kern.log.2.gz` the one before, etc). (EDIT: I'm wrong about this for gci at least)

With journald, the boot id is annotated in messages, and so it can accurately be correlated with the current boot id (see the "_BOOT_ID" record in journald messages).
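A rough sketch of the journald correlation, assuming entries are already available as dictionaries with a `_BOOT_ID` field (for instance parsed from `journalctl -o json`), and that the current boot id is read from `/proc/sys/kernel/random/boot_id`. That file prints the id with dashes while journald stores it without them, so both are normalized before comparing. The helper names and the sample ids are hypothetical.

```python
def normalize_boot_id(raw):
    """Strip whitespace and dashes so both boot id formats compare equal."""
    return raw.strip().replace("-", "").lower()


def current_boot_entries(entries, boot_id):
    """Keep only journal entries whose _BOOT_ID matches the given boot id."""
    want = normalize_boot_id(boot_id)
    return [
        e for e in entries
        if normalize_boot_id(e.get("_BOOT_ID", "")) == want
    ]


# Made-up boot ids and messages for illustration.
entries = [
    {"_BOOT_ID": "0123456789abcdef0123456789abcdef", "MESSAGE": "old boot"},
    {"_BOOT_ID": "fedcba9876543210fedcba9876543210", "MESSAGE": "this boot"},
]
kept = current_boot_entries(entries, "fedcba98-7654-3210-fedc-ba9876543210\n")
```

Entries without a `_BOOT_ID` field are dropped here; whether that is the right default is a design choice left open.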
With a kmsg reader, all messages will only be the current boot because kmsg is not persistent.
In none of those cases is `startPattern` useful. Each kernel log parsing plugin should be responsible for doing the right thing itself, I think.