
Proposal: Move logMonitor message capture into rules[].pattern #1038

Open
skaven81 opened this issue Mar 11, 2025 · 0 comments

In NPD's logMonitor, the pluginConfig.message regex currently captures a string that is included in the node condition or Event when a fault is detected. This feature is very useful: node conditions and Events can carry specific, actionable information from the log event that indicated the problem, so an administrator or automation system does not have to dig through log messages to find out exactly what went wrong.

However, as the sample configuration https://github.com/kubernetes/node-problem-detector/blob/master/config/disk-log-message-filelog.json demonstrates, pluginConfig.message must match ALL of the conditions that any rules[].pattern may match. The regex can therefore quickly grow long and complex, and it ultimately duplicates the work of the rules[].pattern regexes.
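To make the duplication concrete, here is a minimal sketch of the current behavior as described above. The log lines, regexes, and the `detect` helper are all hypothetical (they are not taken from the linked sample config or from NPD's code), and the sketch assumes rule patterns are evaluated against the string captured by the message regex.

```python
import re

# Hypothetical stand-in for pluginConfig.message: it has to enumerate every
# condition that any rule wants to detect, because it produces the message.
MESSAGE = re.compile(
    r"kernel: (disk error on \w+|filesystem \w+ remounted read-only)"
)

# Hypothetical stand-ins for rules[].pattern, keyed by reason.
RULES = {
    "DiskError": re.compile(r"disk error on \w+"),
    "ReadOnlyFilesystem": re.compile(r"filesystem \w+ remounted read-only"),
    # This rule can never fire: MESSAGE was not extended to cover it.
    "OutOfMemory": re.compile(r"Out of memory: Killed process \d+"),
}

def detect(line):
    """Return (reason, message) for the first matching rule, or None."""
    m = MESSAGE.search(line)
    if m is None:
        return None  # the line is dropped before any rule sees it
    message = m.group(1)
    for reason, pattern in RULES.items():
        if pattern.search(message):
            return reason, message
    return None
```

Adding a rule means editing two regexes: the rule's own pattern and the alternation in the message regex.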

It would be much more convenient if the pluginConfig.message regex were used only for initial filtering and not for message capture, with the message extracted from the rules[].pattern regex instead. That way, each rule's regex serves a more targeted purpose, and the data extracted for a specific detected condition/event is configured right there in the rule. pluginConfig.message then becomes a higher-level filter which could, at its simplest, be empty, meaning "send all log events through for rules[] evaluation". It would also be useful to leverage pluginConfig.message for additional purposes:

  • Acting as an initial, broad filter that ensures only a subset of log events is evaluated by rules[].pattern. For a high-volume log stream this may matter for resource usage: a log event that does not match the initial pluginConfig.message regex is never evaluated against the multiple rules[].pattern regexes.
  • Allowing for simpler rules[].pattern regexes, because they only have to match messages that have already passed the pluginConfig.message regex. This can make problem detection more reliable, because the shape and structure of the input that reaches the rule patterns is more tightly controlled.
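The filtering benefit in the two points above can be sketched as follows. This is an illustration only, not NPD code: `PREFILTER`, `RULES`, and `scan` are hypothetical names, and the evaluation counter exists purely to show how many rule-pattern matches the broad filter avoids.

```python
import re

# Hypothetical broad filter: only kernel messages are of interest.
PREFILTER = re.compile(r"kernel: ")

# Hypothetical rules; they can stay short because only lines that passed
# the filter ever reach them.
RULES = [
    ("DiskError", re.compile(r"disk error on \w+")),
    ("ReadOnlyFilesystem", re.compile(r"filesystem \w+ remounted read-only")),
]

def scan(lines):
    """Return (findings, number of rule-pattern evaluations performed)."""
    findings, evaluations = [], 0
    for line in lines:
        if PREFILTER.search(line) is None:
            continue  # rejected lines never reach the rule patterns
        for reason, pattern in RULES:
            evaluations += 1
            m = pattern.search(line)
            if m:
                # The message comes from the rule's own match.
                findings.append((reason, m.group(0)))
                break
    return findings, evaluations
```

With four input lines of which two come from the kernel, only three rule evaluations are performed instead of up to eight.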

I propose that the logMonitor be updated in the following ways:

  1. Preserve pluginConfig.message with its current behavior, to keep backward compatibility. If it is defined in a logMonitor JSON config, then pluginConfig.prefilter is ignored and the message is extracted from the top-level filter, as it is today.
  2. Add a new pluginConfig.prefilter regex whose only purpose is to prefilter the log stream before it is evaluated by rules[].pattern. If pluginConfig.message is not defined (whether or not pluginConfig.prefilter is), then the node condition or event message is extracted from the matching rules[].pattern regex, not from pluginConfig.message.
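The precedence between the two settings could look roughly like this. It is a sketch under stated assumptions, not a proposed implementation: the `extract` function and the dict-shaped config are hypothetical, and the choice to use a pattern's first capture group (falling back to the whole match when it has no groups) is my assumption, since the proposal does not pin down how the message is taken from a rule's match.

```python
import re

def _captured(m):
    # Assumption: prefer the first capture group, else the whole match.
    return m.group(1) if m.re.groups else m.group(0)

def extract(plugin_config, rules, line):
    """Return (reason, message) for the first matching rule, or None."""
    legacy = plugin_config.get("message")
    if legacy is not None:
        # Item 1: message is defined, so prefilter is ignored and the
        # message comes from the top-level regex.
        m = re.search(legacy, line)
        if m is None:
            return None
        message = _captured(m)
        for rule in rules:
            if re.search(rule["pattern"], message):
                return rule["reason"], message
        return None

    # Item 2: an absent or empty prefilter lets every line through.
    prefilter = plugin_config.get("prefilter", "")
    if prefilter and re.search(prefilter, line) is None:
        return None
    for rule in rules:
        m = re.search(rule["pattern"], line)
        if m:
            # The message comes from the rule that matched.
            return rule["reason"], _captured(m)
    return None
```

Existing configs that define pluginConfig.message take the first branch and keep working unchanged; configs that omit it get per-rule message extraction.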