Proposal: Move logMonitor message capture into rules[].pattern #1038

skaven81 · 2025-03-11T16:38:21Z

In the current NPD logMonitor, the pluginConfig.message regex captures a string that is included in the node condition or Event when a fault is detected. This feature is very useful: node conditions and Events can carry specific, actionable information from the log event that indicated the problem, so an administrator or automation system does not have to dig through log messages to find out exactly what went wrong.

However, as demonstrated in the sample configuration https://github.com/kubernetes/node-problem-detector/blob/master/config/disk-log-message-filelog.json, pluginConfig.message must match ALL the possible conditions that may be matched by rules[].pattern. As a result, the regex quickly grows long and complex, and ultimately ends up duplicating the work of the rules[].pattern regexes.

It would be much more convenient if the pluginConfig.message regex were used only for initial filtering, and not for message capture. Instead, the message would be extracted by the rules[].pattern regex. That way, the regex for each rule serves a more targeted purpose, and the data extracted for a specific detected condition/event is configured right there in the rule. pluginConfig.message then becomes a higher-level filter, which could, at its simplest, be empty, meaning "send all log events through for rules[] evaluation". It would also be useful to leverage pluginConfig.message for additional purposes:

Acting as an initial, broad filter that ensures only a subset of log events make it through to be evaluated by rules[].pattern. For a high-volume log stream this might be important for resource optimization by avoiding having to evaluate every log event against multiple rules[].pattern regexes, if it doesn't match the initial pluginConfig.message regex.
Allowing for simpler rules[].pattern regexes, because they only have to be written to match against messages that have already passed through the pluginConfig.message regex. This can make problem detection more reliable by more carefully controlling the shape and structure of input messages that are passed through to rule patterns.

I propose that the logMonitor be updated in the following ways:

Preserve pluginConfig.message with its current behavior, to keep backward compatibility. If defined in a logMonitor JSON config, then pluginConfig.prefilter is ignored, and the current behavior of extracting the message from the top-level filter is preserved.
Add a new pluginConfig.prefilter regex whose only purpose is to prefilter the log stream before it is evaluated by rules[].pattern. If this is defined (or if neither pluginConfig.message nor pluginConfig.prefilter is defined), then the node condition or event message is extracted from the matching rules[].pattern regex, not from pluginConfig.message.

skaven81 mentioned this issue Mar 11, 2025

Eliminate logspam using filelog monitor #1032
