Complex Custom Plugin Monitors in Node Problem Detector: Seeking Advice on Scalable Approaches #1016

AcidLeroy · 2025-01-23T19:01:09Z

We are currently implementing complex custom-plugin-monitors for Node Problem Detector (NPD). As our monitoring scripts grow in complexity, we're encountering challenges that may require a reassessment of our current approach. We'd like to seek community advice on best practices for handling advanced monitoring scenarios within the NPD framework.

Current Implementation:

- Scripts are written in Python
- Scripts are loaded into a ConfigMap and mounted into the NPD pod for execution
- Kustomize is used to modify NPD mounts for loading rules (similar to Load rule from configmap #893)
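
For readers less familiar with the setup, here is a minimal sketch of the kind of script involved. It assumes the custom plugin monitor convention that a check reports through its exit code (0 = OK, 1 = problem, 2 = unknown) plus a short message on stdout, and that the message is cut at 80 bytes by default. The disk-space check, the 10% threshold, and the function names are hypothetical and only serve as illustration; they are not our actual monitors.

```python
"""Sketch of a custom plugin check for NPD (hypothetical disk-space check)."""
import shutil

# Exit codes the custom plugin monitor is assumed to interpret.
OK, NON_OK, UNKNOWN = 0, 1, 2
MAX_MESSAGE_BYTES = 80  # assumed default cap on the stdout message


def evaluate(free_fraction: float, min_free: float = 0.10) -> tuple[int, str]:
    """Map a measured free-space fraction to (exit_code, message)."""
    if not 0.0 <= free_fraction <= 1.0:
        return UNKNOWN, "free-space reading out of range"
    if free_fraction < min_free:
        return NON_OK, f"only {free_fraction:.0%} of disk free"
    return OK, f"{free_fraction:.0%} of disk free"


def main(path: str = "/") -> int:
    """Run the check, print the status message, and return the exit code."""
    try:
        usage = shutil.disk_usage(path)
        code, message = evaluate(usage.free / usage.total)
    except (OSError, ZeroDivisionError) as exc:
        code, message = UNKNOWN, f"check failed: {exc}"
    # Truncate on a byte boundary so the message fits the assumed limit.
    print(message.encode()[:MAX_MESSAGE_BYTES].decode(errors="ignore"))
    return code


# When deployed as a plugin, the script would end with:
#     import sys; sys.exit(main())
```

Keeping the decision logic in `evaluate` separate from the I/O in `main` is what makes a check like this unit-testable, and it is also the part that tends to be shared between monitors.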

Challenges:

- Increasing Complexity: Our scripts are becoming more sophisticated, requiring:
  - Breaking them into modules for better organization
  - Incorporating third-party libraries
  - Sharing functions between different monitors
- Maintainability Concerns: As complexity grows, managing these scripts within ConfigMaps is becoming cumbersome
- Potential Scope Creep: We're questioning whether we're extending beyond NPD's intended use case

Proposed Solutions:

- Maintain Current Approach: Continue using ConfigMaps, but improve our build and deployment process to handle modular scripts and dependencies
- Separate Monitoring Pods: Deploy complex "detectors" as separate pods/daemonsets, communicating with NPD over a simple protocol
  - Pros: Clear separation of concerns, easier management of dependencies
  - Cons: Increased resource usage, potential communication overhead
- NPD Plugin System: Explore the possibility of enhancing NPD with a more robust plugin system that can handle complex, modular monitors
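
To make the first option more concrete, one possible build step is to bundle each monitor's package directory into a single executable archive with Python's standard `zipapp` module, so that a modular monitor still ships as one file. The sketch below is only an illustration under assumptions: the function name, the directory layout (a folder containing `__main__.py` plus shared modules), and the interpreter path inside the NPD pod are hypothetical.

```python
import zipapp
from pathlib import Path


def build_monitor_archive(source_dir: str, target: str) -> Path:
    """Bundle a package directory (containing __main__.py) into one .pyz file."""
    zipapp.create_archive(
        source_dir,
        target=target,
        # Assumed interpreter location inside the NPD pod; adjust as needed.
        interpreter="/usr/bin/env python3",
        compressed=True,
    )
    return Path(target)
```

Pure-Python third-party libraries could be vendored into the source directory before bundling (for example with `pip install --target`). Two caveats apply: the archive is binary, so it would have to go under the ConfigMap's `binaryData`, and a ConfigMap is limited to 1 MiB, so this only stretches the current approach so far.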

Questions for the Community:

- What are the recommended best practices for implementing complex custom monitors in NPD?
- Are there existing solutions or patterns within the Kubernetes ecosystem for handling advanced node problem detection scenarios?

We appreciate any insights, recommendations, or alternative approaches the community can provide to help us determine the most effective and maintainable solution for our use case.

daveoy · 2025-02-04T19:41:48Z

A solution to your ConfigMap maintenance toil might simply be to build a custom NPD container image, with your dependencies pip-installed and your scripts copied into the image itself.

AcidLeroy · 2025-02-11T19:58:52Z

@daveoy thanks for the response! I think building a custom image is the way to go! Thanks for your help!

AcidLeroy closed this as completed Feb 11, 2025
