Complex Custom Plugin Monitors in Node Problem Detector: Seeking Advice on Scalable Approaches #1016

Closed
AcidLeroy opened this issue Jan 23, 2025 · 2 comments

@AcidLeroy

We are currently implementing complex custom-plugin-monitors for Node Problem Detector (NPD). As our monitoring scripts grow in complexity, we're encountering challenges that may require a reassessment of our current approach. We'd like to seek community advice on best practices for handling advanced monitoring scenarios within the NPD framework.

Current Implementation:

  • Scripts are written in Python
  • Scripts are loaded into a ConfigMap and mounted into the NPD pod for execution
  • Kustomize is used to modify the NPD mounts so the rules are loaded from the ConfigMap (similar to issue #893, "Load rule from configmap")
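
For readers unfamiliar with this setup, a script of the kind described above might look like the following minimal sketch. It assumes the usual custom plugin contract, where NPD reads the script's exit code (0 for OK, 1 for a detected problem, 2 for unknown) and takes standard output as the status message. The disk-space check, the `classify` helper, and the 10% threshold are hypothetical and only serve to illustrate the shape of such a script; they are not taken from our actual monitors.

```python
#!/usr/bin/env python3
"""Hypothetical custom plugin check: flags a node when a path runs low on free space."""
import shutil
from typing import Tuple

# Exit codes as interpreted by NPD's custom plugin monitor.
OK, NON_OK, UNKNOWN = 0, 1, 2


def classify(free: int, total: int, min_free_ratio: float, label: str) -> Tuple[int, str]:
    """Map raw byte counts to an exit code and a short status message."""
    if total <= 0:
        return UNKNOWN, f"{label}: no capacity reported"
    ratio = free / total
    if ratio < min_free_ratio:
        return NON_OK, f"{label}: {ratio:.0%} free, below {min_free_ratio:.0%}"
    return OK, f"{label}: {ratio:.0%} free"


def main(path: str = "/", min_free_ratio: float = 0.10) -> int:
    """Print the status message for NPD to pick up and return the exit code."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        print(f"{path}: cannot read disk usage ({exc})")
        return UNKNOWN
    code, message = classify(usage.free, usage.total, min_free_ratio, path)
    print(message)
    return code
```

When used as the plugin entry point, the script would finish by passing the return value of `main()` to `sys.exit`, so that NPD sees the intended exit code.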

Challenges:

  1. Increasing Complexity: Our scripts are becoming more sophisticated, requiring:

    • Breaking them into modules for better organization
    • Incorporating third-party libraries
    • Sharing functions between different monitors
  2. Maintainability Concerns: As complexity grows, managing these scripts within ConfigMaps is becoming cumbersome

  3. Potential Scope Creep: We're questioning whether we're extending beyond NPD's intended use case
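
To make the "sharing functions between different monitors" point concrete, here is a rough sketch of the kind of shared helper we have in mind. The module name `npd_common`, the function names, and the 80-byte message limit are assumptions for illustration (the limit is meant to mirror the `max_output_length` value set in the plugin config, not a value NPD enforces here). Each monitor would supply a function returning `(healthy, message)` and leave exit-code handling to the helper.

```python
"""Hypothetical shared helper (e.g. npd_common.py) imported by several monitor scripts."""
import sys
from typing import Callable, Tuple

# Exit codes as interpreted by NPD's custom plugin monitor.
OK, NON_OK, UNKNOWN = 0, 1, 2
MAX_MESSAGE_BYTES = 80  # assumed to match max_output_length in the plugin config

CheckFn = Callable[[], Tuple[bool, str]]


def truncate(message: str, limit: int = MAX_MESSAGE_BYTES) -> str:
    """Cut the message to at most `limit` UTF-8 bytes without splitting a character."""
    return message.encode("utf-8")[:limit].decode("utf-8", errors="ignore")


def evaluate(check: CheckFn) -> Tuple[int, str]:
    """Run a check and translate its result (or failure) into an exit code and message."""
    try:
        healthy, message = check()
    except Exception as exc:  # a crashing check is reported as unknown, not as a problem
        return UNKNOWN, truncate(f"check failed to run: {exc}")
    return (OK if healthy else NON_OK), truncate(message)


def run(check: CheckFn) -> None:
    """Entry point for a monitor script: print the message and exit with the code."""
    code, message = evaluate(check)
    print(message)
    sys.exit(code)
```

A monitor script would then shrink to a check function plus a call such as `run(my_check)`. The catch is that a ConfigMap-mounted script can only import `npd_common` if that file is mounted alongside it and is on the import path, which is exactly the packaging problem described in this issue.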

Proposed Solutions:

  1. Maintain Current Approach: Continue using ConfigMaps, but improve our build and deployment process to handle modular scripts and dependencies

  2. Separate Monitoring Pods: Deploy complex "detectors" as separate pods or DaemonSets that communicate with NPD over a simple protocol

    • Pros: Clear separation of concerns, easier management of dependencies
    • Cons: Increased resource usage, potential communication overhead
  3. NPD Plugin System: Explore the possibility of enhancing NPD with a more robust plugin system that can handle complex, modular monitors

Questions for the Community:

  1. What are the recommended best practices for implementing complex custom monitors in NPD?
  2. Are there existing solutions or patterns within the Kubernetes ecosystem for handling advanced node problem detection scenarios?

We appreciate any insights, recommendations, or alternative approaches the community can provide to help us determine the most effective and maintainable solution for our use case.

@daveoy
Contributor

daveoy commented Feb 4, 2025

A solution to your ConfigMap maintenance toil might simply be to build a custom NPD container image, with your dependencies installed via pip and your scripts copied into the image itself.

@AcidLeroy
Author

@daveoy thanks for the response! I think building a custom image is the way to go! Thanks for your help!
