-
Notifications
You must be signed in to change notification settings - Fork 651
Reason change once and it stay in wrong state with custom plugins #202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@andyxning could you check this too? Thanks |
Thanks for the detailed re-produce procedures. Seems some logic is wrong in updating the condition. Will check this later today. |
Thank you! @andyxning maybe you know when you can merge back this PR? In addition whe you plan to release a new version? |
@aga20 Too busy these days. Will do this later this week. After the PR is merged, i will ping @Random-Liu to do a new release as we have some important bug fixes after the latest release. |
@andyxning |
@aga20 Will take a look at this later. |
Ive been running in to the same issue. Any progress with this? This makes integrating NPD into external tools more difficult |
I will update this later today. Sorry for the late. |
@aga20 Sorry for taking so long to update this. Too busy. I have update the PR in #203 to address the comments @Random-Liu made. Could you please give it a try and add a feedback here? |
Hello, When you plann to merge this back and release a new version? |
@andyxning seems like there is still an issue when status transitions from False -> Unknown and back. When status = False shouldn't reason always be defaultConditionReason, meaning normal state? Current implementation will set faulty reason to the condition with OK result, see change 3 and change 4.
Status is False, message is ok, there is no problem at this moment. But reason shows NTPIsDown.
|
have same issue on 0.6.3
test_2.sh looks like that
The idea is to touch /tmp/kube.fail to test condition monitoring.
Custom problem has proper Status and message says file was not found but reason is FailureDetected. |
@andyxning and @Random-Liu, can you take a look? What @MaksymTrykur described in #202 (comment) seems correct. What I don't understand is that for @AlexShemeshWix's example, it should never enter |
I found the bug. Will put up a fix PR shortly. |
The fix is now included in NPD v0.6.6. @AlexShemeshWix and @MaksymTrykur, can you double check if your problem is fixed? |
Fixed for me in v0.6.6:
vs before
|
Hello,
I found something again I think but maybe I follwed wrong way.
I'm using a custom config based on the ntp example:
The /usr/bin/custom.sh script is very simple: exit with 0 or 1.
So when the node problem detector start it set the condition:
After it run the script (what returned with 0 in this case) the Status stay false but the Reason field changed to what I set in the rule section:
So ok, in the next run the script exited with 1. The Status is True, and the Reason still same (this is what I set under the rule):
In the next round the script returned with 0 again and Status changed back to false but the Reason didn't change:
As I see you overwrite the condition's rule and maybe the original condition lost and the node problem detector never can't set it again. https://github.com/kubernetes/node-problem-detector/blob/master/pkg/custompluginmonitor/custom_plugin_monitor.go#L140
But maybe I missed something.
Thank you!
The text was updated successfully, but these errors were encountered: