Feature/test health check #128
Merged
Multiple improvements to the process of consuming Kafka events, to avoid consumers breaking without any notification. In a nutshell, the issue was caused by an unhandled promise rejection: after `node 7.0.0`, an unhandled rejection started to crash the app instead of just logging a warning. This app uses node `8.2.1`, so node stopped executing the event loop immediately after detecting the unhandled rejection, while some other apps (e.g. `tc-email-service`) are on version `6.x`, which only logs a warning and continues executing the event loop, or they simply have not hit an unhandled rejection yet.

Here are the things we have improved to fix the situation and prevent it from occurring in the future:
- A `.catch` handler for the API call promise, to have better control over what is done when it fails to load the mentioned user's data. It logs a message (which can be captured and alerted on by a log aggregator) whenever there is an error fetching the details of a mentioned user, and skips the notification for that user so that the process can continue sending notifications to the other intended users.
- To handle the recent incident, we have improved the health check method to detect the state where the Kafka consumer stops consuming further messages because of an unhandled rejection error. It now uses the `pause` field on the consumer's subscription objects to detect such incidents. The health check now fails in that state, which in turn causes the container to restart the process as per its configuration.
- Handling of the `unhandledRejection` event, so that we can abort the process as soon as any unhandled promise rejection occurs in a single event loop of the Node.js process, which in turn should cause the container to restart the process.
- `catch` handlers. If this promise chain had not been broken at the time of the incident, the consumer would not have been stuck in an unresponsive state and would have continued consuming other events without any restart.
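
For reviewers who want a concrete picture, the safeguards listed above could look roughly like the sketch below. It is not the code from this pull request: every function name (`notifyMentionedUsers`, `isConsumerHealthy`, `exitOnUnhandledRejection`), the injected `fetchUser`, `sendNotification` and `logger` arguments, and the shape of the subscription objects (a plain object whose values carry a boolean `pause` field) are assumptions made for illustration.

```javascript
// Hypothetical sketch: all names and data shapes are illustrative assumptions,
// not the actual implementation in this pull request.

// 1. Per-user `.catch`: a failure for one mentioned user is logged and
//    skipped, so the remaining users still get their notifications and no
//    rejection escapes the promise chain.
function notifyMentionedUsers(handles, fetchUser, sendNotification, logger) {
  const tasks = handles.map((handle) =>
    fetchUser(handle)
      .then((user) => sendNotification(user))
      .then(() => ({ handle, sent: true }))
      .catch((err) => {
        logger.error(`Skipping notification for ${handle}: ${err.message}`);
        return { handle, sent: false };
      })
  );
  return Promise.all(tasks);
}

// 2. Health check: report unhealthy as soon as any subscription is flagged
//    as paused. `subscriptions` is assumed to be an object keyed by
//    topic/partition whose values carry a boolean `pause` field.
function isConsumerHealthy(subscriptions) {
  return Object.keys(subscriptions).every((key) => !subscriptions[key].pause);
}

// 3. Last line of defence: exit on any unhandled rejection so that the
//    container restarts the process. `proc` is injected to keep this testable.
function exitOnUnhandledRejection(proc, logger) {
  proc.on('unhandledRejection', (reason) => {
    logger.error('Unhandled rejection, aborting process:', reason);
    proc.exit(1);
  });
}
```

In a real service the third helper would presumably be wired up once at startup, e.g. `exitOnUnhandledRejection(process, console)`, while the health check endpoint would call `isConsumerHealthy` with the consumer's subscription objects.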