Exceptions in Callback Handlers can lead to deadlock #558

Closed
btasker opened this issue Feb 9, 2023 · 1 comment · Fixed by #559
Labels
bug Something isn't working
Milestone

Comments

@btasker
Contributor

btasker commented Feb 9, 2023

Specifications

  • Client Version: 1.3.6
  • InfluxDB Version: 1.8.10 / 2.x
  • Platform: Any

Code sample to reproduce problem

Attaching repro script

repro_lockup.py.gz

Expected behavior

The script should exit once processing in main() is complete, even if a callback handler fails for any reason.

Actual behavior

The script runs until it is killed.

strace shows the process stuck in a sleep loop, waking roughly every 100 ms:

clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, {tv_sec=1438366, tv_nsec=978518494}, NULL) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=1438366, tv_nsec=979255844}) = 0
clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, {tv_sec=1438367, tv_nsec=79255844}, NULL) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=1438367, tv_nsec=79775567}) = 0
clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, {tv_sec=1438367, tv_nsec=179775567}, NULL) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=1438367, tv_nsec=180567114}) = 0

Looks to be this while loop

Additional info

No response

@btasker btasker added the bug Something isn't working label Feb 9, 2023
@btasker
Contributor Author

btasker commented Feb 9, 2023

If a configured callback throws an exception, the thread used to handle batching writes dies.

As a result, the exit condition of that while loop is never met, leaving the script stuck in an infinite loop.
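The failure mode can be reproduced in isolation with a few lines of plain Python. This is only a simulation under stated assumptions, not the client's code: `worker`, `failing_callback` and `simulate` are hypothetical names, and a `queue.Queue` stands in for the client's batching pipeline. It shows that once the callback raises, the worker thread is gone and the remaining items are never processed, so anything polling for "queue drained" would wait forever.

```python
import queue
import threading


def worker(pending, callback):
    """Drain the queue, invoking the user callback after each item."""
    while True:
        item = pending.get()
        if item is None:  # sentinel: normal shutdown
            return
        # Hypothetical stand-in for a success/error callback.
        # An exception raised here propagates and ends the thread.
        callback(item)


def failing_callback(item):
    raise ValueError(f"callback failed for {item!r}")


def simulate():
    """Return (thread_alive, items_left) after the callback has raised."""
    pending = queue.Queue()
    for i in range(3):
        pending.put(i)
    t = threading.Thread(target=worker, args=(pending, failing_callback), daemon=True)
    t.start()
    # The thread dies on the first item; Python prints its traceback to stderr.
    t.join(timeout=2)
    return t.is_alive(), pending.qsize()


if __name__ == "__main__":
    alive, left = simulate()
    print(f"worker alive: {alive}, items never processed: {left}")
```

A loop of the form `while pending.qsize() > 0: time.sleep(0.1)` placed after this point would never exit, which matches the repeating sleeps in the strace output.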

There are probably two changes needed to fix this:

  • Update the callback invocations (here) to trap exceptions and warn that they happened
  • Add a counter or timer to the while loop so that if the thread takes too long to close, the script can still exit

I'm going to try and get a PR in to implement both shortly

btasker added a commit to btasker/influxdb-client-python that referenced this issue Feb 9, 2023
This is one part of a fix for influxdata#558

It:

* Adds a WriteOption `max_close_wait` (default 500,000ms)
* Adjusts `__del__` so that we'll only wait `max_close_wait` ms for queued writes to complete
* Adds a warning if the threshold is hit
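The idea behind this commit can be sketched as a polling loop with a deadline. This is a minimal illustration rather than the code from the commit: `wait_until_drained`, `is_drained` and `poll_interval_s` are hypothetical names, and only `max_close_wait` comes from the commit message.

```python
import time
import warnings


def wait_until_drained(is_drained, max_close_wait_ms, poll_interval_s=0.1):
    """Poll is_drained() until it returns True or the deadline passes.

    Returns True if the work drained in time, False if we gave up.
    """
    deadline = time.monotonic() + max_close_wait_ms / 1000.0
    while not is_drained():
        if time.monotonic() >= deadline:
            # Warn instead of blocking forever, so the caller can still exit.
            warnings.warn(
                f"Gave up waiting for queued writes after {max_close_wait_ms} ms"
            )
            return False
        time.sleep(poll_interval_s)
    return True
```

Using `time.monotonic()` for the deadline keeps the wait immune to wall-clock adjustments. A caller would pass something like `lambda: pending.empty()` as `is_drained`.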
btasker added a commit to btasker/influxdb-client-python that referenced this issue Feb 9, 2023
The other part of the fix for influxdata#558

If a configured callback results in an exception we'll trap it and log
the details rather than allowing it to interrupt the parent thread
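A minimal sketch of that trapping pattern, assuming a generic callable and the standard `logging` module; `invoke_callback_safely` is a hypothetical helper name, not a function from the client library.

```python
import logging

logger = logging.getLogger(__name__)


def invoke_callback_safely(callback, *args):
    """Call callback(*args), logging and swallowing any exception it raises.

    Returns True if the callback ran cleanly (or was not configured),
    False if it raised.
    """
    if callback is None:
        return True
    try:
        callback(*args)
    except Exception:
        # logger.exception records the message plus the full traceback.
        logger.exception("Unhandled exception in callback %r", callback)
        return False
    return True
```

Catching `Exception` rather than `BaseException` leaves `KeyboardInterrupt` and `SystemExit` free to propagate, so the script can still be interrupted normally.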
@bednar bednar added this to the 1.35.0 milestone Jul 28, 2023