
Coroutine is (currently) hard to debug #3198


Closed
revintec opened this issue Feb 21, 2022 · 3 comments

@revintec

revintec commented Feb 21, 2022

I'm making extensive use of coroutines, replacing thread calls and using structured concurrency as suggested. However, I found coroutines very hard to debug. I'm using org.jetbrains.kotlinx:kotlinx-coroutines-debug:1.6.0 along with -jdk8 and -core, both at 1.6.0, which is the latest version as of writing:

  1. https://youtrack.jetbrains.com/issue/KTIJ-21056
  2. Can we always turn on coroutine debugging (if the overhead is minimal), at least for coroutine listing/stack traces? A system that is not observable is almost impossible to debug.
  3. Provide a JVMTI library for common OSes (Linux/macOS/...) that dumps a structured (with parent/child relationships) coroutine stack trace when a thread dump is requested. Currently we have to use SIGTRAP, but using SIGQUIT would match the convention. See https://github.com/openjdk/jdk8u/blob/8d5c7386c619a2602d9731c4adbbb1b01aeb449f/hotspot/src/share/vm/runtime/os.cpp#L320
  4. dumpCoroutinesInfo is not structured: there is no parent/child relation. Also, the printout doesn't end with a line feed; maybe an oversight?
  5. There is Job.children but no Job.parent; is there a reason why? This makes constructing a job graph very awkward.
@qwwdfsad
Collaborator

Hi,

> can we always turn on coroutine debug(if the overhead is minimal) for at least coroutine listing/stacktrace

> if the overhead is minimal

"Minimal" and "acceptable" overhead are very application-specific terms that depend on the application's SLA and various other factors, so I suggest you figure out yourself whether it's acceptable. On the other hand, we've optimized the debugging agent to a reasonable extent. The slowest part is the collection of creation stack traces, which can (and in production environments probably "should") be completely disabled, either programmatically or with a system property if you are running it as a -javaagent.
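A minimal sketch of the programmatic route, assuming kotlinx-coroutines-debug is on the classpath and the JVM allows the probes to self-attach. It sets `DebugProbes.enableCreationStackTraces = false` before `DebugProbes.install()`, prints one dump and uninstalls. The function name `dumpWithCheapProbes` is hypothetical.

```kotlin
import kotlinx.coroutines.CoroutineName
import kotlinx.coroutines.ExperimentalCoroutinesApi
import kotlinx.coroutines.debug.DebugProbes
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.yield

// Hypothetical helper. Assumes kotlinx-coroutines-core and kotlinx-coroutines-debug
// are on the classpath and the JVM allows the debug probes to self-attach.
@OptIn(ExperimentalCoroutinesApi::class)
fun dumpWithCheapProbes(): Int {
    // Creation stack traces are the slowest part of the probes; disable them
    // before any coroutine of interest is created.
    DebugProbes.enableCreationStackTraces = false
    DebugProbes.install()
    try {
        return runBlocking {
            val worker = launch(CoroutineName("worker")) { delay(1_000) }
            yield() // let "worker" start and suspend inside delay()
            // Prints each live coroutine with its state and last observed frames.
            DebugProbes.dumpCoroutines()
            val live = DebugProbes.dumpCoroutinesInfo().size
            worker.cancel()
            live // number of coroutines the probes were tracking at dump time
        }
    } finally {
        DebugProbes.uninstall()
    }
}
```

Calling `dumpWithCheapProbes()` from `main` should list the `worker` coroutine (and the enclosing `runBlocking`) without the creation-site frames, since those are no longer recorded.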

> provide a jvmti library

Could you please elaborate on why you need a JVMTI library and why the agent JAR is not enough?
Both maintaining and shipping native libraries for the JVM is a real pain, so the reason to do so has to be significant and unachievable otherwise.

> dumpCoroutinesInfo is not structured, no parent/child relation.

In typical systems there are hundreds to thousands of coroutines with arbitrary depths. Exposing the parent-child relationship (e.g. by properly nesting stack traces) would render the dump unusable, given 10-100+ levels of nesting.

The better solution, IMO, is to provide proper, stable and convenient customization points, so that this is easy to achieve manually if necessary.
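As a sketch of doing it manually with public API only: `Job.children` is enough to walk the hierarchy top-down and print one line per job, indented by depth, with every line ending in a line feed. `printJobTree` is a hypothetical helper and each line is just `Job.toString()`; a real tool might match jobs against `DebugProbes.dumpCoroutinesInfo()` (each `CoroutineInfo` exposes its `job`) to attach states and stack traces.

```kotlin
import kotlinx.coroutines.Job

// Hypothetical helper: depth-first walk over Job.children, one line per job,
// indented two spaces per nesting level. Only children still attached to
// their parent are visited.
fun printJobTree(job: Job, depth: Int = 0, out: Appendable = System.out) {
    out.append("  ".repeat(depth)).append(job.toString()).append('\n')
    job.children.forEach { child -> printJobTree(child, depth + 1, out) }
}
```

With 10-100+ levels of nesting the indentation grows quickly, which is the usability concern above; a custom tool can cap the depth or print only selected subtrees.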

> and the printout doesn't end with line feed, which is an oversight maybe?

I think so; I'm not sure what the original reason was.

> there is Job.children but no Job.parent, is there a reason why? this makes constructing job graph very awkward

Because there wasn't any real demand for it before, and adding it originally was a non-trivial task. Now the information seems to be there already, just not exposed in the public API. It would be nice if you could file a separate issue with a short explanation of your use case, so we can fix it separately from the debug agent.
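Without a public parent accessor, the relation can still be recovered by inverting `Job.children`: walk down from a known root and record each child's parent. A minimal sketch; `buildParentMap` is hypothetical, and it only sees jobs reachable from the given root that are still attached (children that have completed are typically no longer listed in `children`).

```kotlin
import kotlinx.coroutines.Job

// Hypothetical helper: maps every job reachable from `root` to its parent.
// Iterative (explicit stack), so deep hierarchies cannot overflow the call stack.
fun buildParentMap(root: Job): Map<Job, Job> {
    val parents = HashMap<Job, Job>()
    val stack = ArrayDeque<Job>()
    stack.addLast(root)
    while (stack.isNotEmpty()) {
        val job = stack.removeLast()
        for (child in job.children) {
            parents[child] = job // Job uses identity equality, so this is safe as a key
            stack.addLast(child)
        }
    }
    return parents
}
```

The result is a snapshot; jobs started or completed afterwards are not reflected.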

@revintec
Author

@qwwdfsad Thanks for the reply.
JVMTI can monitor SIGQUIT (instead of the SIGTRAP used by the javaagent), so a coroutine dump could be produced automatically whenever a thread dump is requested, which lessens the learning curve for users. When using coroutines, thread dumps are less informative, and a structured coroutine dump would be better. Currently we have to find the Java process ID (not that easy if you have multiple instances running with the same command-line arguments) and use another SSH session to send a SIGTRAP.

Maintaining a native library is indeed a pain, and that is exactly why bundling it in the library makes things easier for the user. The JVMTI code, however, should be minimal: it just sets a flag that can be read inside the coroutine debug Java library, and the Java code then produces the coroutine dump.

But you've made a point too: if this easy-debugging feature is not of much use, it could indeed burden the library. I'll look for other ways to monitor SIGQUIT in the meantime.
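For comparison, the SIGTRAP route needs no native code. A sketch of a similar in-process hook, assuming the JDK's unsupported `sun.misc.Signal` API is accessible: it registers a handler that calls `DebugProbes.dumpCoroutines()`. On HotSpot, SIGQUIT is normally taken by the VM's own thread dump, so `Signal.handle` rejects it (unless running with `-Xrs`), which is why the sketch defaults to TRAP. `installCoroutineDumpOnSignal` is hypothetical, and registering TRAP may replace a handler the debug agent installed.

```kotlin
import kotlinx.coroutines.ExperimentalCoroutinesApi
import kotlinx.coroutines.debug.DebugProbes
import sun.misc.Signal

// Hypothetical helper. Assumes sun.misc.Signal is accessible (jdk.unsupported on
// JDK 9+) and that DebugProbes.install() is called elsewhere. Signal names are
// given without the "SIG" prefix; "TRAP" exists on POSIX systems only.
@OptIn(ExperimentalCoroutinesApi::class)
fun installCoroutineDumpOnSignal(signalName: String = "TRAP"): Boolean =
    try {
        Signal.handle(Signal(signalName)) {
            // Runs on a JVM signal-dispatch thread; dumpCoroutines() throws
            // IllegalStateException if the probes are not installed.
            runCatching { DebugProbes.dumpCoroutines() }
                .onFailure { System.err.println("Coroutine dump failed: $it") }
        }
        true
    } catch (e: IllegalArgumentException) {
        false // unknown signal on this OS, or one the VM already uses
    }
```

After `installCoroutineDumpOnSignal()`, `kill -TRAP <pid>` would print the dump to the process's stdout, which still has the process-ID lookup problem described above.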

@qwwdfsad
Collaborator

qwwdfsad commented Apr 8, 2024

I'm closing this as it breaks down into multiple issues: some of them are fixed, some are being worked on separately (e.g. #3587), and JVMTI support is unlikely to ever be implemented by us.

@qwwdfsad qwwdfsad closed this as completed Apr 8, 2024

3 participants