No Issue #4338
Comments
Thanks for the suggestion! I don't fully understand the use cases this is supposed to help with. Could you clarify them?
Why is this useful?
Which specific metrics do you want to capture and why?
In which cases is making a snapshot too heavy, but adding extra code to each coroutine resumption isn't? With snapshots, you only pay for the coroutines that exist at the moment the snapshot is made, but with the instrumentation you propose, every single coroutine has to pay the price all the time (see the sketch below).
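To make that trade-off concrete, here is a minimal Kotlin sketch. The `CountingDispatcher` wrapper and the workload are purely illustrative, not an API of kotlinx.coroutines; the point is only that this kind of instrumentation sits on the hot path of every dispatch, whereas a snapshot (for example, a dump taken on demand with the debug tooling) only touches the coroutines alive at that moment.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.atomic.AtomicLong
import kotlin.coroutines.CoroutineContext

// Hypothetical wrapper: every coroutine running on it pays for the counter
// on every dispatch, whether or not anyone ever reads the metric.
class CountingDispatcher(private val delegate: CoroutineDispatcher) : CoroutineDispatcher() {
    val dispatches = AtomicLong()

    override fun dispatch(context: CoroutineContext, block: Runnable) {
        dispatches.incrementAndGet()      // cost on the hot path of each resumption
        delegate.dispatch(context, block) // then hand off to the real dispatcher
    }
}

fun main() = runBlocking {
    val counting = CountingDispatcher(Dispatchers.Default)
    withContext(counting) {
        repeat(1_000) { launch { delay(1) } } // 1000 short-lived coroutines
    }
    println("dispatches observed: ${counting.dispatches.get()}")
}
```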
I'm very sorry if I'm wrong, but I get the impression your text is written by AI. If so, I must ask you to either stop using AI or close the issue, as discussing this problem with ChatGPT has no chance of leading to any useful insights.

Let's assume that you didn't use AI, have an actual business need, and can explain it. The examples of metrics that you propose are still unclear to me. Why should the average suspension time be an indicator of some issue? If you make many network requests, for example, the average suspension will be long. If you replace an inefficient spinlock on data with a suspension, the average suspension in your program will be longer, but the resource utilization will be improved. Likewise, average completion times will mostly depend on what kind of work coroutines do, not on how efficiently they do it. Cancellations are also not a sign of anything going wrong.

Please provide a concrete example of which insights about your program you're hoping to gain from coroutine metrics.
This idea is especially suspicious. If your records are robust enough that you can retrace the execution from them, then this logging is not fire-and-forget: it must ensure each log entry actually gets stored. But that means the coroutine has to stop all useful work to guarantee robust logging, which will lead to a tremendous performance degradation.
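A minimal sketch of what such "robust" per-step logging implies (the `logStep` helper and the file name are hypothetical, only meant to show that the coroutine is gated on I/O at every logged step):

```kotlin
import kotlinx.coroutines.*
import java.io.File

// Hypothetical "robust" per-step logging: the coroutine may not proceed until
// the record has actually been written out, so every logged step waits on I/O.
suspend fun logStep(logFile: File, record: String) = withContext(Dispatchers.IO) {
    logFile.appendText(record + "\n") // blocking write; the caller stays suspended until it finishes
}

fun main() = runBlocking {
    val logFile = File("coroutine-trace.log") // hypothetical trace file
    repeat(3) { step ->
        logStep(logFile, "step $step resumed")
        // useful work can only continue after the write has completed
    }
}
```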
None of the things you list seem like anomalies to me.
I/O slowdowns can be measured more directly by looking at the network/filesystem utilization using the operating system tools.
Ideally, timeouts should propagate to a global exception handler where, yes, it makes sense to write something to the log. It's a long-standing issue that our timeouts don't work like that: #1374. This is unrelated to the current discussion, though.
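For illustration, this is roughly the kind of scope-wide handler meant here. The sketch uses a plain exception, since (per #1374) timeouts currently don't reach such a handler; the scope setup and messages are assumptions for the example.

```kotlin
import kotlinx.coroutines.*

fun main() {
    // A scope-wide handler: the single place where uncaught failures get logged.
    val handler = CoroutineExceptionHandler { _, e ->
        println("coroutine failed: $e") // replace with real logging
    }
    val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default + handler)

    scope.launch {
        // Some operation that fails; ideally a timeout would surface here the same way.
        throw IllegalStateException("backend call failed")
    }

    Thread.sleep(100) // crude wait for the demo; a real app would manage the scope's lifecycle
}
```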
As I mentioned, these can indicate a change in how coroutines are used. End-to-end performance metrics seem like a better fit for this.
This will be reflected in end-to-end performance and in the CPU utilization, both of which are metrics that are easier to measure directly.
This makes a lot of sense for threads, but with coroutines, structured concurrency is heavily encouraged and, when used correctly, solves the issue of leaking computations by construction.
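A small sketch of the "by construction" point, with hypothetical `fetchUser`/`fetchStats` stand-ins for real work: `coroutineScope` does not return until every child has completed or been cancelled, so nothing started inside it can outlive the caller.

```kotlin
import kotlinx.coroutines.*

// coroutineScope returns only after both children finish; if either fails or the
// caller is cancelled, the other child is cancelled too, so no work leaks out.
suspend fun loadDashboard(): Pair<String, String> = coroutineScope {
    val user = async { fetchUser() }
    val stats = async { fetchStats() }
    user.await() to stats.await()
}

// Hypothetical stand-ins for real I/O.
private suspend fun fetchUser(): String { delay(50); return "user" }
private suspend fun fetchStats(): String { delay(50); return "stats" }

fun main() = runBlocking {
    println(loadDashboard()) // by the time this prints, nothing is still running in the background
}
```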