feat: add status watcher to MCP server #18320


Merged: 15 commits merged into main on Jun 13, 2025

Conversation

@code-asher (Member) commented Jun 11, 2025

This is meant to complement the existing task reporter since the LLM does not call it reliably.

I also did some refactoring to use the common agent flags/env vars; those changes are in separate commits.

For #18163, but the modules will need to be updated to make use of it.

@code-asher changed the title from "Add status watcher to MCP server" to "feat: add status watcher to MCP server" on Jun 11, 2025
@code-asher force-pushed the asher/report-task branch 5 times, most recently from fd1b20f to f8d7628 on June 11, 2025 10:38
@code-asher requested a review from hugodutka on June 11, 2025 10:59
@hugodutka (Contributor) left a comment

Idea for getting rid of that "a message from the screen watcher and a message from the LLM could arrive out of order if the timing is just right" race condition:

  • maintain a state object that keeps track of
    • an array of pending updates with timestamps
    • the latest status reported to the server with a timestamp
  • instead of submitting status updates to a queue, push them to the array of pending updates
  • in a separate loop that ticks every second, process pending updates in batch. When processing a "screen stable" update, act on it only if it is at least 1 second old and no "screen changing" update arrived after it. If a "screen changing" update did arrive after it, drop it; otherwise leave it in the pending array until it is old enough.
  • keep the existing deduplication logic that takes into account conversation length and latest status

up to you if you want to implement it. I'm fine with merging the current version once you address the feedback
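
A minimal Go sketch of the idea above, with hypothetical names (Update, Tracker, Push, and Run are illustrative, not from the PR); the existing dedup logic, which also considers conversation length, is reduced here to a simple status comparison:

package watcher

import (
	"sync"
	"time"
)

// Update is a hypothetical status update with the time it arrived.
type Update struct {
	Status string // "screen stable", "screen changing", or an AI agent report
	At     time.Time
}

// Tracker holds the pending updates and the latest status sent upstream.
type Tracker struct {
	mu       sync.Mutex
	pending  []Update
	lastSent Update
}

// Push records an update instead of submitting it straight to a queue.
func (t *Tracker) Push(u Update) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.pending = append(t.pending, u)
}

// Run ticks every second and processes pending updates in batch. A
// "screen stable" update is acted on only if it is at least one second
// old and no "screen changing" update arrived after it.
func (t *Tracker) Run(report func(Update)) {
	for range time.Tick(time.Second) {
		t.mu.Lock()
		var keep []Update
		for i, u := range t.pending {
			superseded := false
			for _, later := range t.pending[i+1:] {
				if later.Status == "screen changing" {
					superseded = true
					break
				}
			}
			switch {
			case u.Status == "screen stable" && superseded:
				// Drop it: the screen changed again after this update.
			case u.Status == "screen stable" && time.Since(u.At) < time.Second:
				keep = append(keep, u) // not old enough yet; leave it pending
			default:
				if u.Status != t.lastSent.Status { // stand-in for the real dedup
					report(u)
					t.lastSent = u
				}
			}
		}
		t.pending = keep
		t.mu.Unlock()
	}
}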


// tryCreateAgentClient returns a new client from the command context. It works
// just like createAgentClient, but does not error.
func (r *RootCmd) tryCreateAgentClient() (*agentsdk.Client, error) {
@hugodutka (Contributor):

I'd expect the names to be swapped: tryCreateAgentClient should return an error, createAgentClient should not. "try" implies the possibility of failure.

@code-asher (Member, Author) Jun 12, 2025:

Very fair. For this, though, I just mirrored TryInitClient. I think the logic is that try here is more like attempt, in the sense that failing is OK, or maybe in the way that try in try/catch suppresses errors. Happy to rename it, but if we do, maybe we should rename both.

@hugodutka (Contributor):

Renaming both would be perfect.
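
For illustration only, the convention being suggested would look roughly like this, using hypothetical stand-in types rather than the PR's actual code; "try" surfaces the error, and the plain name is the lenient variant:

package main

import (
	"errors"
	"fmt"
)

// Client is a stand-in for agentsdk.Client.
type Client struct{ URL string }

// tryCreateClient may fail: "try" signals the possibility of an error.
func tryCreateClient(url string) (*Client, error) {
	if url == "" {
		return nil, errors.New("agent URL is not set")
	}
	return &Client{URL: url}, nil
}

// createClient never errors: it builds a client from whatever
// configuration is present, mirroring TryInitClient's lenient behavior.
func createClient(url string) *Client {
	return &Client{URL: url}
}

func main() {
	if _, err := tryCreateClient(""); err != nil {
		fmt.Println("try variant surfaced an error:", err)
	}
	fmt.Println("non-try variant still returns a client:", createClient("") != nil)
}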

@code-asher force-pushed the asher/report-task branch 2 times, most recently from 43350ec to 5512b1a on June 12, 2025 02:11
For now, the old function has been renamed to tryCreateAgentClient because it is not clear to me whether it is safe to error in these cases or whether it was intentional to create a client even when there was no valid URL or token.

Before, we read the environment separately, but now we can reuse the global flags and the function for creating an agent client.

Also, since the tests no longer need to set environment variables, they can now be run in parallel.

This is not fully backwards compatible: the old method would fall back to the non-agent client URL if the agent URL was not set, but that seems like undesired behavior since we are not doing it anywhere else.

Since we can now get status updates from two places, they are placed in a queue so we can handle them one at a time.

The update predicates now run during the push phase instead of the pop phase. This ensures we do not queue up updates that would just end up being discarded once they are popped (which could take some time due to latency to coderd).

It also has the side effect of preserving summaries even when the queue gets too big, because summaries are now captured during the push, before an update can be dropped while we wait on coderd.
@code-asher (Member, Author) commented:

I force-pushed to break out the UI fix, but the remaining commits are unchanged.

Then I moved the update predicates to the push phase instead of the pop phase, so if we are waiting on coderd, we will not queue up a bunch of duplicate updates that would just end up getting discarded later anyway. Now they get discarded immediately.

This could still use channels; I would just need a mutex over lastReport/lastMessageID. But although we are unlikely to hit the limit, I do still like that the queue can drop the oldest update.

I am also keeping the last summary from the agent during the push phase instead, which is nice: even if an agent update is dropped because the queue got too big while we waited on coderd, we can still log it with the next agentapi status update. Granted, with a buffer of 512 this is probably never going to happen.
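
A minimal sketch of the push-phase behavior described above; the names (reporter, queue, lastReport, lastSummary, maxQueued) are illustrative, not the PR's actual identifiers:

package main

import "fmt"

const maxQueued = 512 // matches the buffer size mentioned above

type update struct {
	status  string
	summary string
}

type reporter struct {
	queue       []update
	lastReport  update
	lastSummary string
}

// push applies the dedup predicates immediately, so duplicates are
// discarded up front rather than queued while we wait on coderd.
func (r *reporter) push(u update) {
	// Preserve the agent's summary at push time; even if this update is
	// later dropped, the summary rides along with the next one.
	if u.summary != "" {
		r.lastSummary = u.summary
	} else {
		u.summary = r.lastSummary
	}
	if u.status == r.lastReport.status && u.summary == r.lastReport.summary {
		return // duplicate of the last report; discard immediately
	}
	if len(r.queue) >= maxQueued {
		r.queue = r.queue[1:] // queue full: drop the oldest update
	}
	r.queue = append(r.queue, u)
	r.lastReport = u
}

func main() {
	r := &reporter{}
	r.push(update{status: "working", summary: "building"})
	r.push(update{status: "working", summary: ""}) // deduped: same as last
	fmt.Println(len(r.queue), r.queue[0].summary)  // prints: 1 building
}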

@code-asher marked this pull request as ready for review on June 12, 2025 23:30
@hugodutka (Contributor) left a comment

That's awesome. I set it up in a local deployment and it works! Both for reporting status when the AI agent doesn't, and for completing statuses the AI agent didn't.

Minor nits on naming: I believe it's more accurate to refer to Claude Code and others as "AI agents" rather than LLMs. Also, whenever referring to the URL of a coder/agentapi process, I think we should say "AI agentapi URL" rather than "llm agent url" for accuracy.

Also TODO: we need to update the Claude Code and Goose Terraform modules to use this.

Also, any references to the API are renamed to "AgentAPI" rather than just "agent".
@code-asher merged commit 4bd5609 into main on Jun 13, 2025 (40 checks passed)
@code-asher deleted the asher/report-task branch on June 13, 2025 20:53
@github-actions bot locked and limited conversation to collaborators on Jun 13, 2025