MCP Server Session Lost in Multi-Worker Environment #520
Comments
I am facing the same problem. In our case we do not use multiple workers, but in Kubernetes each pod is a worker running the server. Due to autoscaling and load balancing, requests do not arrive at the same pod each time. The problem here is that the combination of SSE + HTTP is stateful. Streamable HTTP opens the door to stateless implementations, but neither the Python nor the TypeScript SDK has official support yet, and clients may also take some time to fully support Streamable HTTP. Take a look at https://blog.christianposta.com/ai/understanding-mcp-recent-change-around-http-sse/; implementing custom support may be the best solution for now.
How can we implement a stateless server based on Streamable HTTP?
By using only standard HTTP and forgetting about SSE: https://modelcontextprotocol.io/specification/2025-03-26/basic/transports#streamable-http.
The current implementation of the Python SDK is stateful. There is work underway to fix this: modelcontextprotocol/python-sdk#443. For now, however, we need to eject from the MCP framework to use stateless requests, to avoid issues like the ones described in this issue (modelcontextprotocol/python-sdk#520). I tested this in Claude by listing and calling a remote tool.
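For anyone experimenting with that route, below is a minimal sketch of what a stateless JSON-RPC endpoint could look like outside the SDK's session machinery. This is not the official MCP API; the route, tool list, and response shapes are illustrative assumptions. Because each POST carries a complete request, any worker or pod can answer it.

```python
# Sketch of a stateless Streamable-HTTP-style endpoint; not the official SDK API.
# Route, tool list, and response shapes are illustrative assumptions.
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()

TOOLS = [{"name": "echo", "description": "Echo back the provided text"}]

@app.post("/mcp")
async def handle_mcp(request: Request) -> JSONResponse:
    body = await request.json()  # each POST is a complete JSON-RPC message
    method, msg_id = body.get("method"), body.get("id")
    if method == "tools/list":
        result = {"tools": TOOLS}
    elif method == "tools/call":
        args = body.get("params", {}).get("arguments", {})
        result = {"content": [{"type": "text", "text": args.get("text", "")}]}
    else:
        return JSONResponse(
            {"jsonrpc": "2.0", "id": msg_id,
             "error": {"code": -32601, "message": "Method not found"}}
        )
    # No per-connection state is kept, so any worker or pod can serve this request.
    return JSONResponse({"jsonrpc": "2.0", "id": msg_id, "result": result})
```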
Hi all, please can anyone help with this? We are still facing the same issue. How can it be resolved?
Generate a unique identifier for each user session, such as a GUID-based user ID, and include it as a custom header. Then configure the Kubernetes ingress annotation as follows: `nginx.ingress.kubernetes.io/upstream-hash-by: "$http_x_user_id"`. If your client connects directly to the ingress without passing through intermediate services such as API Management (APIM), you can alternatively hash on the client address with `nginx.ingress.kubernetes.io/upstream-hash-by: "$remote_addr"`. However, a unique identifier (like a GUID-based user ID) is generally recommended over relying on IP addresses; it is more stable, especially in environments where client IPs may change or be shared. You can also hash on the Authorization header, but that may affect performance.
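To make the hashing concrete, here is a small client-side sketch (assuming httpx; the header name and endpoint URL are illustrative, not fixed by the SDK) that generates a GUID once and sends it on every request, so the `$http_x_user_id` hash keeps all of that user's traffic on the same pod:

```python
# Sketch only: the header name and endpoint are assumptions, not part of the SDK.
import uuid
import httpx

USER_ID = str(uuid.uuid4())  # generate once per user session and reuse it for every call

client = httpx.Client(headers={"x-user-id": USER_ID})  # hashed by upstream-hash-by "$http_x_user_id"
response = client.post(
    "https://mcp.example.com/messages",
    json={"jsonrpc": "2.0", "id": 1, "method": "tools/list"},
)
print(response.status_code)
```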
I am building a wrapper around this python-sdk that helps you deploy MCP servers in a multi-worker environment. You can check the repo here: https://github.com/theNullP0inter/odinmcp
Describe the bug
When deploying the MCP server in a Kubernetes environment with gunicorn's multi-process configuration, SSE connections disconnect after a period of time, resulting in subsequent messages receiving a 404 error "Could not find session for ID". This occurs because SSE sessions are created in one worker process, but subsequent requests may be routed to different worker processes where the session state is not shared.
Steps to reproduce
Deploy the MCP server with gunicorn configured for multiple worker processes (workers > 1)
Send messages after the SSE connection has been established
Logs
WARNING mcp.server.sse Could not find session for ID: xxx-xxx-xxx
Expected behavior
All messages, whether initial or subsequent, should be processed normally without any "Could not find session" errors. SSE sessions should keep working even when multiple workers are used.
Environment information
Reproduction conditions
gunicorn running with multiple worker processes (workers > 1)
Solution
I resolved this issue by setting the worker count to 1:
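For reference, a minimal gunicorn config sketch with a single worker; the worker class and bind address are assumptions based on a typical ASGI/Starlette deployment:

```python
# gunicorn.conf.py -- sketch only; adjust worker_class/bind to your deployment
workers = 1  # single process, so the in-memory SSE session map is always hit
worker_class = "uvicorn.workers.UvicornWorker"  # ASGI worker for the SSE app
bind = "0.0.0.0:8000"
```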
However, this is not an ideal solution as it limits the service's scalability.
Suggested improvements
Additional context
The session implementation (session.py) shows that session state is stored in process memory
Potential solution approach
Could the MCP server be modified to add a session storage abstraction layer that allows users to configure different session storage backends (memory, Redis, file, etc.) to support distributed deployments?
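As a rough illustration of that idea, the sketch below defines a hypothetical SessionStore interface with in-memory and Redis backends. None of these names exist in the SDK today, and the serialized session data is assumed to be opaque bytes:

```python
# Hypothetical session storage abstraction; names are illustrative, not SDK API.
from abc import ABC, abstractmethod
from typing import Optional


class SessionStore(ABC):
    @abstractmethod
    def save(self, session_id: str, data: bytes) -> None: ...

    @abstractmethod
    def load(self, session_id: str) -> Optional[bytes]: ...


class InMemorySessionStore(SessionStore):
    """Current behavior: sessions live in the worker process and are not shared."""

    def __init__(self) -> None:
        self._sessions: dict[str, bytes] = {}

    def save(self, session_id: str, data: bytes) -> None:
        self._sessions[session_id] = data

    def load(self, session_id: str) -> Optional[bytes]:
        return self._sessions.get(session_id)


class RedisSessionStore(SessionStore):
    """Shared backend so any worker or pod can resolve a session ID."""

    def __init__(self, client) -> None:  # e.g. redis.Redis(host=..., port=...)
        self._client = client

    def save(self, session_id: str, data: bytes) -> None:
        self._client.set(f"mcp:session:{session_id}", data)

    def load(self, session_id: str) -> Optional[bytes]:
        return self._client.get(f"mcp:session:{session_id}")
```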