MCP Server Session Lost in Multi-Worker Environment #520
Comments
I am facing the same problem. In our case we do not use multiple workers, but in Kubernetes each pod is a worker running the server. Due to autoscaling and load balancing, requests do not arrive at the same pod each time. The problem here is that the combination of SSE + HTTP is stateful. Streamable HTTP opens the door to stateless implementations, but neither the Python nor the TypeScript SDK has official support yet, and clients may also take some time to fully support Streamable HTTP. Take a look at https://blog.christianposta.com/ai/understanding-mcp-recent-change-around-http-sse/; implementing custom support may be the best solution for now.
How can we implement a stateless server based on Streamable HTTP?
By using only standard HTTP and forgetting about SSE: https://modelcontextprotocol.io/specification/2025-03-26/basic/transports#streamable-http.
The current implementation of the Python SDK is stateful. There is work underway to fix this: modelcontextprotocol/python-sdk#443. For now, however, we need to eject from the MCP framework to use stateless requests, to avoid issues like the ones described in this issue (modelcontextprotocol/python-sdk#520). I tested this in Claude by listing and calling a remote tool.
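For anyone experimenting with that route, below is a minimal sketch of what a stateless JSON-RPC endpoint could look like outside the SDK's session machinery. This is not the official MCP API; the route, tool list, and response shapes are illustrative assumptions. Because each POST carries a complete request, any worker or pod can answer it.

```python
# Sketch of a stateless Streamable-HTTP-style endpoint; not the official SDK API.
# Route, tool list, and response shapes are illustrative assumptions.
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()

TOOLS = [{"name": "echo", "description": "Echo back the provided text"}]

@app.post("/mcp")
async def handle_mcp(request: Request) -> JSONResponse:
    body = await request.json()  # each POST is a complete JSON-RPC message
    method, msg_id = body.get("method"), body.get("id")
    if method == "tools/list":
        result = {"tools": TOOLS}
    elif method == "tools/call":
        args = body.get("params", {}).get("arguments", {})
        result = {"content": [{"type": "text", "text": args.get("text", "")}]}
    else:
        return JSONResponse(
            {"jsonrpc": "2.0", "id": msg_id,
             "error": {"code": -32601, "message": "Method not found"}}
        )
    # No per-connection state is kept, so any worker or pod can serve this request.
    return JSONResponse({"jsonrpc": "2.0", "id": msg_id, "result": result})
```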
Hi all, please can anyone help with this? We are still facing the same issue. How can it be resolved?
Generate a unique identifier for each user session, such as a GUID-based user ID, and include it as a custom header. Then configure the Kubernetes ingress annotation as follows: `nginx.ingress.kubernetes.io/upstream-hash-by: "$http_x_user_id"`. If your client connects directly to the ingress without passing through intermediate services such as API Management (APIM), you can alternatively hash on the client address with `nginx.ingress.kubernetes.io/upstream-hash-by: "$remote_addr"`. However, a unique identifier (like a GUID-based user ID) is generally recommended over relying on IP addresses; it is more stable, especially in environments where client IPs may change or be shared. You can also hash on the Authorization header, but that may affect performance.
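To make the hashing concrete, here is a small client-side sketch (assuming httpx; the header name and endpoint URL are illustrative, not fixed by the SDK) that generates a GUID once and sends it on every request, so the `$http_x_user_id` hash keeps all of that user's traffic on the same pod:

```python
# Sketch only: the header name and endpoint are assumptions, not part of the SDK.
import uuid
import httpx

USER_ID = str(uuid.uuid4())  # generate once per user session and reuse it for every call

client = httpx.Client(headers={"x-user-id": USER_ID})  # hashed by upstream-hash-by "$http_x_user_id"
response = client.post(
    "https://mcp.example.com/messages",
    json={"jsonrpc": "2.0", "id": 1, "method": "tools/list"},
)
print(response.status_code)
```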
I am building a wrapper around this python-sdk that helps you deploy MCP servers in a multi-worker environment. You can check the repo here: https://github.com/theNullP0inter/odinmcp
Describe the bug
When deploying the MCP server in a Kubernetes environment with gunicorn's multi-process configuration, SSE connections disconnect after a period of time, resulting in subsequent messages receiving a 404 error "Could not find session for ID". This occurs because SSE sessions are created in one worker process, but subsequent requests may be routed to different worker processes where the session state is not shared.
Steps to reproduce
Deploy the MCP server with gunicorn configured for multiple worker processes (workers > 1)
Send messages after the SSE connection has been established
Logs
WARNING mcp.server.sse Could not find session for ID: xxx-xxx-xxx
Expected behavior
All messages, whether initial or subsequent, should be processed normally without any "Could not find session" errors. SSE sessions should keep working even when multiple workers are used.
Environment information
Reproduction conditions
gunicorn running with multiple worker processes (workers > 1)
Solution
I resolved this issue by setting the worker count to 1:
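For reference, a minimal gunicorn config sketch with a single worker; the worker class and bind address are assumptions based on a typical ASGI/Starlette deployment:

```python
# gunicorn.conf.py -- sketch only; adjust worker_class/bind to your deployment
workers = 1  # single process, so the in-memory SSE session map is always hit
worker_class = "uvicorn.workers.UvicornWorker"  # ASGI worker for the SSE app
bind = "0.0.0.0:8000"
```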
However, this is not an ideal solution as it limits the service's scalability.
Suggested improvements
Additional context
The session implementation (session.py) shows that session state is stored in process memory
Potential solution approach
Could the MCP server be modified to add a session storage abstraction layer that allows users to configure different session storage backends (memory, Redis, file, etc.) to support distributed deployments?
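As a rough illustration of that idea, the sketch below defines a hypothetical SessionStore interface with in-memory and Redis backends. None of these names exist in the SDK today, and the serialized session data is assumed to be opaque bytes:

```python
# Hypothetical session storage abstraction; names are illustrative, not SDK API.
from abc import ABC, abstractmethod
from typing import Optional


class SessionStore(ABC):
    @abstractmethod
    def save(self, session_id: str, data: bytes) -> None: ...

    @abstractmethod
    def load(self, session_id: str) -> Optional[bytes]: ...


class InMemorySessionStore(SessionStore):
    """Current behavior: sessions live in the worker process and are not shared."""

    def __init__(self) -> None:
        self._sessions: dict[str, bytes] = {}

    def save(self, session_id: str, data: bytes) -> None:
        self._sessions[session_id] = data

    def load(self, session_id: str) -> Optional[bytes]:
        return self._sessions.get(session_id)


class RedisSessionStore(SessionStore):
    """Shared backend so any worker or pod can resolve a session ID."""

    def __init__(self, client) -> None:  # e.g. redis.Redis(host=..., port=...)
        self._client = client

    def save(self, session_id: str, data: bytes) -> None:
        self._client.set(f"mcp:session:{session_id}", data)

    def load(self, session_id: str) -> Optional[bytes]:
        return self._client.get(f"mcp:session:{session_id}")
```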