-
Notifications
You must be signed in to change notification settings - Fork 15
Added Abstract Type for Model Server Client #5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
2b5f054
3140c92
d1ee368
38dcc46
d478975
12ed390
d2fcbb0
6eea97c
91b8b78
4e17fa5
9e6d0b5
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
# Custom Clients | ||
|
||
All custom clients are organized in this directory including model server and metrics clients. The directory structure is organized to reflect relationships/commonalities between clients. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
# Model Server Clients | ||
|
||
Common functionality appears beween model servers with similar input and output types. These model servers are organized accordingly. | ||
|
||
Todo: | ||
- **Text to Text**: | ||
- Naive_transformers | ||
- tensorrt_llm_triton | ||
- sax | ||
- tgi | ||
- vllm | ||
- jetstream | ||
- **Text to Image**: | ||
- Maxdiffusion |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
# Copyright 2025 | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,78 @@ | ||
from abc import ABC, abstractmethod | ||
from typing import Any, List | ||
import asyncio | ||
import aiohttp | ||
|
||
|
||
class ErrorsReport: | ||
ClientConnectorErrors: int | ||
TimeoutErrors: int | ||
ContentTypeErrors: int | ||
ClientOSErrors: int | ||
ServerDisconnectedErrors: int | ||
unknown_errors: int | ||
|
||
def __init__(self) -> None: | ||
self.ClientConnectorErrors = 0 | ||
self.TimeoutErrors = 0 | ||
self.ContentTypeErrors = 0 | ||
self.ClientOSErrors = 0 | ||
self.ServerDisconnectedErrors = 0 | ||
self.unknown_errors = 0 | ||
|
||
def to_dict(self) -> dict[str, int]: | ||
return {k: v for k, v in self.__dict__.items() if isinstance(v, int)} | ||
|
||
def record_error(self, error: Exception) -> None: | ||
if isinstance(error, aiohttp.client_exceptions.ClientConnectorError): | ||
self.ClientConnectorErrors += 1 | ||
print(f"ClientConnectorError: {error}") | ||
elif isinstance(error, asyncio.TimeoutError): | ||
self.TimeoutErrors += 1 | ||
print(f"TimeoutError: {error}") | ||
elif isinstance(error, aiohttp.client_exceptions.ContentTypeError): | ||
self.ContentTypeErrors += 1 | ||
print(f"ContentTypeError: {error}") | ||
elif isinstance(error, aiohttp.client_exceptions.ClientOSError): | ||
self.ClientOSErrors += 1 | ||
print(f"ClientOSError: {error}") | ||
elif isinstance(error, aiohttp.client_exceptions.ServerDisconnectedError): | ||
self.ServerDisconnectedErrors += 1 | ||
print(f"ServerDisconnectedError: {error}") | ||
else: | ||
self.unknown_errors += 1 | ||
print(f"Unknown error: {error}") | ||
|
||
def append_report(self, report: "ErrorsReport") -> None: | ||
self.ClientConnectorErrors += report.ClientConnectorErrors | ||
self.TimeoutErrors += report.TimeoutErrors | ||
self.ContentTypeErrors += report.ContentTypeErrors | ||
self.ClientOSErrors += report.ClientOSErrors | ||
self.ServerDisconnectedErrors += report.ServerDisconnectedErrors | ||
self.unknown_errors += report.unknown_errors | ||
|
||
|
||
class Model_Server_Client(ABC): | ||
# The client will collect a summary of all observed errors | ||
Errors: ErrorsReport | ||
|
||
@abstractmethod | ||
def summary(self) -> Any: | ||
""" | ||
Returns summary data derived from all inputs and outputs, depends on the clients input and output data types and as such subclasses should implement this at the client data type level (e.g., text-to-text, text-to-image). | ||
""" | ||
pass | ||
|
||
@abstractmethod | ||
def request(self, *args: Any, **kwargs: Any) -> Any: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would like to see a concrete implementation alongside this for a model server. vLLM is a good starting point so we can see how this looks in practice and if we need to change / add to this interface. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is in progress because agree, idea is to do something like this for the text-to-text class:
That way this is all we would need for a vllm client:
Simillar for jetstream for jetstream:
|
||
""" | ||
This is the method loadgen should use to make requests to a model server | ||
""" | ||
pass | ||
|
||
@abstractmethod | ||
def list_model_server_metrics(self) -> list[str]: | ||
""" | ||
Returns list of model server metrics of interest. | ||
""" | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say, we don't need a separate sub module for model server clients right away. We can start with the model server client being a separate file. As we get more clients, we can split them as needed. Otherwise importing all the submodules requires additional work on the part of the caller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can get behind that, maybe keeping text-to-text and text-to-image servers separate may be less confusing in the long run especially if benchmarking diffusion models is on our roadmap. Having a dedicated text-to-text abstract class is to deduplicate the common functions like making requests since that procedure is going to be the same regardless of model server (create request, send request, parse relevant info from response). See the example in the other comment.