OpenAI API conformance tests #513
Comments
This should be validated with the major model servers (e.g. vLLM, Triton, TGI, and potentially SGLang or JetStream).
Suggest we validate this by starting with the OpenAI client as called from Python, since that is (a) how most ecosystem tools will interact with the gateway and (b) likely how many ML engineers will invoke the gateway for non-trivial interactions.
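For illustration, a happy-path check through the Python client could look something like the sketch below. `GATEWAY_BASE_URL`, `GATEWAY_API_KEY`, and `TEST_MODEL` are placeholder names for this example, not anything the project defines today.

```python
# Minimal sketch of a happy-path conformance check using the openai Python client (v1.x).
# The environment-variable names and defaults below are placeholders.
import os

import openai

client = openai.OpenAI(
    base_url=os.environ.get("GATEWAY_BASE_URL", "http://localhost:8080/v1"),
    api_key=os.environ.get("GATEWAY_API_KEY", "not-needed"),
)

def test_chat_completion_basic():
    # Happy-path request through /v1/chat/completions.
    resp = client.chat.completions.create(
        model=os.environ.get("TEST_MODEL", "my-model"),
        messages=[{"role": "user", "content": "Say hello."}],
        max_tokens=16,
    )
    # Shape checks against the OpenAI API spec.
    assert resp.object == "chat.completion"
    assert resp.choices and resp.choices[0].message.role == "assistant"
    assert resp.usage is not None and resp.usage.total_tokens > 0
```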
I also suggest we accelerate validating the OpenAI client in both regular and error configurations before v0.3, since we are increasing visibility at KubeCon.
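An error-path check might then assert that failures surface as OpenAI-style errors through the client. A rough sketch, assuming the gateway maps an unknown model to a 404 that the client raises as `openai.NotFoundError` (how the EPP actually handles this is exactly what the suite would pin down):

```python
# Sketch of an error-path check; GATEWAY_BASE_URL / GATEWAY_API_KEY are placeholders.
import os

import openai
import pytest

client = openai.OpenAI(
    base_url=os.environ.get("GATEWAY_BASE_URL", "http://localhost:8080/v1"),
    api_key=os.environ.get("GATEWAY_API_KEY", "not-needed"),
)

def test_unknown_model_returns_openai_style_404():
    # Assumes the gateway returns a 404 with an OpenAI-style error body for
    # unknown models, which the client surfaces as openai.NotFoundError.
    with pytest.raises(openai.NotFoundError):
        client.chat.completions.create(
            model="model-that-does-not-exist",
            messages=[{"role": "user", "content": "hi"}],
        )
```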
Just to clarify, this issue also tracks making the current EPP conformant, not just adding conformance tests. |
True, but the only aspect of the EPP that matters here is how we handle errors. A conformance suite that gives a user confidence that their specific blend of model servers + EPP will still work with the OpenAI API spec is strongly valuable (and a good suite should catch where the EPP is nonconformant).
I'm considering moving this to a discussion, as the OpenAI API seems to be only partially implemented by model servers (such as vLLM) and there is no concrete contract as to which endpoints are supported. Colloquially, the /v1/completions and /v1/chat/completions endpoints are supported, but /v1/completions is considered legacy. Perhaps instead we should specify what IGW currently expects: right now we just expect that there is a
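To make whatever contract we do settle on explicit, a small probe could report which endpoints a given backend actually answers. A sketch with plain `requests`; the endpoint list, payloads, and addresses are assumptions for illustration, not a statement of what IGW requires:

```python
# Rough probe of which OpenAI-style endpoints a backend exposes.
# BASE_URL and MODEL are placeholders.
import requests

BASE_URL = "http://localhost:8080"
MODEL = "my-model"

payloads = {
    "/v1/chat/completions": {
        "model": MODEL,
        "messages": [{"role": "user", "content": "hi"}],
        "max_tokens": 4,
    },
    "/v1/completions": {"model": MODEL, "prompt": "hi", "max_tokens": 4},
}

for path, body in payloads.items():
    r = requests.post(BASE_URL + path, json=body, timeout=30)
    # A 404/405 here suggests the endpoint is simply not implemented.
    print(f"{path}: HTTP {r.status_code}")
```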
What would you like to be added: A suite of conformance tests to validate that Inference Gateway + the underlying model servers comply with the OpenAI API spec. We can start by searching to see whether such conformance tests already exist; if not, it would be good for us to provide such a test suite.
Why is this needed: Users are eventually going to have heterogeneous model servers, and we should make sure they all conform to the same spec. Since Inference Gateway is where a user would experience the variance between model servers, we should be able to provide a test suite that ensures they still get consistent API behavior.
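One possible shape for such a suite, assuming each backend (or a gateway route to it) is reachable at its own base URL, is to parametrize the same spec checks across backends. The URLs and model name below are placeholders:

```python
# Sketch of running the same spec checks across several backends.
# The backend map and model name are placeholders for illustration.
import openai
import pytest

BACKENDS = {
    "vllm": "http://vllm.example:8000/v1",
    "tgi": "http://tgi.example:8080/v1",
    "triton": "http://triton.example:9000/v1",
}

@pytest.fixture(params=list(BACKENDS.items()), ids=lambda kv: kv[0])
def client(request):
    _, base_url = request.param
    return openai.OpenAI(base_url=base_url, api_key="not-needed")

def test_chat_completion_conforms(client):
    resp = client.chat.completions.create(
        model="my-model",  # placeholder model name
        messages=[{"role": "user", "content": "ping"}],
        max_tokens=8,
    )
    assert resp.object == "chat.completion"
    assert resp.choices[0].finish_reason in {"stop", "length"}
```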