Update Inference specification for Hugging Face's completion and chat completion tasks #4383
Conversation
… completion tasks
Spec-wise LGTM, let's hear from @szabosteve for the docs part!
Thanks for the PR, I left a few suggestions.
 */
rate_limit?: RateLimitSetting
/**
 * The URL endpoint to use for the requests.
 * For `completion` and `chat_completion` tasks, endpoint must be compatible with the OpenAI API format and include `v1/chat/completions`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we expand this a little? Maybe something like:
Suggested change:
- * For `completion` and `chat_completion` tasks, endpoint must be compatible with the OpenAI API format and include `v1/chat/completions`.
+ * For `completion` and `chat_completion` tasks, the deployed model must be compatible with the Hugging Face Chat Completion interface (https://huggingface.co/docs/inference-providers/en/tasks/chat-completion#conversational-large-language-models-llms). The URL must include `v1/chat/completions`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since OpenAI mode is the only visible way of determining whether or not OpenAI API format is supported, I propose expanding it a bit more:
Suggested change:
- * For `completion` and `chat_completion` tasks, endpoint must be compatible with the OpenAI API format and include `v1/chat/completions`.
+ * For `completion` and `chat_completion` tasks, the deployed model must be compatible with the Hugging Face Chat Completion interface (https://huggingface.co/docs/inference-providers/en/tasks/chat-completion#conversational-large-language-models-llms). OpenAI mode must be enabled, and the endpoint URL must include `v1/chat/completions`.
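For context, the kind of request this doc comment governs looks roughly like the sketch below (a hedged example: the inference ID, URL, and API key are illustrative placeholders, not values from this PR):

PUT _inference/chat_completion/hugging-face-chat
{
  "service": "hugging_face",
  "service_settings": {
    "api_key": "hf-xxxxxxxxxxxx",
    "url": "https://my-model.endpoints.huggingface.cloud/v1/chat/completions"
  }
}

Note how the `url` value ends in `v1/chat/completions`, which is exactly what the description requires for these two task types.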
What is `OpenAI mode`? I searched Hugging Face and couldn't find it. Are you referring to the request format that the inference API requires?
Made the change.
 * For `completion` and `chat_completion` tasks, this field is optional but may be required for certain models — particularly when using serverless inference endpoints.
 * For the `text_embedding` task, this field is not required and will be ignored if provided.
 */
model_id?: string
@lcawl @szabosteve In the previous docs system we created separate sections for the settings per task type. Is there a way to do that in the new docs system? What I mean is, it'd probably be easier to go to the Hugging Face doc page and click on a section to see all fields that are applicable for a particular task type, rather than having to go through all fields and look at the text to see if each applies.
@jonathan-buttner I think there is a solution to this issue, but I need to investigate a bit. I don't want to block this PR, so please go ahead and merge if you find it good, and I'll fix these when I have a working solution.
Just to give an example: if a user includes `model_id` for a `text_embedding` task type request, the request will fail. If the user includes `model_id` for `completion` or `chat_completion`, the request will succeed.

So the field is required depending on the model, but normally optional when using `completion` or `chat_completion`. For `text_embedding` the field is invalid and will result in an error.
@@ -44,6 +47,15 @@ import { Id } from '@_types/common'
 * * `e5-small-v2`
 * * `multilingual-e5-base`
 * * `multilingual-e5-small`
 *
 * For Elastic's `chat_completion` and `completion` tasks:
@lcawl @szabosteve Similar to my previous comment, would it make sense to create a put request per task type? I'm not sure if that would work or not.
@jonathan-buttner It is certainly possible from a technical standpoint. However, we already have a lot of similar endpoints under the inference namespace. If we had multiple pages for an integration type (one for each task type the integration supports), the confusion might be even worse. It would also affect the left-side navigation.
I understand your point, and I think it's a valid concern. I just don't think creating multiple pages for the same integration would be the best solution to address it. I'll talk to others, too, and try to come up with a solution soon. WDYT?
Yeah, I agree having multiple pages would likely make it worse. I'm not sure what a good solution would be. Maybe if it's possible to have multiple expandable sections within a single page or something. I think the underlying issue is that it's hard to determine which fields are applicable for a single task type without having to read all the text for all the fields. Maybe we could leverage examples to show the fields that should go with each task type.
@@ -1,6 +1,6 @@
 {
   "dependencies": {
-    "@redocly/cli": "^1.34.1",
+    "@redocly/cli": "^1.34.3",
Do we need this change?
This change is made automatically by running the pre-commit set of tasks. The value has been incremented along with different changes committed over time, so ignoring a change made by the pre-commit tasks would have to have a reason behind it.
Ah thanks!
@@ -5,7 +5,7 @@
 "packages": {
   "": {
     "dependencies": {
-      "@redocly/cli": "^1.34.1",
+      "@redocly/cli": "^1.34.3",
Was this file supposed to change?
As answered above, the change is made automatically by running the pre-commit set of tasks. If we don't have a reason for ignoring changes made by the pre-commit tasks, I'd keep it.
Yep sounds good, thanks for explaining.
…chat-completion-integration
# Conflicts:
# output/schema/schema-serverless.json
# output/schema/schema.json
Left a few comments and a tiny suggestion, otherwise LGTM!
@@ -29,13 +29,16 @@ import { Id } from '@_types/common'
 /**
  * Create a Hugging Face inference endpoint.
  *
- * Create an inference endpoint to perform an inference task with the `hugging_face` service.
+ * Creates an inference endpoint to perform an inference task with the `hugging_face` service.
I suggest changing it back to be consistent with the rest of the endpoint docs.
Suggested change:
- * Creates an inference endpoint to perform an inference task with the `hugging_face` service.
+ * Create an inference endpoint to perform an inference task with the `hugging_face` service.
FYI, the use of "Create" vs. "Creates" is not consistent across the endpoint docs. For Amazon and Mistral it is "Creates", which was taken as the example. I thought it made more sense to describe "what it does", but now I see that it is an invitation to "do it".
Changed it, since for most of the providers it is "Create".
@Jan-Kazlouski-elastic Thanks! I'll fix Amazon and Mistral.
Thank you @szabosteve
 */
rate_limit?: RateLimitSetting
/**
 * The URL endpoint to use for the requests.
 * For `completion` and `chat_completion` tasks, the deployed model must be compatible with the Hugging Face Chat Completion interface (https://huggingface.co/docs/inference-providers/en/tasks/chat-completion#conversational-large-language-models-llms). OpenAI mode must be enabled, and the endpoint URL must include `v1/chat/completions`.
> OpenAI mode must be enabled

What is OpenAI mode? Can we remove that portion of the sentence?
Suggested change:
- * For `completion` and `chat_completion` tasks, the deployed model must be compatible with the Hugging Face Chat Completion interface (https://huggingface.co/docs/inference-providers/en/tasks/chat-completion#conversational-large-language-models-llms). OpenAI mode must be enabled, and the endpoint URL must include `v1/chat/completions`.
+ * For `completion` and `chat_completion` tasks, the deployed model must be compatible with the Hugging Face Chat Completion interface (https://huggingface.co/docs/inference-providers/en/tasks/chat-completion#conversational-large-language-models-llms). The endpoint URL must include `v1/chat/completions`.
Ah, I hadn't seen that before.
Can you try disabling it and see if a chat completion request using the inference API still works? I wonder if that only controls the UI in Hugging Face.
Can you give me an example of a model that has it?
> Can you try disabling it and see if a chat completion request using the inference API still works?

After disabling it in the UI, it still works, but the URL presented to the client doesn't contain `v1/chat/completions`. So switching it on and off doesn't keep the endpoint from processing OpenAI payloads if they are sent to `/v1/chat/completions`, but it hides the `/v1/chat/completions` section of the URL and provides a different payload example. So if OpenAI mode is turned off, or not there at all for a specific model, the full URL with `/v1/chat/completions` is hidden and the client can't see or use the URL that must be used for the integration.

The client might have prior knowledge of what the URL should look like, and we're saying that it must contain `/v1/chat/completions`, but it is not safe to assume the customer will understand that this section must be added on top of the regular URL. Especially since, if the model doesn't support the OpenAI payload, an attempt to include `/v1/chat/completions` will be made and an error will be returned because the model doesn't support it.

I guess we could make it clearer by telling the customer that the absence of `/v1/chat/completions` on the running model page can be caused by OpenAI mode being turned off, and that enabling this mode, if the toggle is present when the URL doesn't contain `/v1/chat/completions`, can lead to it being shown. But since this mode has always been turned ON by default for me, I think this issue carries really low risk. BUT! The presence of this toggle is one of the signs that the model is capable of processing OpenAI-type payloads and can be used for the integration. So for me personally, having this note makes us safer.

But you mentioning that you hadn't seen it before, while obviously having run successful tests in the past, makes me wonder if it is something local to only some of the models.

> I wonder if that only controls the UI in Hugging Face.

Yes, it just controls the UI.

> Can you give me an example of a model that has it?
Ah, I see. I wonder if it'd be more helpful to put this information and an example picture like you have somewhere else in the docs. @szabosteve what do you think?

My suggestion would be to have the text as something like:

For `completion` and `chat_completion` tasks, the deployed model must be compatible with the Hugging Face Chat Completion interface (https://huggingface.co/docs/inference-providers/en/tasks/chat-completion#conversational-large-language-models-llms). The endpoint URL must include `v1/chat/completions`. To determine if the model supports the Hugging Face Chat Completion Interface and to access the correct URL, follow the information here.

We'd have a link or something to either another page or a different section of the page that explains that the deployment should have a toggle for OpenAI. Then to get the correct URL they should enable the toggle and ensure the URL ends with `v1/chat/completions`.
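To illustrate what "compatible with the Hugging Face Chat Completion interface" means in practice, the deployed endpoint has to accept an OpenAI-style payload along these lines (a sketch: the URL and token are placeholders, and the `model` value depends on the deployment; TGI-based endpoints commonly accept "tgi"):

POST https://my-model.endpoints.huggingface.cloud/v1/chat/completions
Authorization: Bearer hf-xxxxxxxxxxxx
Content-Type: application/json

{
  "model": "tgi",
  "messages": [
    { "role": "user", "content": "Hello" }
  ]
}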
I think the best would be to present this information somehow here. I'm afraid that linking to another docs page from the reference for such a low-level detail would be hiding information. I suggest something like this:
Suggested change:
- * For `completion` and `chat_completion` tasks, the deployed model must be compatible with the Hugging Face Chat Completion interface (https://huggingface.co/docs/inference-providers/en/tasks/chat-completion#conversational-large-language-models-llms). OpenAI mode must be enabled, and the endpoint URL must include `v1/chat/completions`.
+ * For `completion` and `chat_completion` tasks, the deployed model must be compatible with the Hugging Face Chat Completion interface (see the linked external documentation for details). The endpoint URL for the request must include `/v1/chat/completions`.
+ * If the model supports the OpenAI Chat Completion schema, a toggle should appear in the interface. Enabling this toggle doesn't change any model behavior, it reveals the full endpoint URL needed (which should include `/v1/chat/completions`) when configuring the inference endpoint in Elasticsearch. If the model doesn't support this schema, the toggle may not be shown.
+ * @ext_doc_id huggingface-chat-completion-interface
And then add the following to the table.csv:

huggingface-chat-completion-interface, https://huggingface.co/docs/inference-providers/en/tasks/chat-completion#conversational-large-language-models-llms

It will provide a link with the text "External documentation" at the end of the description, which makes the docs a bit more readable.
@jonathan-buttner What do you think?
Yep that looks good. Thanks!
Applied the change proposed by @szabosteve.
 * After the endpoint is initialized (for dedicated) or ready (for serverless), ensure it supports the OpenAI API and includes `/v1/chat/completions` part in URL. Then, copy the full endpoint URL for use.
 * Recommended models for `chat_completion` and `completion` tasks:
 *
 * * `Mistral-7B-Instruct-v0.2`
@szabosteve should we include the full URL link to these models?
I would rather not include the full URLs. Currently, we can provide only one link per description; otherwise, the generator complains.
Ah ok
…chat-completion-integration
# Conflicts:
# output/openapi/elasticsearch-openapi.json
# output/openapi/elasticsearch-serverless-openapi.json
# output/schema/schema-serverless.json
# output/schema/schema.json
…d completion tasks
…chat-completion-integration
# Conflicts:
# output/openapi/elasticsearch-openapi.json
# output/openapi/elasticsearch-serverless-openapi.json
# output/schema/schema.json
This PR is for changes to the specification caused by elastic/elasticsearch#127254:
* Extended Task Support
* Model Requirements for Chat Tasks
* New Configuration Parameters
* Rate Limit Clarifications
* Documentation Fixes

Additional actions:
* Signed the CLA
* Executed `make contrib`