Update Inference specification for Hugging Face's completion and chat completion tasks #4383
Conversation
… completion tasks
Spec-wise LGTM, let's hear from @szabosteve for the docs part!
Thanks for the PR, I left a few suggestions.
 */
rate_limit?: RateLimitSetting
/**
 * The URL endpoint to use for the requests.
 * For `completion` and `chat_completion` tasks, endpoint must be compatible with the OpenAI API format and include `v1/chat/completions`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we expand this a little? Maybe something like:
Suggested change:
- * For `completion` and `chat_completion` tasks, endpoint must be compatible with the OpenAI API format and include `v1/chat/completions`.
+ * For `completion` and `chat_completion` tasks, the deployed model must be compatible with the Hugging Face Chat Completion interface (https://huggingface.co/docs/inference-providers/en/tasks/chat-completion#conversational-large-language-models-llms). The URL must include `v1/chat/completions`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since OpenAI mode is the only visible way of determining whether or not OpenAI API format is supported, I propose expanding it a bit more:
Suggested change:
- * For `completion` and `chat_completion` tasks, endpoint must be compatible with the OpenAI API format and include `v1/chat/completions`.
+ * For `completion` and `chat_completion` tasks, the deployed model must be compatible with the Hugging Face Chat Completion interface (https://huggingface.co/docs/inference-providers/en/tasks/chat-completion#conversational-large-language-models-llms). OpenAI mode must be enabled, and the endpoint URL must include `v1/chat/completions`.
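For context, the kind of request this doc comment governs looks roughly like the sketch below (a hedged example: the inference ID, URL, and API key are illustrative placeholders, not values from this PR):

PUT _inference/chat_completion/hugging-face-chat
{
  "service": "hugging_face",
  "service_settings": {
    "api_key": "hf-xxxxxxxxxxxx",
    "url": "https://my-model.endpoints.huggingface.cloud/v1/chat/completions"
  }
}

Note how the `url` value ends in `v1/chat/completions`, which is exactly what the description requires for these two task types.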
What is `OpenAI mode`? I searched Hugging Face and couldn't find it. Are you referring to the request format that the inference API requires?
Made the change.
 * For `completion` and `chat_completion` tasks, this field is optional but may be required for certain models — particularly when using serverless inference endpoints.
 * For the `text_embedding` task, this field is not required and will be ignored if provided.
 */
model_id?: string
@lcawl @szabosteve In the previous docs system we created separate sections for the settings per task type. Is there a way to do that in the new docs system? What I mean is, it'd probably be easier to go to the Hugging Face doc page and click on a section to see all fields that are applicable for a particular task type, rather than having to go through all fields and look at the text to see if each applies.
@jonathan-buttner I think there is a solution to this issue, but I need to investigate a bit. I don't want to block this PR, so please go ahead and merge if you find it good, and I'll fix these when I have a working solution.
Just to give an example: if a user includes `model_id` for a `text_embedding` task type request, the request will fail. If the user includes `model_id` for `completion` or `chat_completion`, the request will succeed.

So the field is required depending on the model, but normally optional when using `completion` or `chat_completion`. For `text_embedding` the field is invalid and will result in an error.
@@ -44,6 +47,15 @@ import { Id } from '@_types/common'
 * * `e5-small-v2`
 * * `multilingual-e5-base`
 * * `multilingual-e5-small`
 *
 * For Elastic's `chat_completion` and `completion` tasks:
@lcawl @szabosteve Similar to my previous comment, would it make sense to create a put request per task type? I'm not sure if that would work or not.
@jonathan-buttner It is certainly possible from a technical standpoint. However, we already have a lot of similar endpoints under the inference namespace. If we had multiple pages for an integration type (one for each task type the integration supports), the confusion might be even worse. It would also affect the left-side navigation.
I understand your point, and I think it's a valid concern. I just don't think creating multiple pages for the same integration would be the best solution to address it. I'll talk to others, too, and try to come up with a solution soon. WDYT?
Yeah, I agree having multiple pages would likely make it worse. I'm not sure what a good solution would be. Maybe if it's possible to have multiple expandable sections within a single page or something. I think the underlying issue is that it's hard to determine which fields are applicable for a single task type without having to read all the text for all the fields. Maybe we could leverage examples to show the fields that should go with each task type.
@@ -1,6 +1,6 @@
 {
   "dependencies": {
-    "@redocly/cli": "^1.34.1",
+    "@redocly/cli": "^1.34.3",
Do we need this change?
This change is made automatically by running the pre-commit set of tasks. The value has been incremented along with different changes committed over time, so ignoring a change made by the pre-commit tasks would have to have a reason behind it.
Ah thanks!
@@ -5,7 +5,7 @@
 "packages": {
   "": {
     "dependencies": {
-      "@redocly/cli": "^1.34.1",
+      "@redocly/cli": "^1.34.3",
Was this file supposed to change?
As answered above, the change is made automatically by running the pre-commit set of tasks. If we don't have a reason for ignoring changes made by the pre-commit tasks, I'd keep it.
Yep sounds good, thanks for explaining.
…chat-completion-integration
# Conflicts:
# output/schema/schema-serverless.json
# output/schema/schema.json
Left a few comments and a tiny suggestion, otherwise LGTM!
@@ -29,13 +29,16 @@ import { Id } from '@_types/common'
 /**
  * Create a Hugging Face inference endpoint.
  *
- * Create an inference endpoint to perform an inference task with the `hugging_face` service.
+ * Creates an inference endpoint to perform an inference task with the `hugging_face` service.
I suggest changing it back to be consistent with the rest of the endpoint docs.
Suggested change:
- * Creates an inference endpoint to perform an inference task with the `hugging_face` service.
+ * Create an inference endpoint to perform an inference task with the `hugging_face` service.
FYI, the use of "Create" vs. "Creates" is not consistent across the endpoint docs. For Amazon and Mistral it is "Creates", which was taken as the example. I thought it made more sense to describe "what it does", but now I see that it is an invitation to "do it".
Changed it, since for most of the providers it is "Create".
@Jan-Kazlouski-elastic Thanks! I'll fix Amazon and Mistral.
Thank you @szabosteve
 */
rate_limit?: RateLimitSetting
/**
 * The URL endpoint to use for the requests.
 * For `completion` and `chat_completion` tasks, the deployed model must be compatible with the Hugging Face Chat Completion interface (https://huggingface.co/docs/inference-providers/en/tasks/chat-completion#conversational-large-language-models-llms). OpenAI mode must be enabled, and the endpoint URL must include `v1/chat/completions`.
> OpenAI mode must be enabled

What is OpenAI mode? Can we remove that portion of the sentence?
Suggested change:
- * For `completion` and `chat_completion` tasks, the deployed model must be compatible with the Hugging Face Chat Completion interface (https://huggingface.co/docs/inference-providers/en/tasks/chat-completion#conversational-large-language-models-llms). OpenAI mode must be enabled, and the endpoint URL must include `v1/chat/completions`.
+ * For `completion` and `chat_completion` tasks, the deployed model must be compatible with the Hugging Face Chat Completion interface (https://huggingface.co/docs/inference-providers/en/tasks/chat-completion#conversational-large-language-models-llms). The endpoint URL must include `v1/chat/completions`.
Ah, I hadn't seen that before.
Can you try disabling it and see if a chat completion request using the inference API still works? I wonder if that only controls the UI in Hugging Face.
Can you give me an example of a model that has it?
> Can you try disabling it and see if a chat completion request using the inference API still works?

After disabling it in the UI, it still works, but the URL presented to the client doesn't contain `v1/chat/completions`. So switching it on and off doesn't keep the endpoint from processing OpenAI payloads if they are sent to `/v1/chat/completions`, but it hides the `/v1/chat/completions` section of the URL and provides a different payload example. So if OpenAI mode is turned off, or not there at all for a specific model, the full URL with `/v1/chat/completions` is hidden and the client can't see or use the URL that must be used for the integration.

The client might have prior knowledge of what the URL should look like, and we're saying that it must contain `/v1/chat/completions`, but it is not safe to assume the customer will understand that this section must be added on top of the regular URL. Especially since, if the model doesn't support the OpenAI payload, an attempt to include `/v1/chat/completions` will be made and an error will be returned because the model doesn't support it.

I guess we could make it clearer by telling the customer that the absence of `/v1/chat/completions` on the running model page can be caused by OpenAI mode being turned off, and that enabling this mode, if the toggle is present when the URL doesn't contain `/v1/chat/completions`, can lead to it being shown. But since this mode has always been turned ON by default for me, I think this issue carries really low risk. BUT! The presence of this toggle is one of the signs that the model is capable of processing OpenAI-type payloads and can be used for the integration. So for me personally, having this note makes us safer.

But you mentioning that you hadn't seen it before, while obviously having run successful tests in the past, makes me wonder if it is something local to only some of the models.

> I wonder if that only controls the UI in Hugging Face.

Yes, it just controls the UI.

> Can you give me an example of a model that has it?
Ah, I see. I wonder if it'd be more helpful to put this information and an example picture like you have somewhere else in the docs. @szabosteve what do you think?

My suggestion would be to have the text as something like:

For `completion` and `chat_completion` tasks, the deployed model must be compatible with the Hugging Face Chat Completion interface (https://huggingface.co/docs/inference-providers/en/tasks/chat-completion#conversational-large-language-models-llms). The endpoint URL must include `v1/chat/completions`. To determine if the model supports the Hugging Face Chat Completion Interface and to access the correct URL, follow the information here.

We'd have a link or something to either another page or a different section of the page that explains that the deployment should have a toggle for OpenAI. Then to get the correct URL they should enable the toggle and ensure the URL ends with `v1/chat/completions`.
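To illustrate what "compatible with the Hugging Face Chat Completion interface" means in practice, the deployed endpoint has to accept an OpenAI-style payload along these lines (a sketch: the URL and token are placeholders, and the `model` value depends on the deployment; TGI-based endpoints commonly accept "tgi"):

POST https://my-model.endpoints.huggingface.cloud/v1/chat/completions
Authorization: Bearer hf-xxxxxxxxxxxx
Content-Type: application/json

{
  "model": "tgi",
  "messages": [
    { "role": "user", "content": "Hello" }
  ]
}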
I think the best would be to present this information somehow here. I'm afraid that linking to another docs page from the reference for such a low-level detail would be hiding information. I suggest something like this:
Suggested change:
- * For `completion` and `chat_completion` tasks, the deployed model must be compatible with the Hugging Face Chat Completion interface (https://huggingface.co/docs/inference-providers/en/tasks/chat-completion#conversational-large-language-models-llms). OpenAI mode must be enabled, and the endpoint URL must include `v1/chat/completions`.
+ * For `completion` and `chat_completion` tasks, the deployed model must be compatible with the Hugging Face Chat Completion interface (see the linked external documentation for details). The endpoint URL for the request must include `/v1/chat/completions`.
+ * If the model supports the OpenAI Chat Completion schema, a toggle should appear in the interface. Enabling this toggle doesn't change any model behavior, it reveals the full endpoint URL needed (which should include `/v1/chat/completions`) when configuring the inference endpoint in Elasticsearch. If the model doesn't support this schema, the toggle may not be shown.
+ * @ext_doc_id huggingface-chat-completion-interface
And then add the following to the table.csv:

huggingface-chat-completion-interface, https://huggingface.co/docs/inference-providers/en/tasks/chat-completion#conversational-large-language-models-llms

It will provide a link with the text "External documentation" at the end of the description, which makes the docs a bit more readable.
@jonathan-buttner What do you think?
Yep that looks good. Thanks!
Applied the change proposed by @szabosteve.
 * After the endpoint is initialized (for dedicated) or ready (for serverless), ensure it supports the OpenAI API and includes `/v1/chat/completions` part in URL. Then, copy the full endpoint URL for use.
 * Recommended models for `chat_completion` and `completion` tasks:
 *
 * * `Mistral-7B-Instruct-v0.2`
@szabosteve should we include the full URL link to these models?
I would rather not include the full URLs. Currently, we can provide only one link per description; otherwise, the generator complains.
Ah ok
…chat-completion-integration
# Conflicts:
# output/openapi/elasticsearch-openapi.json
# output/openapi/elasticsearch-serverless-openapi.json
# output/schema/schema-serverless.json
# output/schema/schema.json
…d completion tasks
…chat-completion-integration
# Conflicts:
# output/openapi/elasticsearch-openapi.json
# output/openapi/elasticsearch-serverless-openapi.json
# output/schema/schema.json
This PR is for changes to the specification caused by elastic/elasticsearch#127254:
* Extended Task Support
* Model Requirements for Chat Tasks
* New Configuration Parameters
* Rate Limit Clarifications
* Documentation Fixes

Additional actions:
* Signed the CLA
* Executed `make contrib`