Add text-to-video to supported tasks (#2790)

Wauplin · hanouticelina · web-flow · commit 1d0999d3dec0 · 2025-01-27T18:05:55.000+01:00
* Add text-to-video to supported tasks

* replicate text-to-speech

* complete text-to-speech examples

---------

Co-authored-by: Celina Hanouti <hanouticelina@gmail.com>
diff --git a/docs/source/en/guides/inference.md b/docs/source/en/guides/inference.md
@@ -252,13 +252,14 @@ You might wonder why using [`InferenceClient`] instead of OpenAI's client? There
 | **Audio**           | [`~InferenceClient.audio_classification`]           | ✅            | ❌         | ❌      | ❌         | ❌        |
 |                     | [`~InferenceClient.audio_to_audio`]                 | ✅            | ❌         | ❌      | ❌         | ❌        |
 |                     | [`~InferenceClient.automatic_speech_recognition`]   | ✅            | ❌         | ✅      | ❌         | ❌        |
-|                     | [`~InferenceClient.text_to_speech`]                 | ✅            | ❌         | ❌      | ❌         | ❌        |
+|                     | [`~InferenceClient.text_to_speech`]                 | ✅            | ✅         | ❌      | ❌         | ❌        |
 | **Computer Vision** | [`~InferenceClient.image_classification`]           | ✅            | ❌         | ❌      | ❌         | ❌        |
 |                     | [`~InferenceClient.image_segmentation`]             | ✅            | ❌         | ❌      | ❌         | ❌        |
 |                     | [`~InferenceClient.image_to_image`]                 | ✅            | ❌         | ❌      | ❌         | ❌        |
 |                     | [`~InferenceClient.image_to_text`]                  | ✅            | ❌         | ❌      | ❌         | ❌        |
 |                     | [`~InferenceClient.object_detection`]               | ✅            | ❌         | ❌      | ❌         | ❌        |
 |                     | [`~InferenceClient.text_to_image`]                  | ✅            | ✅         | ✅      | ❌         | ✅        |
+|                     | [`~InferenceClient.text_to_video`]                  | ❌            | ✅         | ✅      | ❌         | ❌        |
 |                     | [`~InferenceClient.zero_shot_image_classification`] | ✅            | ❌         | ❌      | ❌         | ❌        |
 | **Multimodal**      | [`~InferenceClient.document_question_answering`]    | ✅            | ❌         | ❌      | ❌         | ❌        |
 |                     | [`~InferenceClient.visual_question_answering`]      | ✅            | ❌         | ❌      | ❌         | ❌        |
diff --git a/src/huggingface_hub/inference/_client.py b/src/huggingface_hub/inference/_client.py
@@ -2714,6 +2714,7 @@ def text_to_speech(
         ...     text="Hello world",
         ...     model="OuteAI/OuteTTS-0.3-500M",
         ... )
+        >>> Path("hello_world.flac").write_bytes(audio)
         ```
 
         Example using a third-party provider through Hugging Face Routing. Usage will be billed on your Hugging Face account.
@@ -2727,6 +2728,7 @@ def text_to_speech(
         ...     text="Hello world",
         ...     model="OuteAI/OuteTTS-0.3-500M",
         ... )
+        >>> Path("hello_world.flac").write_bytes(audio)
         ```
         """
         provider_helper = get_provider_helper(self.provider, task="text-to-speech")
diff --git a/src/huggingface_hub/inference/_generated/_async_client.py b/src/huggingface_hub/inference/_generated/_async_client.py
@@ -2775,6 +2775,7 @@ async def text_to_speech(
         ...     text="Hello world",
         ...     model="OuteAI/OuteTTS-0.3-500M",
         ... )
+        >>> Path("hello_world.flac").write_bytes(audio)
         ```
 
         Example using a third-party provider through Hugging Face Routing. Usage will be billed on your Hugging Face account.
@@ -2788,6 +2789,7 @@ async def text_to_speech(
         ...     text="Hello world",
         ...     model="OuteAI/OuteTTS-0.3-500M",
         ... )
+        >>> Path("hello_world.flac").write_bytes(audio)
         ```
         """
         provider_helper = get_provider_helper(self.provider, task="text-to-speech")
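
The doctest lines added in this commit persist the returned audio with `Path(...).write_bytes(audio)`. A minimal offline sketch of that step follows. It assumes the value returned by `text_to_speech` is raw `bytes`. The `save_audio` helper and the placeholder payload are hypothetical and are not part of `huggingface_hub`.

```python
import tempfile
from pathlib import Path


def save_audio(audio: bytes, destination: Path) -> int:
    """Write raw audio bytes to `destination` and return the number of bytes written."""
    return destination.write_bytes(audio)


# Placeholder payload standing in for the bytes that a real
# `client.text_to_speech(text=..., model=...)` call would return (assumption).
fake_audio = b"fLaC" + bytes(16)

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "hello_world.flac"
    written = save_audio(fake_audio, target)
    # Read the file back to confirm the bytes were stored unchanged.
    round_trip = target.read_bytes()
```

`Path.write_bytes` opens the file in binary mode, overwrites any existing content, and closes the file afterwards, which is why the docstring examples can do the save in a single line.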