generate new manifests

kfswain · kfswain · commit b8f5479f4c58 · 2025-01-23T00:38:14.000Z
diff --git a/config/crd/bases/inference.networking.x-k8s.io_inferencemodels.yaml b/config/crd/bases/inference.networking.x-k8s.io_inferencemodels.yaml
@@ -29,7 +29,12 @@ spec:
       openAPIV3Schema:
         description: |-
           InferenceModel is the Schema for the InferenceModels API.
-          The InferenceModel is intended to represent a model workload within Kubernetes.
+          The InferenceModel is intended to represent a model workload (also referred to as a model use case) within Kubernetes.
+          The management of the model server is not done by the InferenceModel. Instead, the
+          focus of the InferenceModel is to provide the tools needed to effectively manage multiple models
+          that share the same base model (currently the focus is LoRA adapters). Fields such as TargetModel
+          are intended to simplify A/B testing and version rollout of adapters, while Criticality assists with
+          governance of multiplexing many use cases over shared hardware.
         properties:
           apiVersion:
             description: |-
@@ -50,15 +55,16 @@ spec:
             type: object
           spec:
             description: |-
-              InferenceModelSpec represents the desired state of a specific model use case. This resource is
+              InferenceModelSpec represents the desired state of an InferenceModel. This resource is
               managed by the "Inference Workload Owner" persona.
 
               The Inference Workload Owner persona is someone that trains, verifies, and
-              leverages a large language model from a model frontend, drives the lifecycle
-              and rollout of new versions of those models, and defines the specific
+              leverages a large language model focusing on model fidelity performance, and
+              less on inference performance (which is managed by the Inference Platform Admin).
+              They also drive the lifecycle and rollout of new versions of those models, and define the specific
               performance and latency goals for the model. These workloads are
               expected to operate within an InferencePool sharing compute capacity with other
-              InferenceModels, defined by the Inference Platform Admin.
+              InferenceModels, with specific governance defined by the Inference Platform Admin.
             properties:
               criticality:
                 description: |-