
Commit b8f5479

generate new manifests

1 parent 22f16f8 commit b8f5479

1 file changed (+11, -5 lines)


config/crd/bases/inference.networking.x-k8s.io_inferencemodels.yaml

Lines changed: 11 additions & 5 deletions
@@ -29,7 +29,12 @@ spec:
       openAPIV3Schema:
         description: |-
           InferenceModel is the Schema for the InferenceModels API.
-          The InferenceModel is intended to represent a model workload within Kubernetes.
+          The InferenceModel is intended to represent a model workload (also referred to as a model use case) within Kubernetes.
+          The management of the model server is not done by the InferenceModel. Instead, the
+          focus of the InferenceModel is to provide the tools needed to effectively manage multiple models
+          that share the same base model (currently the focus is LoRA adapters). Fields such as TargetModel
+          are intended to simplify A/B testing and version rollout of adapters, while Criticality assists with
+          governance of multiplexing many use cases over shared hardware.
         properties:
           apiVersion:
             description: |-
@@ -50,15 +55,16 @@ spec:
             type: object
           spec:
             description: |-
-              InferenceModelSpec represents the desired state of a specific model use case. This resource is
+              InferenceModelSpec represents the desired state of an InferenceModel. This resource is
               managed by the "Inference Workload Owner" persona.

               The Inference Workload Owner persona is someone that trains, verifies, and
-              leverages a large language model from a model frontend, drives the lifecycle
-              and rollout of new versions of those models, and defines the specific
+              leverages a large language model, focusing on model fidelity performance and
+              less on inference performance (which is managed by the Inference Platform Admin).
+              They also drive the lifecycle and rollout of new versions of those models, and define the specific
               performance and latency goals for the model. These workloads are
               expected to operate within an InferencePool sharing compute capacity with other
-              InferenceModels, defined by the Inference Platform Admin.
+              InferenceModels, with specific governance defined by the Inference Platform Admin.
             properties:
               criticality:
                 description: |-
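For context, the fields called out in the updated description (TargetModel and Criticality) would be used along the lines of the sketch below. This is an illustrative manifest only, not part of this commit; the surrounding details (apiVersion v1alpha1, modelName, poolRef, targetModels and their weights, and all example values) are assumptions about the CRD schema rather than content taken from this diff.

# Hypothetical InferenceModel manifest illustrating the updated description.
# Only TargetModel and Criticality are referenced in this commit; every other
# field name and value here is an assumption, shown for illustration only.
apiVersion: inference.networking.x-k8s.io/v1alpha1
kind: InferenceModel
metadata:
  name: chatbot
spec:
  modelName: chatbot            # model name that clients request
  criticality: Critical         # governs multiplexing of use cases over shared hardware
  poolRef:
    name: shared-llm-pool       # InferencePool providing the shared compute
  targetModels:                 # LoRA adapters sharing the same base model
  - name: chatbot-lora-v1
    weight: 90                  # weighted split supports A/B testing and
  - name: chatbot-lora-v2       # gradual rollout of new adapter versions
    weight: 10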
