You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
* Currently the logic tracks the models by Spec.ModelName, since this is not guaranteed to be unique within the cluster, we could run into two issues:
1) If the model name changes on the same InferenceModel object, we don't delete the original model entry in the datastore. 2) We don't enforce the semantics that the modelName with the oldest creation timestamp is retained. While the api is assuming that this is enforced by another controller via the Ready condition, we don't have this controller yet, and so currently the behavior is unpredictable depending on InferenceModel events order.
To address the above, the PR makes changes to both the InferenceModel reconciler and the Model APIs in the datastore to ensure thread safe updates of the entries. In the store, the sync.Map was replaced with two maps to track the InferenceModel entries by both ModelName and InferenceModel object NamespacedName. This is needed to properly handle deletions when the object doesn't exist anymore (could be handled in other ways, but this seemed like a reasonable approach).
The PR increases the datastore pkg unit test coverage the Pool and Model APIs. We still need to followup with adding unit test coverage to the pods APIs, which is currently non-existent.
* Convert unit test to a table
* remove the dual map for the models store, and rely on linear search when looking up the model by object name
* Added ModelResync to handle a race condition
* Update pkg/epp/controller/inferencemodel_reconciler.go
0 commit comments