Add logging guidelines

liu-cong · liu-cong · commit ab133a1622de · 2025-01-09T15:26:08.000-08:00
diff --git a/README.md b/README.md
@@ -34,7 +34,7 @@ Our community meeting is weekly at Th 10AM PDT; [zoom link here](https://zoom.us
 
 We currently utilize the [#wg-serving](https://kubernetes.slack.com/?redir=%2Fmessages%2Fwg-serving) slack channel for communications.
 
-Contributions are readily welcomed, thanks for joining us!
+Contributions are readily welcomed; follow the [dev guide](./docs/dev.md) to start contributing!
 
 ### Code of conduct
 
diff --git a/docs/dev.md b/docs/dev.md
@@ -0,0 +1,45 @@
+<!-- Dev guide -->
+
+
+## Logging
+
+### Change log verbosity
+We use the `k8s.io/klog/v2` package to manage logging.
+
+We generally follow the [k8s instrumentation logging guidelines](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/logging.md), which state that "the practical default level is V(2). Developers and QE environments may wish to run at V(3) or V(4)".
+
+To configure logging verbosity, set the `v` flag, for example `--v=2`.
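+
+A minimal sketch of how a binary can expose the `v` flag, assuming it parses flags with Go's standard `flag` package. `setupLogging` is a hypothetical helper, not a function in this code base; `klog.InitFlags`, `klog.V`, and `klog.Flush` are klog APIs.
+
+```go
+package main
+
+import (
+	"flag"
+
+	"k8s.io/klog/v2"
+)
+
+// setupLogging registers klog's flags (including -v) on fs and parses args.
+// In a real main, pass flag.CommandLine and os.Args[1:], then defer klog.Flush()
+// so buffered log lines are written before the process exits.
+func setupLogging(fs *flag.FlagSet, args []string) error {
+	klog.InitFlags(fs)
+	return fs.Parse(args)
+}
+```
+
+After `setupLogging(flag.CommandLine, []string{"--v=2"})`, `klog.V(2).Enabled()` reports true and `klog.V(3).Enabled()` reports false, so `V(3)` and higher calls are dropped.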
+
+### Add logs
+We adapt the [k8s instrumentation logging guidelines](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/logging.md) to our code base as follows.
+
+1. Server startup. Logging at the default verbosity level is generally welcome here, since these logs are emitted only once at startup and provide useful information for debugging.
+   * `klog.V(0).InfoS` (equivalent to `klog.InfoS`): the default; log things such as startup flags.
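+
+   As a sketch, startup code might log every registered flag once at the default level. `logStartupFlags` is a hypothetical helper; it returns the number of flags logged only to make it easy to test.
+
+   ```go
+   package main
+
+   import (
+   	"flag"
+
+   	"k8s.io/klog/v2"
+   )
+
+   // logStartupFlags logs each flag in fs, including defaults, once at startup.
+   // klog.InfoS is equivalent to klog.V(0).InfoS.
+   func logStartupFlags(fs *flag.FlagSet) int {
+   	n := 0
+   	fs.VisitAll(func(f *flag.Flag) {
+   		klog.InfoS("Flag", "name", f.Name, "value", f.Value.String())
+   		n++
+   	})
+   	return n
+   }
+   ```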
+
+2. Reconciler loops. The reconciler loops watch for changes to custom resources (CRs) such as `InferenceModel`. Because changes to these CRs significantly affect the behavior of the extension, we recommend V(1) as the default verbosity level in the reconcilers and using higher verbosity levels sparingly.
+
+   * `klog.V(1).InfoS`
+      * Default log level in the reconcilers.
+      * Information about config (listening on X, watching Y)
+      * Frequently repeated errors caused by correctable conditions (e.g., an inference model that is not initialized yet)
+   * `klog.V(2).InfoS`
+      * System state changing (adding/removing objects in the data store)
+   * `V(3)` and above: Use your best judgement. 
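+
+   A sketch of these levels in reconciler code follows. The `datastore` type, its methods, `logWatch`, and `logModelNotReady` are hypothetical stand-ins, not the extension's actual types.
+
+   ```go
+   package main
+
+   import (
+   	"errors"
+
+   	"k8s.io/klog/v2"
+   )
+
+   // errModelNotReady is a hypothetical, correctable condition.
+   var errModelNotReady = errors.New("inference model not initialized yet")
+
+   // datastore is a hypothetical stand-in for the extension's data store.
+   type datastore struct{ models map[string]bool }
+
+   // logWatch records configuration at V(1), the reconciler default.
+   func logWatch(namespace string) {
+   	klog.V(1).InfoS("Watching InferenceModel resources", "namespace", namespace)
+   }
+
+   // logModelNotReady reports a frequently repeated but correctable error at V(1).
+   func logModelNotReady(name string) {
+   	klog.V(1).InfoS("Model not ready, will retry", "model", name, "err", errModelNotReady)
+   }
+
+   // addModel and removeModel change system state, so they log at V(2).
+   func (d *datastore) addModel(name string) {
+   	d.models[name] = true
+   	klog.V(2).InfoS("Added model to datastore", "model", name, "count", len(d.models))
+   }
+
+   func (d *datastore) removeModel(name string) {
+   	delete(d.models, name)
+   	klog.V(2).InfoS("Removed model from datastore", "model", name, "count", len(d.models))
+   }
+   ```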
+
+3. Inference request handling. Inference requests are expected to arrive at a much higher volume than the control flow in the reconcilers, so be mindful of log spam. We recommend V(2) for important information about a request, such as the HTTP response code, and higher verbosity levels for less important details.
+
+   * `klog.V(2).InfoS`
+      * HTTP requests and their response codes
+      * Important decisions, such as picking the target model and target pod
+   * `klog.V(3).InfoS`
+      * Detailed request scheduling algorithm operations, such as running the filtering logic
+   * `V(4)` and above: Use your best judgement. 
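+
+   The sketch below shows where each level might go in a request path, including an `Enabled()` guard so that the V(3) details are only computed when that level is on. `podMetrics`, the queue-depth filter, `pickPod`, and `logResponse` are hypothetical; the extension's real scheduler may filter differently.
+
+   ```go
+   package main
+
+   import (
+   	"fmt"
+
+   	"k8s.io/klog/v2"
+   )
+
+   // podMetrics is a hypothetical per-pod snapshot used for scheduling.
+   type podMetrics struct {
+   	Name       string
+   	QueueDepth int
+   }
+
+   // pickPod keeps pods whose queue depth is at most maxQueue and returns the first one.
+   func pickPod(model string, pods []podMetrics, maxQueue int) (string, error) {
+   	var kept []podMetrics
+   	for _, p := range pods {
+   		if p.QueueDepth <= maxQueue {
+   			kept = append(kept, p)
+   		}
+   	}
+   	// Detailed filtering operations: V(3). The name list is only built when enabled.
+   	if v := klog.V(3); v.Enabled() {
+   		names := make([]string, 0, len(kept))
+   		for _, p := range kept {
+   			names = append(names, p.Name)
+   		}
+   		v.InfoS("Filtered pods by queue depth", "model", model, "maxQueue", maxQueue, "kept", names)
+   	}
+   	if len(kept) == 0 {
+   		return "", fmt.Errorf("no pod available for model %q", model)
+   	}
+   	// Important decision: V(2).
+   	klog.V(2).InfoS("Picked target pod", "model", model, "pod", kept[0].Name)
+   	return kept[0].Name, nil
+   }
+
+   // logResponse records the HTTP response code of a request at V(2).
+   func logResponse(model string, code int) {
+   	klog.V(2).InfoS("Request handled", "model", model, "code", code)
+   }
+   ```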
+
+4. Metric scraping loops. These loops run at a very high frequency, so their logs can become very spammy if not handled carefully.
+   * `klog.V(4).InfoS`
+      * Transient errors/warnings, such as a failure to get a response from a pod.
+   * `klog.V(5).InfoS`
+      * Default log level in the metric scraping loops.
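+
+   A sketch of one scrape pass, assuming metrics are fetched per pod through a caller-supplied function. `scrape`, `fetch`, and `errPodUnreachable` are hypothetical names, not the extension's API.
+
+   ```go
+   package main
+
+   import (
+   	"errors"
+
+   	"k8s.io/klog/v2"
+   )
+
+   // errPodUnreachable is a hypothetical transient error.
+   var errPodUnreachable = errors.New("pod unreachable")
+
+   // scrape fetches one metric per pod, skipping pods whose fetch fails.
+   func scrape(pods []string, fetch func(pod string) (float64, error)) map[string]float64 {
+   	out := make(map[string]float64, len(pods))
+   	for _, pod := range pods {
+   		val, err := fetch(pod)
+   		if err != nil {
+   			// Transient error in a high-frequency loop: V(4), not klog.ErrorS.
+   			klog.V(4).InfoS("Failed to fetch metrics", "pod", pod, "err", err)
+   			continue
+   		}
+   		// Routine per-scrape detail: V(5), the default for this loop.
+   		klog.V(5).InfoS("Fetched metrics", "pod", pod, "value", val)
+   		out[pod] = val
+   	}
+   	return out
+   }
+   ```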
+
+5. Miscellaneous
+   * `klog.V(3).InfoS`
+      * A debug message printed periodically (every 5s) with the current pods and metrics. This is very important for debugging the request scheduling algorithm, yet not spammy compared to the metric scraping loop logs.
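+
+   A sketch of the periodic printer using `time.Ticker`. `runDebugPrinter` and `snapshot` are hypothetical; the `Enabled()` guard avoids building the snapshot when V(3) is off, since Go evaluates `InfoS` arguments before the call.
+
+   ```go
+   package main
+
+   import (
+   	"context"
+   	"time"
+
+   	"k8s.io/klog/v2"
+   )
+
+   // runDebugPrinter logs snapshot() at V(3) every interval until ctx is done,
+   // and returns how many ticks it handled.
+   func runDebugPrinter(ctx context.Context, interval time.Duration, snapshot func() map[string]float64) int {
+   	ticker := time.NewTicker(interval)
+   	defer ticker.Stop()
+   	ticks := 0
+   	for {
+   		select {
+   		case <-ctx.Done():
+   			return ticks
+   		case <-ticker.C:
+   			ticks++
+   			if v := klog.V(3); v.Enabled() {
+   				v.InfoS("Current pods and metrics", "snapshot", snapshot())
+   			}
+   		}
+   	}
+   }
+   ```
+
+   In production this would run in its own goroutine, e.g. `go runDebugPrinter(ctx, 5*time.Second, snapshot)`.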