Use dedicated namespace for e2e test code #661

rramkumar1 · 2025-04-08T02:11:52Z

Fixed #227

See results of E2E run below using the CPU-based vLLM:

rramkumar@rramkumar:~/gateway-api-inference-extension$ export E2E_MANIFEST_PATH=config/manifests/vllm/cpu-deployment.yaml
rramkumar@rramkumar:~/gateway-api-inference-extension$ export E2E_NS=foobar
rramkumar@rramkumar:~/gateway-api-inference-extension$ make test-e2e
MANIFEST_PATH=/usr/local/google/home/rramkumar/gateway-api-inference-extension/config/manifests/vllm/cpu-deployment.yaml go test ./test/e2e/epp/ -v -ginkgo.v
=== RUN   TestAPIs
Running Suite: End To End Test Suite - /usr/local/google/home/rramkumar/gateway-api-inference-extension/test/e2e/epp
====================================================================================================================
Random Seed: 1744127212

Will run 1 of 1 specs
------------------------------
[BeforeSuite] 
/usr/local/google/home/rramkumar/gateway-api-inference-extension/test/e2e/epp/e2e_suite_test.go:104
  STEP: Setting up the test suite @ 04/08/25 15:46:52.744
  STEP: Creating test infrastructure @ 04/08/25 15:46:52.746
  STEP: Creating e2e namespace: foobar @ 04/08/25 15:46:52.746
  STEP: Ensuring MANIFEST_PATH environment variable is set @ 04/08/25 15:46:53.222
  STEP: Ensuring the model server manifest points to an existing file @ 04/08/25 15:46:53.222
  STEP: Reading YAML file: /usr/local/google/home/rramkumar/gateway-api-inference-extension/config/manifests/vllm/cpu-deployment.yaml @ 04/08/25 15:46:53.222
  STEP: Creating CRD resource from manifest: ../../../config/crd/bases/inference.networking.x-k8s.io_inferencemodels.yaml @ 04/08/25 15:46:53.223
  STEP: Reading YAML file: ../../../config/crd/bases/inference.networking.x-k8s.io_inferencemodels.yaml @ 04/08/25 15:46:53.223
  STEP: Decoded GVK: apiextensions.k8s.io/v1, Kind=CustomResourceDefinition @ 04/08/25 15:46:53.224
  STEP: Checking CRD inferencemodels.inference.networking.x-k8s.io status is: Established @ 04/08/25 15:46:53.468
  STEP: Creating CRD resource from manifest: ../../../config/crd/bases/inference.networking.x-k8s.io_inferencepools.yaml @ 04/08/25 15:46:53.545
  STEP: Reading YAML file: ../../../config/crd/bases/inference.networking.x-k8s.io_inferencepools.yaml @ 04/08/25 15:46:53.545
  STEP: Decoded GVK: apiextensions.k8s.io/v1, Kind=CustomResourceDefinition @ 04/08/25 15:46:53.547
  STEP: Checking CRD inferencepools.inference.networking.x-k8s.io status is: Established @ 04/08/25 15:46:53.758
  STEP: Reading YAML file: ../../testdata/inferencepool-e2e.yaml @ 04/08/25 15:46:53.833
  STEP: Replacing placeholder namespace with E2E_NS environment variable @ 04/08/25 15:46:53.833
  STEP: Creating inference extension resources from manifest: ../../testdata/inferencepool-e2e.yaml @ 04/08/25 15:46:53.833
  STEP: Decoded GVK: inference.networking.x-k8s.io/v1alpha2, Kind=InferencePool @ 04/08/25 15:46:53.834
  STEP: Decoded GVK: /v1, Kind=Service @ 04/08/25 15:46:56.021
  STEP: Decoded GVK: apps/v1, Kind=Deployment @ 04/08/25 15:46:56.139
  STEP: Decoded GVK: rbac.authorization.k8s.io/v1, Kind=ClusterRole @ 04/08/25 15:46:56.249
  STEP: Decoded GVK: rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding @ 04/08/25 15:46:56.374
  STEP: Checking if deployment foobar/vllm-llama3-8b-instruct-epp status is: Available @ 04/08/25 15:46:56.825
  STEP: Creating client resources from manifest: ../../testdata/client.yaml @ 04/08/25 15:47:07.174
  STEP: Reading YAML file: ../../testdata/client.yaml @ 04/08/25 15:47:07.174
  STEP: Decoded GVK: /v1, Kind=Pod @ 04/08/25 15:47:07.175
  STEP: Checking pod foobar/curl status is: Ready @ 04/08/25 15:47:07.389
  STEP: Reading YAML file: ../../testdata/envoy.yaml @ 04/08/25 15:47:08.425
  STEP: Replacing placeholder namespace with E2E_NS environment variable @ 04/08/25 15:47:08.425
  STEP: Creating envoy proxy resources from manifest: ../../testdata/envoy.yaml @ 04/08/25 15:47:08.425
  STEP: Decoded GVK: /v1, Kind=ConfigMap @ 04/08/25 15:47:08.426
  STEP: Decoded GVK: apps/v1, Kind=Deployment @ 04/08/25 15:47:08.523
  STEP: Decoded GVK: /v1, Kind=Service @ 04/08/25 15:47:08.611
  STEP: Checking if deployment foobar/envoy status is: Available @ 04/08/25 15:47:08.96
  STEP: Creating model server resources from manifest: /usr/local/google/home/rramkumar/gateway-api-inference-extension/config/manifests/vllm/cpu-deployment.yaml @ 04/08/25 15:47:10.74
  STEP: Decoded GVK: apps/v1, Kind=Deployment @ 04/08/25 15:47:10.741
  STEP: Decoded GVK: /v1, Kind=ConfigMap @ 04/08/25 15:47:10.831
  STEP: Checking if deployment foobar/vllm-llama3-8b-instruct status is: Available @ 04/08/25 15:47:11.044
[BeforeSuite] PASSED [74.897 seconds]
------------------------------
InferencePool when The Inference Extension is running Should route traffic to target model servers
/usr/local/google/home/rramkumar/gateway-api-inference-extension/test/e2e/epp/e2e_test.go:47
  STEP: Waiting for the namespace to exist. @ 04/08/25 15:48:07.641
  STEP: Ensuring namespace exists: foobar @ 04/08/25 15:48:07.641
  STEP: Creating an InferenceModel resource @ 04/08/25 15:48:07.734
  STEP: Ensuring the InferenceModel resource exists in the namespace @ 04/08/25 15:48:07.845
  STEP: Verifying connectivity through the inference extension @ 04/08/25 15:48:07.944
  STEP: Deleting the InferenceModel test resource. @ 04/08/25 15:49:48.896
• [101.501 seconds]
------------------------------
[AfterSuite] 
/usr/local/google/home/rramkumar/gateway-api-inference-extension/test/e2e/epp/e2e_suite_test.go:138
  STEP: Performing global cleanup @ 04/08/25 15:49:49.142
[AfterSuite] PASSED [2.185 seconds]
------------------------------

Ran 1 of 1 Specs in 178.584 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 0 Skipped
--- PASS: TestAPIs (178.58s)
PASS
ok  	sigs.k8s.io/gateway-api-inference-extension/test/e2e/epp	178.659s

k8s-ci-robot · 2025-04-08T02:12:02Z

Hi @rramkumar1. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

netlify · 2025-04-08T02:12:34Z

✅ Deploy Preview for gateway-api-inference-extension ready!

Name	Link
🔨 Latest commit	`b1efa3b`
🔍 Latest deploy log	https://app.netlify.com/sites/gateway-api-inference-extension/deploys/67f561e416f9440008efbbd7
😎 Deploy Preview	https://deploy-preview-661--gateway-api-inference-extension.netlify.app
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

ahg-g · 2025-04-08T02:20:25Z

/ok-to-test

config/manifests/inferencepool-resources.yaml

test/testdata/envoy.yaml

nirrozenbaum · 2025-04-08T09:40:56Z

test/testdata/inferencepool-e2e.yaml

@@ -0,0 +1,126 @@
+apiVersion: inference.networking.x-k8s.io/v1alpha2


This looks like a duplication of the existing inferencepool-resources.yaml, excluding the namespace field.
I would prefer if we can remove the namespace field from all resources in the inferencepool-resources.yaml file and specify the namespace in the e2e test (as env var as specified in other comment).

there is a big value in testing e2e our "Getting Started" files, so we are self testing the public documentation.
I would avoid having duplications as much as possible.

How would you suggest getting the env var parsed in the YAML file?

I suppose I could use a similar approach to the one you described above. However, that might complicate the getting started guide? How would the user of the guide make sure the env var was parsed in the file?

in the e2e a similar approach would work. in getting started guide I’m not sure we have an agreement of the other maintainers about being able to choose the namespace (I’m in favor of it).
if there is an agreement, you can use envsubst command to replace env var with it’s value.

for example -
envsubst < myfile.yaml | kubectl apply -f -

this discussion is relevant only for the subject serviceaccount in the ClusterRoleBinding, as the namespace in the Service and Deployment resources can be just removed and then you can decide the ns during e2e test.

Should we tackle this in a follow-up then? If we change the getting started file in this PR its going to expand the scope considerably as we now need to update the guide with the envvar substitution command.

cc @danehans

If we change the getting started file in this PR

+1 on not changing the getting started guide in this PR.

there is a big value in testing e2e our "Getting Started" files, so we are self testing the public documentation.

I agree with this point. We may need to consider 2 different e2e tests, 1 for testing the quickstart and another that is more programmatic to support various inference use cases. Maybe we track this in a separate issue?

cc: @ahg-g @kfswain

I prefer to move away from yaml files and just create those objects in code,,, the envoy one is complex, so I don't mind it, but other than that I think it is more manageable to have the test create those objects in code.

oh, well, creating the pool requires also creating all the epp and the rbac resources, so it is also complex...

danehans · 2025-04-08T17:02:45Z

@rramkumar1 CI is failing due to:

test/e2e/epp/e2e_suite_test.go:59:98: `overriden` is a misspelling of `overridden` (misspell)
	// defaultNsName is the default name of the Namespace used for tests. The namespace used can be overriden by the E2E_NS environment variable.
	                                                                                                ^
make: *** [Makefile:144: ci-lint] Error 1

rramkumar1 · 2025-04-08T17:05:02Z

@rramkumar1 CI is failing due to:

test/e2e/epp/e2e_suite_test.go:59:98: `overriden` is a misspelling of `overridden` (misspell)
	// defaultNsName is the default name of the Namespace used for tests. The namespace used can be overriden by the E2E_NS environment variable.
	                                                                                                ^
make: *** [Makefile:144: ci-lint] Error 1

Fixed.

test/testdata/envoy.yaml

test/e2e/epp/e2e_suite_test.go

danehans · 2025-04-08T17:16:13Z

test/testdata/inferencepool-e2e.yaml

@@ -0,0 +1,126 @@
+apiVersion: inference.networking.x-k8s.io/v1alpha2


there is a big value in testing e2e our "Getting Started" files, so we are self testing the public documentation.

I agree with this point. We may need to consider 2 different e2e tests, 1 for testing the quickstart and another that is more programmatic to support various inference use cases. Maybe we track this in a separate issue?

cc: @ahg-g @kfswain

nirrozenbaum · 2025-04-08T18:22:01Z

test/e2e/epp/e2e_suite_test.go

@@ -109,6 +115,8 @@ var _ = ginkgo.BeforeSuite(func() {
 })

 func setupInfra() {
+	createNamespace(cli, nsName)


can you add the namespace deletion to AfterSuite cleanup?
all resources that are created in the test should be deleted at the end of it.

I believe it's already done here:

https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/test/e2e/epp/e2e_suite_test.go#L160

I do observe the NS being cleaned up.

nirrozenbaum · 2025-04-08T19:07:35Z

/lgtm

Thanks!

danehans · 2025-04-08T22:05:46Z

/approve

k8s-ci-robot · 2025-04-08T22:05:53Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danehans, rramkumar1

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [danehans]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot assigned danehans Apr 8, 2025

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 8, 2025

k8s-ci-robot requested review from ahg-g and nirrozenbaum April 8, 2025 02:11

k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Apr 8, 2025

k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 8, 2025

k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 8, 2025