Skip to content

Use dedicated namespace for e2e test code #661

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 8, 2025

Conversation

rramkumar1
Copy link
Contributor

@rramkumar1 rramkumar1 commented Apr 8, 2025

Fixed #227

/assign @danehans

See results of E2E run below using the CPU-based vLLM:

rramkumar@rramkumar:~/gateway-api-inference-extension$ export E2E_MANIFEST_PATH=config/manifests/vllm/cpu-deployment.yaml
rramkumar@rramkumar:~/gateway-api-inference-extension$ export E2E_NS=foobar
rramkumar@rramkumar:~/gateway-api-inference-extension$ make test-e2e
MANIFEST_PATH=/usr/local/google/home/rramkumar/gateway-api-inference-extension/config/manifests/vllm/cpu-deployment.yaml go test ./test/e2e/epp/ -v -ginkgo.v
=== RUN   TestAPIs
Running Suite: End To End Test Suite - /usr/local/google/home/rramkumar/gateway-api-inference-extension/test/e2e/epp
====================================================================================================================
Random Seed: 1744127212

Will run 1 of 1 specs
------------------------------
[BeforeSuite] 
/usr/local/google/home/rramkumar/gateway-api-inference-extension/test/e2e/epp/e2e_suite_test.go:104
  STEP: Setting up the test suite @ 04/08/25 15:46:52.744
  STEP: Creating test infrastructure @ 04/08/25 15:46:52.746
  STEP: Creating e2e namespace: foobar @ 04/08/25 15:46:52.746
  STEP: Ensuring MANIFEST_PATH environment variable is set @ 04/08/25 15:46:53.222
  STEP: Ensuring the model server manifest points to an existing file @ 04/08/25 15:46:53.222
  STEP: Reading YAML file: /usr/local/google/home/rramkumar/gateway-api-inference-extension/config/manifests/vllm/cpu-deployment.yaml @ 04/08/25 15:46:53.222
  STEP: Creating CRD resource from manifest: ../../../config/crd/bases/inference.networking.x-k8s.io_inferencemodels.yaml @ 04/08/25 15:46:53.223
  STEP: Reading YAML file: ../../../config/crd/bases/inference.networking.x-k8s.io_inferencemodels.yaml @ 04/08/25 15:46:53.223
  STEP: Decoded GVK: apiextensions.k8s.io/v1, Kind=CustomResourceDefinition @ 04/08/25 15:46:53.224
  STEP: Checking CRD inferencemodels.inference.networking.x-k8s.io status is: Established @ 04/08/25 15:46:53.468
  STEP: Creating CRD resource from manifest: ../../../config/crd/bases/inference.networking.x-k8s.io_inferencepools.yaml @ 04/08/25 15:46:53.545
  STEP: Reading YAML file: ../../../config/crd/bases/inference.networking.x-k8s.io_inferencepools.yaml @ 04/08/25 15:46:53.545
  STEP: Decoded GVK: apiextensions.k8s.io/v1, Kind=CustomResourceDefinition @ 04/08/25 15:46:53.547
  STEP: Checking CRD inferencepools.inference.networking.x-k8s.io status is: Established @ 04/08/25 15:46:53.758
  STEP: Reading YAML file: ../../testdata/inferencepool-e2e.yaml @ 04/08/25 15:46:53.833
  STEP: Replacing placeholder namespace with E2E_NS environment variable @ 04/08/25 15:46:53.833
  STEP: Creating inference extension resources from manifest: ../../testdata/inferencepool-e2e.yaml @ 04/08/25 15:46:53.833
  STEP: Decoded GVK: inference.networking.x-k8s.io/v1alpha2, Kind=InferencePool @ 04/08/25 15:46:53.834
  STEP: Decoded GVK: /v1, Kind=Service @ 04/08/25 15:46:56.021
  STEP: Decoded GVK: apps/v1, Kind=Deployment @ 04/08/25 15:46:56.139
  STEP: Decoded GVK: rbac.authorization.k8s.io/v1, Kind=ClusterRole @ 04/08/25 15:46:56.249
  STEP: Decoded GVK: rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding @ 04/08/25 15:46:56.374
  STEP: Checking if deployment foobar/vllm-llama3-8b-instruct-epp status is: Available @ 04/08/25 15:46:56.825
  STEP: Creating client resources from manifest: ../../testdata/client.yaml @ 04/08/25 15:47:07.174
  STEP: Reading YAML file: ../../testdata/client.yaml @ 04/08/25 15:47:07.174
  STEP: Decoded GVK: /v1, Kind=Pod @ 04/08/25 15:47:07.175
  STEP: Checking pod foobar/curl status is: Ready @ 04/08/25 15:47:07.389
  STEP: Reading YAML file: ../../testdata/envoy.yaml @ 04/08/25 15:47:08.425
  STEP: Replacing placeholder namespace with E2E_NS environment variable @ 04/08/25 15:47:08.425
  STEP: Creating envoy proxy resources from manifest: ../../testdata/envoy.yaml @ 04/08/25 15:47:08.425
  STEP: Decoded GVK: /v1, Kind=ConfigMap @ 04/08/25 15:47:08.426
  STEP: Decoded GVK: apps/v1, Kind=Deployment @ 04/08/25 15:47:08.523
  STEP: Decoded GVK: /v1, Kind=Service @ 04/08/25 15:47:08.611
  STEP: Checking if deployment foobar/envoy status is: Available @ 04/08/25 15:47:08.96
  STEP: Creating model server resources from manifest: /usr/local/google/home/rramkumar/gateway-api-inference-extension/config/manifests/vllm/cpu-deployment.yaml @ 04/08/25 15:47:10.74
  STEP: Decoded GVK: apps/v1, Kind=Deployment @ 04/08/25 15:47:10.741
  STEP: Decoded GVK: /v1, Kind=ConfigMap @ 04/08/25 15:47:10.831
  STEP: Checking if deployment foobar/vllm-llama3-8b-instruct status is: Available @ 04/08/25 15:47:11.044
[BeforeSuite] PASSED [74.897 seconds]
------------------------------
InferencePool when The Inference Extension is running Should route traffic to target model servers
/usr/local/google/home/rramkumar/gateway-api-inference-extension/test/e2e/epp/e2e_test.go:47
  STEP: Waiting for the namespace to exist. @ 04/08/25 15:48:07.641
  STEP: Ensuring namespace exists: foobar @ 04/08/25 15:48:07.641
  STEP: Creating an InferenceModel resource @ 04/08/25 15:48:07.734
  STEP: Ensuring the InferenceModel resource exists in the namespace @ 04/08/25 15:48:07.845
  STEP: Verifying connectivity through the inference extension @ 04/08/25 15:48:07.944
  STEP: Deleting the InferenceModel test resource. @ 04/08/25 15:49:48.896
• [101.501 seconds]
------------------------------
[AfterSuite] 
/usr/local/google/home/rramkumar/gateway-api-inference-extension/test/e2e/epp/e2e_suite_test.go:138
  STEP: Performing global cleanup @ 04/08/25 15:49:49.142
[AfterSuite] PASSED [2.185 seconds]
------------------------------

Ran 1 of 1 Specs in 178.584 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 0 Skipped
--- PASS: TestAPIs (178.58s)
PASS
ok  	sigs.k8s.io/gateway-api-inference-extension/test/e2e/epp	178.659s

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 8, 2025
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Apr 8, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @rramkumar1. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 8, 2025
Copy link

netlify bot commented Apr 8, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit b1efa3b
🔍 Latest deploy log https://app.netlify.com/sites/gateway-api-inference-extension/deploys/67f561e416f9440008efbbd7
😎 Deploy Preview https://deploy-preview-661--gateway-api-inference-extension.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@ahg-g
Copy link
Contributor

ahg-g commented Apr 8, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 8, 2025
@@ -0,0 +1,126 @@
apiVersion: inference.networking.x-k8s.io/v1alpha2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a duplication of the existing inferencepool-resources.yaml, excluding the namespace field.
I would prefer if we can remove the namespace field from all resources in the inferencepool-resources.yaml file and specify the namespace in the e2e test (as env var as specified in other comment).

there is a big value in testing e2e our "Getting Started" files, so we are self testing the public documentation.
I would avoid having duplications as much as possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would you suggest getting the env var parsed in the YAML file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose I could use a similar approach to the one you described above. However, that might complicate the getting started guide? How would the user of the guide make sure the env var was parsed in the file?

Copy link
Contributor

@nirrozenbaum nirrozenbaum Apr 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the e2e a similar approach would work. in getting started guide I’m not sure we have an agreement of the other maintainers about being able to choose the namespace (I’m in favor of it).
if there is an agreement, you can use envsubst command to replace env var with it’s value.

for example -
envsubst < myfile.yaml | kubectl apply -f -

this discussion is relevant only for the subject serviceaccount in the ClusterRoleBinding, as the namespace in the Service and Deployment resources can be just removed and then you can decide the ns during e2e test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we tackle this in a follow-up then? If we change the getting started file in this PR its going to expand the scope considerably as we now need to update the guide with the envvar substitution command.

cc @danehans

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we change the getting started file in this PR

+1 on not changing the getting started guide in this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a big value in testing e2e our "Getting Started" files, so we are self testing the public documentation.

I agree with this point. We may need to consider 2 different e2e tests, 1 for testing the quickstart and another that is more programmatic to support various inference use cases. Maybe we track this in a separate issue?

cc: @ahg-g @kfswain

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to move away from yaml files and just create those objects in code,,, the envoy one is complex, so I don't mind it, but other than that I think it is more manageable to have the test create those objects in code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, well, creating the pool requires also creating all the epp and the rbac resources, so it is also complex...

@danehans
Copy link
Contributor

danehans commented Apr 8, 2025

@rramkumar1 CI is failing due to:

test/e2e/epp/e2e_suite_test.go:59:98: `overriden` is a misspelling of `overridden` (misspell)
	// defaultNsName is the default name of the Namespace used for tests. The namespace used can be overriden by the E2E_NS environment variable.
	                                                                                                ^
make: *** [Makefile:144: ci-lint] Error 1

@rramkumar1
Copy link
Contributor Author

@rramkumar1 CI is failing due to:

test/e2e/epp/e2e_suite_test.go:59:98: `overriden` is a misspelling of `overridden` (misspell)
	// defaultNsName is the default name of the Namespace used for tests. The namespace used can be overriden by the E2E_NS environment variable.
	                                                                                                ^
make: *** [Makefile:144: ci-lint] Error 1

Fixed.

@@ -0,0 +1,126 @@
apiVersion: inference.networking.x-k8s.io/v1alpha2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a big value in testing e2e our "Getting Started" files, so we are self testing the public documentation.

I agree with this point. We may need to consider 2 different e2e tests, 1 for testing the quickstart and another that is more programmatic to support various inference use cases. Maybe we track this in a separate issue?

cc: @ahg-g @kfswain

@@ -109,6 +115,8 @@ var _ = ginkgo.BeforeSuite(func() {
})

func setupInfra() {
createNamespace(cli, nsName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add the namespace deletion to AfterSuite cleanup?
all resources that are created in the test should be deleted at the end of it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nirrozenbaum
Copy link
Contributor

/lgtm

Thanks!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 8, 2025
@danehans
Copy link
Contributor

danehans commented Apr 8, 2025

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danehans, rramkumar1

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 8, 2025
@k8s-ci-robot k8s-ci-robot merged commit ae3df87 into kubernetes-sigs:main Apr 8, 2025
7 of 8 checks passed
kfswain pushed a commit to kfswain/llm-instance-gateway that referenced this pull request Apr 29, 2025
kfswain pushed a commit to kfswain/llm-instance-gateway that referenced this pull request Apr 29, 2025
kfswain pushed a commit to kfswain/llm-instance-gateway that referenced this pull request Apr 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use Dedicated Namespace for Resources
5 participants