Create unmanaged kubernetes cluster on CloudStack #250

Closed

Conversation

vishesh92
Member

@vishesh92 vishesh92 commented May 11, 2023

Issue #, if available:
Description of changes:
Creates an unmanaged Kubernetes cluster entry on CloudStack. This provides a clear view of CAPC-managed Kubernetes clusters in the CloudStack UI.

Snapshots from ACS Dashboard:
Screenshot from 2023-08-22 17-24-21
Screenshot from 2023-08-22 17-24-30

Depends on: apache/cloudstack#7515 and apache/cloudstack-go#59 (go.mod needs to be updated once the latter is merged)
Testing performed:
Tested locally by creating a Kubernetes cluster using Cluster API.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 11, 2023
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 11, 2023
@k8s-ci-robot
Contributor

Hi @vishesh92. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@netlify

netlify bot commented May 11, 2023

Deploy Preview for kubernetes-sigs-cluster-api-cloudstack ready!

🔨 Latest commit 254d93b
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-cluster-api-cloudstack/deploys/64e8415959d0bb000815e215
😎 Deploy Preview https://deploy-preview-250--kubernetes-sigs-cluster-api-cloudstack.netlify.app

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 11, 2023
@vishesh92 vishesh92 force-pushed the add-unmanaged-k8s-on-cs branch from 8a2fcf4 to 7b3e85e on May 11, 2023 08:53
@chrisdoherty4
Member

Hi @vishesh92. I took a look at some of the other changes in CloudStack that you referenced and I'm struggling to understand what we're trying to achieve and why. Would you mind elaborating?

@vishesh92
Member Author

vishesh92 commented May 11, 2023

Hi @chrisdoherty4
This creates an entry for a Kubernetes cluster in CloudStack. It helps provide a clear view of CAPC-managed clusters on CloudStack and the VMs associated with them. This PR depends on the apache/cloudstack-go PR being merged and released, so it will be some time before this PR can be merged.

@chrisdoherty4
Member

chrisdoherty4 commented May 11, 2023

@vishesh92 So populating this value would bind VMs to an Apache CloudStack cluster ('cluster' is overloaded in this context)?

Am I right in saying we expect users to have pre-created the cluster? Although from the changes it looks like we're doing all of that ourselves.

@vishesh92
Member Author

vishesh92 commented May 11, 2023

"Cluster" has a different meaning in the context of CloudStack infrastructure. In the context of this PR, cluster == Kubernetes cluster.

We perform two operations when the CAPC-managed Kubernetes cluster comes up:

  1. Create an entry in CloudStack's Kubernetes dashboard. This is just a simple insert into the database; CloudStack doesn't do anything else with this resource.
  2. When machine deployments get created or deleted, we link or de-link the VM (created by CAPC on CloudStack) with the Kubernetes cluster entry (on CloudStack) created earlier.

In short, we are just adding some information to CloudStack about the Kubernetes cluster managed by CAPC.
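
To make these two steps concrete, here is a minimal Go sketch. The interface is hypothetical - it stands in for the client bindings apache/cloudstack-go#59 adds - so the method names and signatures are assumptions, not the merged API:

```go
package cloud

import "fmt"

// kubernetesClusterAPI is a hypothetical stand-in for the CloudStack client
// calls this PR relies on (server-side APIs from apache/cloudstack#7515);
// the real names and signatures may differ.
type kubernetesClusterAPI interface {
	// Step 1: insert the unmanaged cluster record into CloudStack's
	// Kubernetes dashboard (a plain DB insert; nothing is orchestrated).
	CreateUnmanagedKubernetesCluster(name, zoneID string) (clusterID string, err error)
	// Step 2a: link CAPC-created VMs to the record.
	AddVirtualMachinesToKubernetesCluster(clusterID string, vmIDs []string) error
	// Step 2b: de-link VMs from the record when machines are deleted.
	RemoveVirtualMachinesFromKubernetesCluster(clusterID string, vmIDs []string) error
}

// syncMachines mirrors machine-deployment changes into the CloudStack entry.
func syncMachines(api kubernetesClusterAPI, clusterID string, added, removed []string) error {
	if len(added) > 0 {
		if err := api.AddVirtualMachinesToKubernetesCluster(clusterID, added); err != nil {
			return fmt.Errorf("linking VMs to cluster %s: %w", clusterID, err)
		}
	}
	if len(removed) > 0 {
		if err := api.RemoveVirtualMachinesFromKubernetesCluster(clusterID, removed); err != nil {
			return fmt.Errorf("de-linking VMs from cluster %s: %w", clusterID, err)
		}
	}
	return nil
}
```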

@vishesh92 vishesh92 force-pushed the add-unmanaged-k8s-on-cs branch from 7b3e85e to d6d9af2 on May 12, 2023 06:22
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 13, 2023
@rohityadavcloud
Member

Ping @vishesh92, is this up to date?

@vishesh92 vishesh92 force-pushed the add-unmanaged-k8s-on-cs branch from d6d9af2 to ba2d468 on July 26, 2023 09:57
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 26, 2023
@vishesh92 vishesh92 force-pushed the add-unmanaged-k8s-on-cs branch 2 times, most recently from 7af31c6 to 38bd29b on July 27, 2023 12:22
@codecov-commenter

codecov-commenter commented Jul 28, 2023

Codecov Report

Patch coverage: 7.43% and project coverage change: -0.41% ⚠️

Comparison is base (58fad9f) 24.95% compared to head (d1536ac) 24.55%.
Report is 8 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #250      +/-   ##
==========================================
- Coverage   24.95%   24.55%   -0.41%     
==========================================
  Files          59       60       +1     
  Lines        5558     5682     +124     
==========================================
+ Hits         1387     1395       +8     
- Misses       4035     4150     +115     
- Partials      136      137       +1     
Files Changed                                       Coverage Δ
api/v1beta3/cloudstackcluster_types.go              100.00% <ø> (ø)
controllers/cloudstackcluster_controller.go          16.66% <0.00%> (-4.63%) ⬇️
controllers/cloudstackfailuredomain_controller.go    61.00% <ø> (ø)
pkg/cloud/client.go                                  48.29% <ø> (ø)
pkg/cloud/cluster.go                                  0.00% <0.00%> (ø)
controllers/cloudstackmachine_controller.go          54.87% <87.50%> (+0.56%) ⬆️
pkg/cloud/isolated_network.go                        59.05% <100.00%> (+0.16%) ⬆️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

go.sum Outdated
@@ -448,6 +450,8 @@ github.com/rogpeppe/go-internal v1.6.1/go.mod h1:xXDCJY+GAPziupqXw64V24skbSoqbTE
github.com/russross/blackfriday/v2 v2.0.1/go.mod h1:+Rmxgy9KzJVeS9/2gXHxylqXiyQDYRxCVz55jmeOWTM=
github.com/ryanuber/columnize v0.0.0-20160712163229-9b3edd62028f/go.mod h1:sm1tb6uqfes/u+d4ooFouqFdy9/2g9QGwK3SQygK0Ts=
github.com/sean-/seed v0.0.0-20170313163322-e2103e2c3529/go.mod h1:DxrIzT+xaE7yg65j358z/aeFdxmN0P9QXhEzd20vsDc=
github.com/shapeblue/cloudstack-go/v2 v2.9.1-0.20230717062313-73e4efc8a510 h1:FPRBv784robz6sZSqDGfZDZMse31lj96i+enH02Xzds=
Collaborator

@vishesh92
The latest cloudstack-go version is 2.15.0 (see https://github.com/apache/cloudstack-go); 2.13.1 was used in this project before.

Can you update the version of the shapeblue repo?

Member Author

We will create a beta release of cloudstack-go and use that here instead.
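
Until that release exists, one way to consume the new APIs is a temporary replace directive in go.mod pointing at the fork - a sketch only, using the pseudo-version from the go.sum hunk above:

```
replace github.com/apache/cloudstack-go/v2 => github.com/shapeblue/cloudstack-go/v2 v2.9.1-0.20230717062313-73e4efc8a510
```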

@blueorangutan

Tests were aborted.


@vishesh92
Member Author

/run-e2e -c 4.18

@blueorangutan

@vishesh92 a Jenkins job has been kicked off to run tests with the following parameters:

  • kubernetes version: 1.27.2
  • CloudStack version: 4.18
  • hypervisor: kvm
  • template: ubuntu-2004-kube
  • Kubernetes upgrade from: 1.26.5 to 1.27.2

@blueorangutan

Test Results : (tid-298)
Environment: kvm Rocky8(x3), Advanced Networking with Management Server Rocky8
Kubernetes Version: v1.27.2
Kubernetes Version upgrade from: v1.26.5
Kubernetes Version upgrade to: v1.27.2
CloudStack Version: 4.18
Template: ubuntu-2004-kube
E2E Test Run Logs: https://github.com/blueorangutan/capc-prs/releases/download/capc-pr-ci-cd/capc-e2e-artifacts-pr250-sl-298.zip

[PASS] When testing K8S conformance [Conformance] Should create a workload cluster and run kubetest
[PASS] When the specified resource does not exist Should fail due to the specified account is not found [TC4a]
[PASS] When the specified resource does not exist Should fail due to the specified domain is not found [TC4b]
[PASS] When the specified resource does not exist Should fail due to the specified zone is not found [TC3]
[PASS] When the specified resource does not exist Should fail due to the public IP can not be found


Summarizing 24 Failures:

[Fail] When testing machine remediation [It] Should replace a machine when it is destroyed 
/root/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/cluster_helpers.go:143

[Fail] When testing node drain timeout [It] A node should be forcefully removed if it cannot be drained in time 
/root/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/cluster_helpers.go:143

[Fail] When testing multiple CPs in a shared network with kubevip [It] Should successfully create a cluster with multiple CPs in a shared network 
/root/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/cluster_helpers.go:143

[Fail] When testing with custom disk offering [It] Should successfully create a cluster with a custom disk offering 
/root/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/cluster_helpers.go:143

[Fail] When testing app deployment to the workload cluster with network interruption [ToxiProxy] [It] Should be able to create a cluster despite a network interruption during that process 
/root/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/cluster_helpers.go:143

[Fail] When testing Kubernetes version upgrades [It] Should successfully upgrade kubernetes versions when there is a change in relevant fields 
/root/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/cluster_helpers.go:143

[Fail] When testing app deployment to the workload cluster with slow network [ToxiProxy] [BeforeEach] Should be able to download an HTML from the app deployed to the workload cluster 
/jenkins/workspace/capc-e2e-new/test/e2e/toxiproxy/toxiProxy.go:203

[Fail] When testing affinity group [It] Should have host affinity group when affinity is pro 
/root/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/cluster_helpers.go:143

[Fail] When testing affinity group [It] Should have host affinity group when affinity is anti 
/root/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/cluster_helpers.go:143

[Fail] When testing subdomain [It] Should create a cluster in a subdomain 
/root/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/cluster_helpers.go:143

[Fail] When testing app deployment to the workload cluster [TC1][PR-Blocking] [It] Should be able to download an HTML from the app deployed to the workload cluster 
/root/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/cluster_helpers.go:143

[Fail] with two clusters [It] should successfully add and remove a second cluster without breaking the first cluster 
/root/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/cluster_helpers.go:143

[Fail] When testing horizontal scale out/in [TC17][TC18][TC20][TC21] [It] Should successfully scale machine replicas up and down horizontally 
/root/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/cluster_helpers.go:143

[Fail] When testing with disk offering [It] Should successfully create a cluster with disk offering 
/root/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/cluster_helpers.go:143

[Fail] When the specified resource does not exist [It] Should fail due to the specified control plane offering is not found [TC7] 
/jenkins/workspace/capc-e2e-new/test/e2e/invalid_resource.go:253

[Fail] When the specified resource does not exist [It] Should fail due to the specified template is not found [TC6] 
/jenkins/workspace/capc-e2e-new/test/e2e/invalid_resource.go:253

[Fail] When the specified resource does not exist [It] Should fail due to the specified disk offering is not found 
/jenkins/workspace/capc-e2e-new/test/e2e/invalid_resource.go:253

[Fail] When the specified resource does not exist [It] Should fail due to the compute resources are not sufficient for the specified offering [TC8] 
/jenkins/workspace/capc-e2e-new/test/e2e/invalid_resource.go:253

[Fail] When the specified resource does not exist [It] Should fail due to the specified disk offer is not customized but the disk size is specified 
/jenkins/workspace/capc-e2e-new/test/e2e/invalid_resource.go:253

[Fail] When the specified resource does not exist [It] Should fail due to the specified disk offer is customized but the disk size is not specified 
/jenkins/workspace/capc-e2e-new/test/e2e/invalid_resource.go:253

[Fail] When the specified resource does not exist When starting with a healthy cluster [BeforeEach] Should fail to upgrade worker machine due to insufficient compute resources 
/root/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/cluster_helpers.go:143

[Fail] When the specified resource does not exist When starting with a healthy cluster [BeforeEach] Should fail to upgrade control plane machine due to insufficient compute resources 
/root/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/cluster_helpers.go:143

[Fail] When testing resource cleanup [It] Should create a new network when the specified network does not exist 
/root/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/cluster_helpers.go:143

[Fail] When testing MachineDeployment rolling upgrades [It] Should successfully upgrade Machines upon changes in relevant MachineDeployment fields 
/root/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/cluster_helpers.go:143

Ran 28 of 29 Specs in 7147.886 seconds
FAIL! -- 4 Passed | 24 Failed | 0 Pending | 1 Skipped
--- FAIL: TestE2E (7147.90s)
FAIL

@vishesh92 vishesh92 force-pushed the add-unmanaged-k8s-on-cs branch from c60547b to d15a4ea on July 31, 2023 14:49
@vishesh92
Member Author

/run-e2e -c 4.18

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 21, 2023
@vishesh92 vishesh92 force-pushed the add-unmanaged-k8s-on-cs branch 5 times, most recently from 2adcf65 to 13a0687 on August 22, 2023 05:18
@blueorangutan

Tests were aborted.

@vishesh92 vishesh92 force-pushed the add-unmanaged-k8s-on-cs branch from 13a0687 to 3c76b28 on August 23, 2023 05:19
@blueorangutan

Tests were aborted.

@vishesh92 vishesh92 force-pushed the add-unmanaged-k8s-on-cs branch from d1536ac to 53b5f2e on August 25, 2023 04:55
@vishesh92 vishesh92 force-pushed the add-unmanaged-k8s-on-cs branch from 53b5f2e to 254d93b on August 25, 2023 05:51
@vishesh92
Member Author

/run-e2e -c 4.18

@blueorangutan

@vishesh92 a Jenkins job has been kicked off to run tests with the following parameters:

  • kubernetes version: 1.27.2
  • CloudStack version: 4.18
  • hypervisor: kvm
  • template: ubuntu-2004-kube
  • Kubernetes upgrade from: 1.26.5 to 1.27.2

@blueorangutan

Test Results : (tid-325)
Environment: kvm Rocky8(x3), Advanced Networking with Management Server Rocky8
Kubernetes Version: v1.27.2
Kubernetes Version upgrade from: v1.26.5
Kubernetes Version upgrade to: v1.27.2
CloudStack Version: 4.18
Template: ubuntu-2004-kube
E2E Test Run Logs: https://github.com/blueorangutan/capc-prs/releases/download/capc-pr-ci-cd/capc-e2e-artifacts-pr250-sl-325.zip

[PASS] When testing app deployment to the workload cluster with network interruption [ToxiProxy] Should be able to create a cluster despite a network interruption during that process
[PASS] When testing resource cleanup Should create a new network when the specified network does not exist
[PASS] When testing horizontal scale out/in [TC17][TC18][TC20][TC21] Should successfully scale machine replicas up and down horizontally
[PASS] When testing with custom disk offering Should successfully create a cluster with a custom disk offering
[PASS] When testing creation of unmanaged CKS cluster in ACS Should create a workload cluster
[PASS] When testing multiple CPs in a shared network with kubevip Should successfully create a cluster with multiple CPs in a shared network
[PASS] When testing machine remediation Should replace a machine when it is destroyed
[PASS] When testing MachineDeployment rolling upgrades Should successfully upgrade Machines upon changes in relevant MachineDeployment fields
[PASS] When testing K8S conformance [Conformance] Should create a workload cluster and run kubetest
[PASS] When testing affinity group Should have host affinity group when affinity is pro
[PASS] When testing affinity group Should have host affinity group when affinity is anti
[PASS] When testing node drain timeout A node should be forcefully removed if it cannot be drained in time
[PASS] When the specified resource does not exist Should fail due to the specified account is not found [TC4a]
[PASS] When the specified resource does not exist Should fail due to the specified domain is not found [TC4b]
[PASS] When the specified resource does not exist Should fail due to the specified control plane offering is not found [TC7]
[PASS] When the specified resource does not exist Should fail due to the specified template is not found [TC6]
[PASS] When the specified resource does not exist Should fail due to the specified zone is not found [TC3]
[PASS] When the specified resource does not exist Should fail due to the specified disk offering is not found
[PASS] When the specified resource does not exist Should fail due to the compute resources are not sufficient for the specified offering [TC8]
[PASS] When the specified resource does not exist Should fail due to the specified disk offer is not customized but the disk size is specified
[PASS] When the specified resource does not exist Should fail due to the specified disk offer is customized but the disk size is not specified
[PASS] When the specified resource does not exist Should fail due to the public IP can not be found
[PASS] When the specified resource does not exist When starting with a healthy cluster Should fail to upgrade worker machine due to insufficient compute resources
[PASS] When the specified resource does not exist When starting with a healthy cluster Should fail to upgrade control plane machine due to insufficient compute resources
[PASS] When testing app deployment to the workload cluster [TC1][PR-Blocking] Should be able to download an HTML from the app deployed to the workload cluster
[PASS] When testing Kubernetes version upgrades Should successfully upgrade kubernetes versions when there is a change in relevant fields
[PASS] When testing subdomain Should create a cluster in a subdomain
[PASS] with two clusters should successfully add and remove a second cluster without breaking the first cluster
[PASS] When testing with disk offering Should successfully create a cluster with disk offering


Summarizing 1 Failure:

[Fail] When testing app deployment to the workload cluster with slow network [ToxiProxy] [It] Should be able to download an HTML from the app deployed to the workload cluster 
/jenkins/workspace/capc-e2e-new/test/e2e/deploy_app_toxi.go:134

Ran 28 of 30 Specs in 8968.702 seconds
FAIL! -- 27 Passed | 1 Failed | 0 Pending | 2 Skipped
--- FAIL: TestE2E (8968.71s)
FAIL

@g-gaston
Contributor

g-gaston commented Aug 28, 2023

@rohityadavcloud

I'm happy to discuss if there are other ways to accomplish this without making changes in CAPC.

It seems like this could be done by anything with access to the CloudStack API and the kube-apiserver, right? We don't need special information that is only available inside the CAPC controller, or to hook into particular events hidden in the reconcilers. This just needs to watch the CloudStackCluster resources and possibly the CloudStackMachines?
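
For illustration, a minimal controller-runtime sketch of that standalone-watcher idea, assuming the CAPC v1beta3 API types; mirrorToCloudStack is a hypothetical placeholder for the CloudStack API calls, not anything that exists today:

```go
package main

import (
	"context"

	"k8s.io/apimachinery/pkg/runtime"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"

	infrav1 "sigs.k8s.io/cluster-api-provider-cloudstack/api/v1beta3"
)

// clusterMirror watches CloudStackCluster objects from outside CAPC and
// mirrors them into CloudStack as unmanaged Kubernetes cluster entries.
type clusterMirror struct {
	client.Client
}

func (m *clusterMirror) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var csCluster infrav1.CloudStackCluster
	if err := m.Get(ctx, req.NamespacedName, &csCluster); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	// Hypothetical placeholder: create/update the unmanaged CKS entry via
	// the CloudStack API (apache/cloudstack#7515) for this cluster.
	return ctrl.Result{}, mirrorToCloudStack(ctx, &csCluster)
}

func mirrorToCloudStack(ctx context.Context, c *infrav1.CloudStackCluster) error {
	return nil // stub for the sketch
}

func (m *clusterMirror) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&infrav1.CloudStackCluster{}).
		Complete(m)
}

func main() {
	scheme := runtime.NewScheme()
	_ = infrav1.AddToScheme(scheme) // register the CAPC types with the client
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{Scheme: scheme})
	if err != nil {
		panic(err)
	}
	if err := (&clusterMirror{Client: mgr.GetClient()}).SetupWithManager(mgr); err != nil {
		panic(err)
	}
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```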

@blueorangutan

Test Results : (tid-326)
Environment: kvm Rocky8(x3), Advanced Networking with Management Server Rocky8
Kubernetes Version: v1.27.2
Kubernetes Version upgrade from: v1.26.5
Kubernetes Version upgrade to: v1.27.2
CloudStack Version: 4.18
Template: ubuntu-2004-kube
E2E Test Run Logs: https://github.com/blueorangutan/capc-prs/releases/download/capc-pr-ci-cd/capc-e2e-artifacts-pr250-sl-326.zip

[PASS] When testing app deployment to the workload cluster with slow network [ToxiProxy] Should be able to download an HTML from the app deployed to the workload cluster
[PASS] When testing creation of unmanaged CKS cluster in ACS Should create a workload cluster
[PASS] with two clusters should successfully add and remove a second cluster without breaking the first cluster
[PASS] When testing app deployment to the workload cluster [TC1][PR-Blocking] Should be able to download an HTML from the app deployed to the workload cluster
[PASS] When testing app deployment to the workload cluster with network interruption [ToxiProxy] Should be able to create a cluster despite a network interruption during that process
[PASS] When testing horizontal scale out/in [TC17][TC18][TC20][TC21] Should successfully scale machine replicas up and down horizontally
[PASS] When testing with disk offering Should successfully create a cluster with disk offering
[PASS] When testing machine remediation Should replace a machine when it is destroyed
[PASS] When testing multiple CPs in a shared network with kubevip Should successfully create a cluster with multiple CPs in a shared network
[PASS] When testing affinity group Should have host affinity group when affinity is pro
[PASS] When testing affinity group Should have host affinity group when affinity is anti
[PASS] When testing MachineDeployment rolling upgrades Should successfully upgrade Machines upon changes in relevant MachineDeployment fields
[PASS] When testing with custom disk offering Should successfully create a cluster with a custom disk offering
[PASS] When testing K8S conformance [Conformance] Should create a workload cluster and run kubetest
[PASS] When testing subdomain Should create a cluster in a subdomain
[PASS] When testing resource cleanup Should create a new network when the specified network does not exist
[PASS] When testing node drain timeout A node should be forcefully removed if it cannot be drained in time
[PASS] When testing Kubernetes version upgrades Should successfully upgrade kubernetes versions when there is a change in relevant fields


Summarizing 12 Failures:

[Fail] When the specified resource does not exist [It] Should fail due to the specified account is not found [TC4a] 
/jenkins/workspace/capc-e2e-new/test/e2e/invalid_resource.go:253

[Fail] When the specified resource does not exist [It] Should fail due to the specified domain is not found [TC4b] 
/jenkins/workspace/capc-e2e-new/test/e2e/invalid_resource.go:253

[Fail] When the specified resource does not exist [It] Should fail due to the specified control plane offering is not found [TC7] 
/jenkins/workspace/capc-e2e-new/test/e2e/invalid_resource.go:253

[Fail] When the specified resource does not exist [It] Should fail due to the specified template is not found [TC6] 
/jenkins/workspace/capc-e2e-new/test/e2e/invalid_resource.go:253

[Fail] When the specified resource does not exist [It] Should fail due to the specified zone is not found [TC3] 
/jenkins/workspace/capc-e2e-new/test/e2e/invalid_resource.go:253

[Fail] When the specified resource does not exist [It] Should fail due to the specified disk offering is not found 
/jenkins/workspace/capc-e2e-new/test/e2e/invalid_resource.go:253

[Fail] When the specified resource does not exist [It] Should fail due to the compute resources are not sufficient for the specified offering [TC8] 
/jenkins/workspace/capc-e2e-new/test/e2e/invalid_resource.go:253

[Fail] When the specified resource does not exist [It] Should fail due to the specified disk offer is not customized but the disk size is specified 
/jenkins/workspace/capc-e2e-new/test/e2e/invalid_resource.go:253

[Fail] When the specified resource does not exist [It] Should fail due to the specified disk offer is customized but the disk size is not specified 
/jenkins/workspace/capc-e2e-new/test/e2e/invalid_resource.go:253

[Fail] When the specified resource does not exist [It] Should fail due to the public IP can not be found 
/jenkins/workspace/capc-e2e-new/test/e2e/invalid_resource.go:253

[Fail] When the specified resource does not exist When starting with a healthy cluster [It] Should fail to upgrade worker machine due to insufficient compute resources 
/jenkins/workspace/capc-e2e-new/test/e2e/invalid_resource.go:253

[Fail] When the specified resource does not exist When starting with a healthy cluster [It] Should fail to upgrade control plane machine due to insufficient compute resources 
/jenkins/workspace/capc-e2e-new/test/e2e/invalid_resource.go:253

Ran 27 of 30 Specs in 11972.694 seconds
FAIL! -- 15 Passed | 12 Failed | 0 Pending | 3 Skipped
--- FAIL: TestE2E (11972.71s)
FAIL

@vishesh92
Member Author

vishesh92 commented Aug 29, 2023

It seems like this could be done by anything with access to the CloudStack API and the kube-apiserver, right? We don't need special information that is only available inside the CAPC controller, or to hook into particular events hidden in the reconcilers. This just needs to watch the CloudStackCluster resources and possibly the CloudStackMachines?

@g-gaston I want to make the integration part of the CAPC controller since it already has enough information about all the resources and about the account and domain they need to belong to. In the future we might add even more information about the CAPC cluster to CKS as well, and having this as part of the controller would help.

@hrak
Contributor

hrak commented Aug 29, 2023

I might be missing the point of this feature, but it feels like unnecessary complexity. It's not just adding this feature; we now need to maintain and test it, and the benefits seem dubious to me.

Given that the CAPI API is our pane of glass for describing k8s clusters, adding a second one that works in parallel and isn't built on top of it looks like duplicated functionality.

I'm with you - this adds a lot of CKS-specific logic to the controller, whose only task should be to reconcile CAPI-based clusters. I think there are bigger fish to fry (keeping up to date with CAPI and k8s versions, releasing more often, etc.).

@rohityadavcloud
Member

rohityadavcloud commented Aug 29, 2023

Hi @g-gaston @hrak, I see your points and completely understand not wanting to make CAPC more complex. But hear us out.

Apache CloudStack has always been a community-led, turnkey IaaS platform. The emphasis is on being turnkey: easy to use, set up, and deploy, with a great UI, so that it just works out of the box. While CAPC can be used as a building block, such as for EKS-A, we think it is also used directly as a first-class tool in itself with clusterctl.

Before CAPC, CloudStack's only Kubernetes integration was CKS (CloudStack Kubernetes Service), wherein CloudStack orchestrates k8s clusters for its users. However, CKS-built clusters make some assumptions and deploy specific CNIs to make this work out of the box - an easy-to-deploy k8s cluster for most users. And CKS is available for all user account types, provided the service provider, operator, or enterprise admin enables and configures CKS in their environment.

With CAPC there is more flexibility, but it isn't available to all user types (currently only domain and root admins), and a group of contributors think it would be a good idea if an end user could simply deploy a CAPC cluster and have a linked resource created in the form of an unmanaged CKS cluster (for an idea of what such a resource looks like, head over to https://qa.cloudstack.cloud/simulator/#/kubernetes; this shouldn't be confused with a CloudStack cluster within a zone). In the future, the CloudStack CKS implementation might be refactored to just use CAPC (so as not to have multiple ways of achieving the same thing).

To address these, we've created a general mechanism via apache/cloudstack#7515 and a series of fixes, such as the ability for normal user accounts to pick/use a public IP, say in a shared network, with apache/cloudstack#7817. This could help fix #303 in future releases. The final missing piece: if a normal user account deploys a CAPC cluster, can we give them a control-plane view saying that a k8s cluster exists in CloudStack but is externally (CAPC) managed? That's where this feature comes in.

Now, you're both right - we don't need this PR to do that; there could be an external script or tool for this. However, that would make it complex for users and operators to set up, and the overall solution wouldn't be turnkey. Such an external tool/script would need to be aware of CloudStack, be aware of and have access to the CAPC cluster, and doing that for all accounts would just introduce multiple competing control-plane paths.

So here's my proposal: @vishesh92 can introduce a new optional parameter in the CAPC-generated cluster YAML (or via a supporting command-line arg that generates the same) to enable/disable the feature, as sketched below. By default this is kept enabled, so any CAPC user deploying a k8s cluster sees an unmanaged CKS cluster in the UI. Downstream tools and users who don't want this behaviour can turn it off via the option/setting in the generated YAML. This way you can ensure the new code introduced in this PR doesn't get into the operational path in downstream use cases, while upstream CAPC and ACS releases stay turnkey.
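
For concreteness, the opt-out could look something like this in the generated manifest - the field name syncWithACS is purely illustrative, not a merged API:

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta3
kind: CloudStackCluster
metadata:
  name: my-capc-cluster
spec:
  # Hypothetical field: enabled by default, so a CAPC-deployed cluster shows
  # up as an unmanaged CKS cluster in the CloudStack UI; downstream tools
  # that don't want the behaviour generate the yaml with this set to false.
  syncWithACS: false
```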

Thoughts?

@chrisdoherty4
Member

chrisdoherty4 commented Aug 29, 2023

Hi @rohityadavcloud,

Thanks for sharing your thoughts and elaborating on the intent. I do have reservations about this feature. Staying within the scope outlined in the Cluster API documentation for infrastructure providers ensures we avoid feature creep and have an easier time maintaining CAPC. Additional functionality, such as this PR, can be added through auxiliary components, as you highlighted. Furthermore, considering the roadmap for CKS integration is somewhat unclear to the CAPC community, a best-effort integration doesn't feel like the right path forward right now.

we think this is also used directly as a first-class tool in itself with clusterctl.

I agree. CAPC is built to be used independently, that's true of all Cluster API infrastructure providers.

With CAPC, there is more flexibility but this isn't available for all user types ... it would be good idea if an end user can simply deploy a CAPC cluster and there is just a linked-resource created in the form of unmanaged CKS cluster

This sounds like 2 different asks:

  1. "I want to deploy CAPC clusters using any user type"
  2. "I want a cluster to appear in my CKS plugin UI"

On (1), I think we need to ask why said user types are required and broaden the scope we're considering. For example, perhaps a more flexible permissions model in Apache CloudStack would enable finer permission control so users can create whatever kind of user they wish?

On (2), I empathize with this desire, but I question what a user can do with an unmanaged CKS cluster (not to mention it isn't really a CKS cluster) - it's read-only if I'm not mistaken? Have we established why this is needed - what is the user really trying to do? In any case, with my current understanding it feels like this falls outside the scope of CAPC.

however, it would make it complex for users and operators to set this up and the overall solution wouldn't be turnkey

I think that's dependent on how you frame the solution. Cluster API in itself isn't turnkey: you have to deploy Cluster API, then you can create your clusters. There's nothing preventing building a solution that wraps deployment of all the necessary components, including a component that embodies the integration in this PR, which would be truly turnkey; though I don't think that's a concern for CAPC.

Such an external tool/script would need to be aware of CloudStack and aware/access of the CAPC cluster and do that for all accounts would just introduce multiple competing control plane paths.

That seems like an appropriate set of dependencies for this kind of component, though I don't understand the "do it for all accounts" or "competing control plane paths" parts; perhaps you could elaborate?

I think it's worth pointing out that all existing components built for Kubernetes have this kind of dependency set. Monitoring tools like Jaeger and Prometheus come to mind.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 30, 2023
@k8s-ci-robot
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@vishesh92
Member Author

One of my motivations for this PR was to improve the experience of CloudStack operators and users.

I understand your point of view on this PR, but it would have been better if I could have gotten feedback before I spent time on this PR and the required work in CloudStack & cloudstack-go. I opened this PR in May; getting feedback in August that this is not the right way to go is not very motivating. I would have been happy to discuss this PR in May. I am not sure what the best way forward is, but I would like to avoid spending time on a PR that will not be merged.

@vishesh92 vishesh92 closed this Sep 1, 2023
@chrisdoherty4
Member

chrisdoherty4 commented Sep 1, 2023

Hi @vishesh92. It sounds like you tried gathering feedback prior to completing the Apache CloudStack and cloudstack-go work, and the work in this PR. I'm sorry we missed that and provided feedback late in the day, including leaving this PR untouched for a prolonged period. We should aim to avoid this happening again; let's chat in the Kubernetes CAPC Slack about how best to address the communication inefficiencies.
