Create unmanaged kubernetes cluster on CloudStack #250

Closed

Conversation

vishesh92
Member

@vishesh92 vishesh92 commented May 11, 2023

Issue #, if available:
Description of changes:
Creates an unmanaged Kubernetes cluster entry on CloudStack. This provides a clear view of CAPC-managed Kubernetes clusters in the CloudStack UI.

Snapshots from ACS Dashboard:
Screenshot from 2023-08-22 17-24-21
Screenshot from 2023-08-22 17-24-30

Depends on: apache/cloudstack#7515 and apache/cloudstack-go#59 (go.mod needs to be updated once the latter is merged)
Testing performed:
Tested locally by creating a Kubernetes cluster using Cluster API.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 11, 2023
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 11, 2023
@k8s-ci-robot
Contributor

Hi @vishesh92. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@netlify

netlify bot commented May 11, 2023

Deploy Preview for kubernetes-sigs-cluster-api-cloudstack ready!

🔨 Latest commit 254d93b
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-cluster-api-cloudstack/deploys/64e8415959d0bb000815e215
😎 Deploy Preview https://deploy-preview-250--kubernetes-sigs-cluster-api-cloudstack.netlify.app

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 11, 2023
@vishesh92 vishesh92 force-pushed the add-unmanaged-k8s-on-cs branch from 8a2fcf4 to 7b3e85e on May 11, 2023 08:53
@chrisdoherty4
Member

Hi @vishesh92. I took a look at some of the other changes in CloudStack that you referenced and I'm struggling to understand what we're trying to achieve and why. Would you mind elaborating?

@vishesh92
Member Author

vishesh92 commented May 11, 2023

Hi @chrisdoherty4
This creates an entry for a Kubernetes cluster in CloudStack. It helps provide a clear view of CAPC-managed clusters on CloudStack and the VMs associated with them. This PR depends on the apache/cloudstack-go PR being merged and released, so it will be some time before this PR can be merged.

@chrisdoherty4
Member

chrisdoherty4 commented May 11, 2023

@vishesh92 So populating this value would bind VMs to an Apache CloudStack cluster ('cluster' is overloaded in this context)?

Am I right in saying we expect users to have pre-created the cluster? Although from the changes it looks like we're doing all of that ourselves.

@vishesh92
Member Author

vishesh92 commented May 11, 2023

"Cluster" has a different meaning in the context of CloudStack infrastructure. In the context of this PR, cluster == Kubernetes cluster.

We perform two operations when the CAPC-managed Kubernetes cluster comes up:

  1. Create an entry in CloudStack's Kubernetes dashboard. This is just a simple insert into the database; CloudStack doesn't do anything else with this resource.
  2. When machine deployments get created or deleted, we link or de-link the VM (created by CAPC on CloudStack) with the Kubernetes cluster entry (on CloudStack) created earlier.

In short, we are just adding some information to CloudStack about the Kubernetes cluster managed by CAPC.
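
To make these two steps concrete, here is a minimal Go sketch. The interface is hypothetical - it stands in for the client bindings apache/cloudstack-go#59 adds - so the method names and signatures are assumptions, not the merged API:

```go
package cloud

import "fmt"

// kubernetesClusterAPI is a hypothetical stand-in for the CloudStack client
// calls this PR relies on (server-side APIs from apache/cloudstack#7515);
// the real names and signatures may differ.
type kubernetesClusterAPI interface {
	// Step 1: insert the unmanaged cluster record into CloudStack's
	// Kubernetes dashboard (a plain DB insert; nothing is orchestrated).
	CreateUnmanagedKubernetesCluster(name, zoneID string) (clusterID string, err error)
	// Step 2a: link CAPC-created VMs to the record.
	AddVirtualMachinesToKubernetesCluster(clusterID string, vmIDs []string) error
	// Step 2b: de-link VMs from the record when machines are deleted.
	RemoveVirtualMachinesFromKubernetesCluster(clusterID string, vmIDs []string) error
}

// syncMachines mirrors machine-deployment changes into the CloudStack entry.
func syncMachines(api kubernetesClusterAPI, clusterID string, added, removed []string) error {
	if len(added) > 0 {
		if err := api.AddVirtualMachinesToKubernetesCluster(clusterID, added); err != nil {
			return fmt.Errorf("linking VMs to cluster %s: %w", clusterID, err)
		}
	}
	if len(removed) > 0 {
		if err := api.RemoveVirtualMachinesFromKubernetesCluster(clusterID, removed); err != nil {
			return fmt.Errorf("de-linking VMs from cluster %s: %w", clusterID, err)
		}
	}
	return nil
}
```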

@vishesh92 vishesh92 force-pushed the add-unmanaged-k8s-on-cs branch from 7b3e85e to d6d9af2 on May 12, 2023 06:22
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 13, 2023
@rohityadavcloud
Member

Ping @vishesh92, is this up to date?

@vishesh92 vishesh92 force-pushed the add-unmanaged-k8s-on-cs branch from d6d9af2 to ba2d468 on July 26, 2023 09:57
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 26, 2023
@vishesh92 vishesh92 force-pushed the add-unmanaged-k8s-on-cs branch 2 times, most recently from 7af31c6 to 38bd29b on July 27, 2023 12:22
@codecov-commenter

codecov-commenter commented Jul 28, 2023

Codecov Report

Patch coverage: 7.43% and project coverage change: -0.41% ⚠️

Comparison is base (58fad9f) 24.95% compared to head (d1536ac) 24.55%.
Report is 8 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #250      +/-   ##
==========================================
- Coverage   24.95%   24.55%   -0.41%     
==========================================
  Files          59       60       +1     
  Lines        5558     5682     +124     
==========================================
+ Hits         1387     1395       +8     
- Misses       4035     4150     +115     
- Partials      136      137       +1     
Files Changed                                       Coverage Δ
api/v1beta3/cloudstackcluster_types.go              100.00% <ø> (ø)
controllers/cloudstackcluster_controller.go          16.66% <0.00%> (-4.63%) ⬇️
controllers/cloudstackfailuredomain_controller.go    61.00% <ø> (ø)
pkg/cloud/client.go                                  48.29% <ø> (ø)
pkg/cloud/cluster.go                                  0.00% <0.00%> (ø)
controllers/cloudstackmachine_controller.go          54.87% <87.50%> (+0.56%) ⬆️
pkg/cloud/isolated_network.go                        59.05% <100.00%> (+0.16%) ⬆️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

go.sum Outdated
@@ -448,6 +450,8 @@ github.com/rogpeppe/go-internal v1.6.1/go.mod h1:xXDCJY+GAPziupqXw64V24skbSoqbTE
github.com/russross/blackfriday/v2 v2.0.1/go.mod h1:+Rmxgy9KzJVeS9/2gXHxylqXiyQDYRxCVz55jmeOWTM=
github.com/ryanuber/columnize v0.0.0-20160712163229-9b3edd62028f/go.mod h1:sm1tb6uqfes/u+d4ooFouqFdy9/2g9QGwK3SQygK0Ts=
github.com/sean-/seed v0.0.0-20170313163322-e2103e2c3529/go.mod h1:DxrIzT+xaE7yg65j358z/aeFdxmN0P9QXhEzd20vsDc=
github.com/shapeblue/cloudstack-go/v2 v2.9.1-0.20230717062313-73e4efc8a510 h1:FPRBv784robz6sZSqDGfZDZMse31lj96i+enH02Xzds=
Collaborator

@vishesh92
The latest cloudstack-go version is 2.15.0 (see https://github.com/apache/cloudstack-go); 2.13.1 was used in this project before.

Can you update the version of the shapeblue repo?

Member Author

We will create a beta release of cloudstack-go and use that here instead.
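
Until that release exists, one way to consume the new APIs is a temporary replace directive in go.mod pointing at the fork - a sketch only, using the pseudo-version from the go.sum hunk above:

```
replace github.com/apache/cloudstack-go/v2 => github.com/shapeblue/cloudstack-go/v2 v2.9.1-0.20230717062313-73e4efc8a510
```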

@blueorangutan

Tests were aborted.


@vishesh92
Member Author

/run-e2e -c 4.18

@blueorangutan

@vishesh92 a Jenkins job has been kicked off to run tests with the following parameters:

  • kubernetes version: 1.27.2
  • CloudStack version: 4.18
  • hypervisor: kvm
  • template: ubuntu-2004-kube
  • Kubernetes upgrade from: 1.26.5 to 1.27.2

@blueorangutan

Test Results : (tid-298)
Environment: kvm Rocky8(x3), Advanced Networking with Management Server Rocky8
Kubernetes Version: v1.27.2
Kubernetes Version upgrade from: v1.26.5
Kubernetes Version upgrade to: v1.27.2
CloudStack Version: 4.18
Template: ubuntu-2004-kube
E2E Test Run Logs: https://github.com/blueorangutan/capc-prs/releases/download/capc-pr-ci-cd/capc-e2e-artifacts-pr250-sl-298.zip

[PASS] When testing K8S conformance [Conformance] Should create a workload cluster and run kubetest
[PASS] When the specified resource does not exist Should fail due to the specified account is not found [TC4a]
[PASS] When the specified resource does not exist Should fail due to the specified domain is not found [TC4b]
[PASS] When the specified resource does not exist Should fail due to the specified zone is not found [TC3]
[PASS] When the specified resource does not exist Should fail due to the public IP can not be found


Summarizing 24 Failures:

[Fail] When testing machine remediation [It] Should replace a machine when it is destroyed 
/root/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/cluster_helpers.go:143

[Fail] When testing node drain timeout [It] A node should be forcefully removed if it cannot be drained in time 
/root/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/cluster_helpers.go:143

[Fail] When testing multiple CPs in a shared network with kubevip [It] Should successfully create a cluster with multiple CPs in a shared network 
/root/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/cluster_helpers.go:143

[Fail] When testing with custom disk offering [It] Should successfully create a cluster with a custom disk offering 
/root/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/cluster_helpers.go:143

[Fail] When testing app deployment to the workload cluster with network interruption [ToxiProxy] [It] Should be able to create a cluster despite a network interruption during that process 
/root/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/cluster_helpers.go:143

[Fail] When testing Kubernetes version upgrades [It] Should successfully upgrade kubernetes versions when there is a change in relevant fields 
/root/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/cluster_helpers.go:143

[Fail] When testing app deployment to the workload cluster with slow network [ToxiProxy] [BeforeEach] Should be able to download an HTML from the app deployed to the workload cluster 
/jenkins/workspace/capc-e2e-new/test/e2e/toxiproxy/toxiProxy.go:203

[Fail] When testing affinity group [It] Should have host affinity group when affinity is pro 
/root/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/cluster_helpers.go:143

[Fail] When testing affinity group [It] Should have host affinity group when affinity is anti 
/root/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/cluster_helpers.go:143

[Fail] When testing subdomain [It] Should create a cluster in a subdomain 
/root/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/cluster_helpers.go:143

[Fail] When testing app deployment to the workload cluster [TC1][PR-Blocking] [It] Should be able to download an HTML from the app deployed to the workload cluster 
/root/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/cluster_helpers.go:143

[Fail] with two clusters [It] should successfully add and remove a second cluster without breaking the first cluster 
/root/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/cluster_helpers.go:143

[Fail] When testing horizontal scale out/in [TC17][TC18][TC20][TC21] [It] Should successfully scale machine replicas up and down horizontally 
/root/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/cluster_helpers.go:143

[Fail] When testing with disk offering [It] Should successfully create a cluster with disk offering 
/root/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/cluster_helpers.go:143

[Fail] When the specified resource does not exist [It] Should fail due to the specified control plane offering is not found [TC7] 
/jenkins/workspace/capc-e2e-new/test/e2e/invalid_resource.go:253

[Fail] When the specified resource does not exist [It] Should fail due to the specified template is not found [TC6] 
/jenkins/workspace/capc-e2e-new/test/e2e/invalid_resource.go:253

[Fail] When the specified resource does not exist [It] Should fail due to the specified disk offering is not found 
/jenkins/workspace/capc-e2e-new/test/e2e/invalid_resource.go:253

[Fail] When the specified resource does not exist [It] Should fail due to the compute resources are not sufficient for the specified offering [TC8] 
/jenkins/workspace/capc-e2e-new/test/e2e/invalid_resource.go:253

[Fail] When the specified resource does not exist [It] Should fail due to the specified disk offer is not customized but the disk size is specified 
/jenkins/workspace/capc-e2e-new/test/e2e/invalid_resource.go:253

[Fail] When the specified resource does not exist [It] Should fail due to the specified disk offer is customized but the disk size is not specified 
/jenkins/workspace/capc-e2e-new/test/e2e/invalid_resource.go:253

[Fail] When the specified resource does not exist When starting with a healthy cluster [BeforeEach] Should fail to upgrade worker machine due to insufficient compute resources 
/root/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/cluster_helpers.go:143

[Fail] When the specified resource does not exist When starting with a healthy cluster [BeforeEach] Should fail to upgrade control plane machine due to insufficient compute resources 
/root/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/cluster_helpers.go:143

[Fail] When testing resource cleanup [It] Should create a new network when the specified network does not exist 
/root/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/cluster_helpers.go:143

[Fail] When testing MachineDeployment rolling upgrades [It] Should successfully upgrade Machines upon changes in relevant MachineDeployment fields 
/root/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/cluster_helpers.go:143

Ran 28 of 29 Specs in 7147.886 seconds
FAIL! -- 4 Passed | 24 Failed | 0 Pending | 1 Skipped
--- FAIL: TestE2E (7147.90s)
FAIL

@vishesh92 vishesh92 force-pushed the add-unmanaged-k8s-on-cs branch from c60547b to d15a4ea on July 31, 2023 14:49
@vishesh92
Member Author

/run-e2e -c 4.18

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 21, 2023
@vishesh92 vishesh92 force-pushed the add-unmanaged-k8s-on-cs branch 5 times, most recently from 2adcf65 to 13a0687 on August 22, 2023 05:18
@blueorangutan

Tests were aborted.

@vishesh92 vishesh92 force-pushed the add-unmanaged-k8s-on-cs branch from 13a0687 to 3c76b28 on August 23, 2023 05:19
@blueorangutan

Tests were aborted.

@vishesh92 vishesh92 force-pushed the add-unmanaged-k8s-on-cs branch from d1536ac to 53b5f2e on August 25, 2023 04:55
@vishesh92 vishesh92 force-pushed the add-unmanaged-k8s-on-cs branch from 53b5f2e to 254d93b on August 25, 2023 05:51
@vishesh92
Member Author

/run-e2e -c 4.18

@blueorangutan

@vishesh92 a Jenkins job has been kicked off to run tests with the following parameters:

  • kubernetes version: 1.27.2
  • CloudStack version: 4.18
  • hypervisor: kvm
  • template: ubuntu-2004-kube
  • Kubernetes upgrade from: 1.26.5 to 1.27.2

@blueorangutan

Test Results : (tid-325)
Environment: kvm Rocky8(x3), Advanced Networking with Management Server Rocky8
Kubernetes Version: v1.27.2
Kubernetes Version upgrade from: v1.26.5
Kubernetes Version upgrade to: v1.27.2
CloudStack Version: 4.18
Template: ubuntu-2004-kube
E2E Test Run Logs: https://github.com/blueorangutan/capc-prs/releases/download/capc-pr-ci-cd/capc-e2e-artifacts-pr250-sl-325.zip

[PASS] When testing app deployment to the workload cluster with network interruption [ToxiProxy] Should be able to create a cluster despite a network interruption during that process
[PASS] When testing resource cleanup Should create a new network when the specified network does not exist
[PASS] When testing horizontal scale out/in [TC17][TC18][TC20][TC21] Should successfully scale machine replicas up and down horizontally
[PASS] When testing with custom disk offering Should successfully create a cluster with a custom disk offering
[PASS] When testing creation of unmanaged CKS cluster in ACS Should create a workload cluster
[PASS] When testing multiple CPs in a shared network with kubevip Should successfully create a cluster with multiple CPs in a shared network
[PASS] When testing machine remediation Should replace a machine when it is destroyed
[PASS] When testing MachineDeployment rolling upgrades Should successfully upgrade Machines upon changes in relevant MachineDeployment fields
[PASS] When testing K8S conformance [Conformance] Should create a workload cluster and run kubetest
[PASS] When testing affinity group Should have host affinity group when affinity is pro
[PASS] When testing affinity group Should have host affinity group when affinity is anti
[PASS] When testing node drain timeout A node should be forcefully removed if it cannot be drained in time
[PASS] When the specified resource does not exist Should fail due to the specified account is not found [TC4a]
[PASS] When the specified resource does not exist Should fail due to the specified domain is not found [TC4b]
[PASS] When the specified resource does not exist Should fail due to the specified control plane offering is not found [TC7]
[PASS] When the specified resource does not exist Should fail due to the specified template is not found [TC6]
[PASS] When the specified resource does not exist Should fail due to the specified zone is not found [TC3]
[PASS] When the specified resource does not exist Should fail due to the specified disk offering is not found
[PASS] When the specified resource does not exist Should fail due to the compute resources are not sufficient for the specified offering [TC8]
[PASS] When the specified resource does not exist Should fail due to the specified disk offer is not customized but the disk size is specified
[PASS] When the specified resource does not exist Should fail due to the specified disk offer is customized but the disk size is not specified
[PASS] When the specified resource does not exist Should fail due to the public IP can not be found
[PASS] When the specified resource does not exist When starting with a healthy cluster Should fail to upgrade worker machine due to insufficient compute resources
[PASS] When the specified resource does not exist When starting with a healthy cluster Should fail to upgrade control plane machine due to insufficient compute resources
[PASS] When testing app deployment to the workload cluster [TC1][PR-Blocking] Should be able to download an HTML from the app deployed to the workload cluster
[PASS] When testing Kubernetes version upgrades Should successfully upgrade kubernetes versions when there is a change in relevant fields
[PASS] When testing subdomain Should create a cluster in a subdomain
[PASS] with two clusters should successfully add and remove a second cluster without breaking the first cluster
[PASS] When testing with disk offering Should successfully create a cluster with disk offering


Summarizing 1 Failure:

[Fail] When testing app deployment to the workload cluster with slow network [ToxiProxy] [It] Should be able to download an HTML from the app deployed to the workload cluster 
/jenkins/workspace/capc-e2e-new/test/e2e/deploy_app_toxi.go:134

Ran 28 of 30 Specs in 8968.702 seconds
FAIL! -- 27 Passed | 1 Failed | 0 Pending | 2 Skipped
--- FAIL: TestE2E (8968.71s)
FAIL

@g-gaston
Contributor

g-gaston commented Aug 28, 2023

@rohityadavcloud

I'm happy to discuss if there are other ways to accomplish this without making changes in CAPC.

It seems like this could be done by anything with access to the CloudStack API and the kube-apiserver, right? We don't need special information that is only available inside the CAPC controller, or to hook into particular events hidden in the reconcilers. This just needs to watch the CloudStackCluster resources and possibly the CloudStackMachines?
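
For illustration, a minimal controller-runtime sketch of that standalone-watcher idea, assuming the CAPC v1beta3 API types; mirrorToCloudStack is a hypothetical placeholder for the CloudStack API calls, not anything that exists today:

```go
package main

import (
	"context"

	"k8s.io/apimachinery/pkg/runtime"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"

	infrav1 "sigs.k8s.io/cluster-api-provider-cloudstack/api/v1beta3"
)

// clusterMirror watches CloudStackCluster objects from outside CAPC and
// mirrors them into CloudStack as unmanaged Kubernetes cluster entries.
type clusterMirror struct {
	client.Client
}

func (m *clusterMirror) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var csCluster infrav1.CloudStackCluster
	if err := m.Get(ctx, req.NamespacedName, &csCluster); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	// Hypothetical placeholder: create/update the unmanaged CKS entry via
	// the CloudStack API (apache/cloudstack#7515) for this cluster.
	return ctrl.Result{}, mirrorToCloudStack(ctx, &csCluster)
}

func mirrorToCloudStack(ctx context.Context, c *infrav1.CloudStackCluster) error {
	return nil // stub for the sketch
}

func (m *clusterMirror) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&infrav1.CloudStackCluster{}).
		Complete(m)
}

func main() {
	scheme := runtime.NewScheme()
	_ = infrav1.AddToScheme(scheme) // register the CAPC types with the client
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{Scheme: scheme})
	if err != nil {
		panic(err)
	}
	if err := (&clusterMirror{Client: mgr.GetClient()}).SetupWithManager(mgr); err != nil {
		panic(err)
	}
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```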

@blueorangutan

Test Results : (tid-326)
Environment: kvm Rocky8(x3), Advanced Networking with Management Server Rocky8
Kubernetes Version: v1.27.2
Kubernetes Version upgrade from: v1.26.5
Kubernetes Version upgrade to: v1.27.2
CloudStack Version: 4.18
Template: ubuntu-2004-kube
E2E Test Run Logs: https://github.com/blueorangutan/capc-prs/releases/download/capc-pr-ci-cd/capc-e2e-artifacts-pr250-sl-326.zip

[PASS] When testing app deployment to the workload cluster with slow network [ToxiProxy] Should be able to download an HTML from the app deployed to the workload cluster
[PASS] When testing creation of unmanaged CKS cluster in ACS Should create a workload cluster
[PASS] with two clusters should successfully add and remove a second cluster without breaking the first cluster
[PASS] When testing app deployment to the workload cluster [TC1][PR-Blocking] Should be able to download an HTML from the app deployed to the workload cluster
[PASS] When testing app deployment to the workload cluster with network interruption [ToxiProxy] Should be able to create a cluster despite a network interruption during that process
[PASS] When testing horizontal scale out/in [TC17][TC18][TC20][TC21] Should successfully scale machine replicas up and down horizontally
[PASS] When testing with disk offering Should successfully create a cluster with disk offering
[PASS] When testing machine remediation Should replace a machine when it is destroyed
[PASS] When testing multiple CPs in a shared network with kubevip Should successfully create a cluster with multiple CPs in a shared network
[PASS] When testing affinity group Should have host affinity group when affinity is pro
[PASS] When testing affinity group Should have host affinity group when affinity is anti
[PASS] When testing MachineDeployment rolling upgrades Should successfully upgrade Machines upon changes in relevant MachineDeployment fields
[PASS] When testing with custom disk offering Should successfully create a cluster with a custom disk offering
[PASS] When testing K8S conformance [Conformance] Should create a workload cluster and run kubetest
[PASS] When testing subdomain Should create a cluster in a subdomain
[PASS] When testing resource cleanup Should create a new network when the specified network does not exist
[PASS] When testing node drain timeout A node should be forcefully removed if it cannot be drained in time
[PASS] When testing Kubernetes version upgrades Should successfully upgrade kubernetes versions when there is a change in relevant fields


Summarizing 12 Failures:

[Fail] When the specified resource does not exist [It] Should fail due to the specified account is not found [TC4a] 
/jenkins/workspace/capc-e2e-new/test/e2e/invalid_resource.go:253

[Fail] When the specified resource does not exist [It] Should fail due to the specified domain is not found [TC4b] 
/jenkins/workspace/capc-e2e-new/test/e2e/invalid_resource.go:253

[Fail] When the specified resource does not exist [It] Should fail due to the specified control plane offering is not found [TC7] 
/jenkins/workspace/capc-e2e-new/test/e2e/invalid_resource.go:253

[Fail] When the specified resource does not exist [It] Should fail due to the specified template is not found [TC6] 
/jenkins/workspace/capc-e2e-new/test/e2e/invalid_resource.go:253

[Fail] When the specified resource does not exist [It] Should fail due to the specified zone is not found [TC3] 
/jenkins/workspace/capc-e2e-new/test/e2e/invalid_resource.go:253

[Fail] When the specified resource does not exist [It] Should fail due to the specified disk offering is not found 
/jenkins/workspace/capc-e2e-new/test/e2e/invalid_resource.go:253

[Fail] When the specified resource does not exist [It] Should fail due to the compute resources are not sufficient for the specified offering [TC8] 
/jenkins/workspace/capc-e2e-new/test/e2e/invalid_resource.go:253

[Fail] When the specified resource does not exist [It] Should fail due to the specified disk offer is not customized but the disk size is specified 
/jenkins/workspace/capc-e2e-new/test/e2e/invalid_resource.go:253

[Fail] When the specified resource does not exist [It] Should fail due to the specified disk offer is customized but the disk size is not specified 
/jenkins/workspace/capc-e2e-new/test/e2e/invalid_resource.go:253

[Fail] When the specified resource does not exist [It] Should fail due to the public IP can not be found 
/jenkins/workspace/capc-e2e-new/test/e2e/invalid_resource.go:253

[Fail] When the specified resource does not exist When starting with a healthy cluster [It] Should fail to upgrade worker machine due to insufficient compute resources 
/jenkins/workspace/capc-e2e-new/test/e2e/invalid_resource.go:253

[Fail] When the specified resource does not exist When starting with a healthy cluster [It] Should fail to upgrade control plane machine due to insufficient compute resources 
/jenkins/workspace/capc-e2e-new/test/e2e/invalid_resource.go:253

Ran 27 of 30 Specs in 11972.694 seconds
FAIL! -- 15 Passed | 12 Failed | 0 Pending | 3 Skipped
--- FAIL: TestE2E (11972.71s)
FAIL

@vishesh92
Member Author

vishesh92 commented Aug 29, 2023

It seems like this could be done by anything with access to the CloudStack API and the kube-apiserver, right? We don't need special information that is only available inside the CAPC controller, or to hook into particular events hidden in the reconcilers. This just needs to watch the CloudStackCluster resources and possibly the CloudStackMachines?

@g-gaston I want to make the integration part of the CAPC controller since it already has enough information about all the resources and about the account and domain they need to belong to. In the future we might add even more information about the CAPC cluster to CKS as well, and having this as part of the controller would help.

@hrak
Contributor

hrak commented Aug 29, 2023

I might be missing the point of this feature, but it feels like unnecessary complexity. It's not just adding this feature; we now need to maintain and test it, and the benefits seem dubious to me.

Given that the CAPI API is our pane of glass for describing k8s clusters, adding a second one that works in parallel and isn't built on top of it looks like duplicated functionality.

I'm with you - this adds a lot of CKS-specific logic to the controller, whose only task should be to reconcile CAPI-based clusters. I think there are bigger fish to fry (keeping up to date with CAPI and k8s versions, releasing more often, etc.).

@rohityadavcloud
Member

rohityadavcloud commented Aug 29, 2023

Hi @g-gaston @hrak, I see your points and completely understand not wanting to make CAPC more complex. But hear us out.

Apache CloudStack has always been a community-led, turnkey IaaS platform. The emphasis is on being turnkey: easy to use, set up, and deploy, with a great UI, so that it just works out of the box. While CAPC can be used as a building block, such as for EKS-A, we think it is also used directly as a first-class tool in itself with clusterctl.

Before CAPC, CloudStack's only Kubernetes integration was CKS (CloudStack Kubernetes Service), wherein CloudStack orchestrates k8s clusters for its users. However, CKS-built clusters make some assumptions and deploy specific CNIs to make this work out of the box - an easy-to-deploy k8s cluster for most users. And CKS is available for all user account types, provided the service provider, operator, or enterprise admin enables and configures CKS in their environment.

With CAPC there is more flexibility, but it isn't available to all user types (currently only domain and root admins), and a group of contributors think it would be a good idea if an end user could simply deploy a CAPC cluster and have a linked resource created in the form of an unmanaged CKS cluster (for an idea of what such a resource looks like, head over to https://qa.cloudstack.cloud/simulator/#/kubernetes; this shouldn't be confused with a CloudStack cluster within a zone). In the future, the CloudStack CKS implementation might be refactored to just use CAPC (so as not to have multiple ways of achieving the same thing).

To address these, we've created a general mechanism via apache/cloudstack#7515 and a series of fixes, such as the ability for normal user accounts to pick/use a public IP, say in a shared network, with apache/cloudstack#7817. This could help fix #303 in future releases. The final missing piece: if a normal user account deploys a CAPC cluster, can we give them a control-plane view saying that a k8s cluster exists in CloudStack but is externally (CAPC) managed? That's where this feature comes in.

Now, you're both right - we don't need this PR to do that; there could be an external script or tool for this. However, that would make it complex for users and operators to set up, and the overall solution wouldn't be turnkey. Such an external tool/script would need to be aware of CloudStack, be aware of and have access to the CAPC cluster, and doing that for all accounts would just introduce multiple competing control-plane paths.

So here's my proposal: @vishesh92 can introduce a new optional parameter in the CAPC-generated cluster YAML (or via a supporting command-line arg that generates the same) to enable/disable the feature, as sketched below. By default this is kept enabled, so any CAPC user deploying a k8s cluster sees an unmanaged CKS cluster in the UI. Downstream tools and users who don't want this behaviour can turn it off via the option/setting in the generated YAML. This way you can ensure the new code introduced in this PR doesn't get into the operational path in downstream use cases, while upstream CAPC and ACS releases stay turnkey.
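
For concreteness, the opt-out could look something like this in the generated manifest - the field name syncWithACS is purely illustrative, not a merged API:

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta3
kind: CloudStackCluster
metadata:
  name: my-capc-cluster
spec:
  # Hypothetical field: enabled by default, so a CAPC-deployed cluster shows
  # up as an unmanaged CKS cluster in the CloudStack UI; downstream tools
  # that don't want the behaviour generate the yaml with this set to false.
  syncWithACS: false
```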

Thoughts?

@chrisdoherty4
Member

chrisdoherty4 commented Aug 29, 2023

Hi @rohityadavcloud,

Thanks for sharing your thoughts and elaborating on the intent. I do have reservations about this feature. Staying within the scope outlined in the Cluster API documentation for infrastructure providers ensures we avoid feature creep and have an easier time maintaining CAPC. Additional functionality, such as this PR, can be added through auxiliary components, as you highlighted. Furthermore, considering the roadmap for CKS integration is somewhat unclear to the CAPC community, a best-effort integration doesn't feel like the right path forward right now.

we think this is also used directly as a first-class tool in itself with clusterctl.

I agree. CAPC is built to be used independently, that's true of all Cluster API infrastructure providers.

With CAPC, there is more flexibility but this isn't available for all user types ... it would be good idea if an end user can simply deploy a CAPC cluster and there is just a linked-resource created in the form of unmanaged CKS cluster

This sounds like 2 different asks:

  1. "I want to deploy CAPC clusters using any user type"
  2. "I want a cluster to appear in my CKS plugin UI"

On (1), I think we need to ask why said user types are required and broaden the scope we're considering. For example, perhaps a more flexible permissions model in Apache CloudStack would enable finer permission control so users can create whatever kind of user they wish?

On (2), I empathize with this desire, but I question what a user can do with an unmanaged CKS cluster (not to mention it isn't really a CKS cluster) - it's read-only if I'm not mistaken? Have we established why this is needed - what is the user really trying to do? In any case, with my current understanding it feels like this falls outside the scope of CAPC.

however, it would make it complex for users and operators to set this up and the overall solution wouldn't be turnkey

I think that's dependent on how you frame the solution. Cluster API in itself isn't turnkey: you have to deploy Cluster API, then you can create your clusters. There's nothing preventing building a solution that wraps deployment of all the necessary components, including a component that embodies the integration in this PR, which would be truly turnkey; though I don't think that's a concern for CAPC.

Such an external tool/script would need to be aware of CloudStack and aware/access of the CAPC cluster and do that for all accounts would just introduce multiple competing control plane paths.

That seems like an appropriate set of dependencies for this kind of component, though I don't understand the "do it for all accounts" or "competing control plane paths" parts; perhaps you could elaborate?

I think it's worth pointing out that all existing components built for Kubernetes have this kind of dependency set. Monitoring tools like Jaeger and Prometheus come to mind.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 30, 2023
@k8s-ci-robot
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@vishesh92
Member Author

One of my motivations for this PR was to improve the experience of CloudStack operators and users.

I understand your point of view on this PR, but it would have been better if I could have gotten feedback before I spent time on this PR and the required work in CloudStack & cloudstack-go. I opened this PR in May; getting feedback in August that this is not the right way to go is not very motivating. I would have been happy to discuss this PR in May. I am not sure what the best way forward is, but I would like to avoid spending time on a PR that will not be merged.

@vishesh92 vishesh92 closed this Sep 1, 2023
@chrisdoherty4
Member

chrisdoherty4 commented Sep 1, 2023

Hi @vishesh92. It sounds like you tried gathering feedback prior to completing the Apache CloudStack and cloudstack-go work, and the work in this PR. I'm sorry we missed that and provided feedback late in the day, including leaving this PR untouched for a prolonged period. We should aim to avoid this happening again; let's chat in the Kubernetes CAPC Slack about how best to address the communication inefficiencies.
