Skip to content

refactor: Use maps for CSI providers and storage classes #696

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 18, 2024

Conversation

jimmidyson
Copy link
Member

@jimmidyson jimmidyson commented Jun 2, 2024

This commit switches from arrays to maps for the CSI providers and
storage class configurations. This ensures unique naming and therefore
prevents name collisions.

The infra provider specific cluster configuration now includes specific
CSI configuration to ensure only compatible and supported CSI providers
are configurable for each infra provider.

Finally, this commit changes the final storage class that is created on
the workload cluster to be derived from the CSI provider name and
storage class config name, e.g. <provider>-<storageClassConfig>, once
again to avoid name collisions.

Using the API in this way reduces the chances of conflicting
configurations when a user and provides early feedback when creating the
cluster that the configurations incorrect via OpenAPI validation.

As an example, this commit updates the cluster configuration from:

csi:
  defaultStorage:
    providerName: local-path
    storageClassConfigName: local-path
  providers:
    - name: local-path
      storageClassConfig:
        - name: local-path
      strategy: HelmAddon

To:

csi:
  defaultStorage:
    provider: local-path
    storageClassConfig: default
  providers:
    local-path:
      storageClassConfigs:
        default: {}
      strategy: HelmAddon

While this may seem like only a small change, a single line removed, it
is always good to keep APIs as terse as possible while still retaining
strict behaviour. I think this commit achieves that.

@jimmidyson jimmidyson force-pushed the jimmi/remove-csi-defaultstorageprovidername branch from 6527408 to 7fdd2e1 Compare June 2, 2024 08:33
@faiq
Copy link
Contributor

faiq commented Jun 4, 2024

The code is fine - thanks for contributing this.

The reason we went with the existing API design was because it was better to be more explicit when choosing defaultStorageClass for the cluster. This change doesn't make it clear that all the storage class configurations need to be unique - and that won't be communicated to the user until the control plane is up in a workload cluster. I don't think that's very user friendly.

Copy link
Contributor

@faiq faiq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the validation happens too far down the line for this and could be frustrating for the user.

@jimmidyson
Copy link
Member Author

jimmidyson commented Jun 5, 2024

@faiq

This change doesn't make it clear that all the storage class configurations need to be unique - and that won't be communicated to the user until the control plane is up in a workload cluster. I don't think that's very user friendly.

This code doesn't change the fact the storage class names need to be unique, that is a requirement that didn't have any tests prior to this pr. If they weren't unique then when they were applied the last one in the list would override all the previous ones with the same name...

@jimmidyson
Copy link
Member Author

In future we can validate this with cel but the requirement for unique storage class names is in unchanged regardless.

dkoshkin
dkoshkin previously approved these changes Jun 7, 2024
Copy link
Contributor

@dkoshkin dkoshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree here, if we enforce uniqueness across providers (which we have to because the StorageClass names also need to be unique) then its simpler to reason about which StorageClass I want the default to be since upstream docs will already be talking about this.

Maybe to make it even more obvious what this field is for is something like:

providers:
  defaultStorageClassConfig: local-path
  - name: local-path
    storageClassConfig:
      - name: local-path
    strategy: HelmAddon

Base automatically changed from jimmi/supported-csi-providers to main June 10, 2024 11:47
@jimmidyson jimmidyson dismissed dkoshkin’s stale review June 10, 2024 11:47

The base branch was changed.

@jimmidyson jimmidyson force-pushed the jimmi/remove-csi-defaultstorageprovidername branch 2 times, most recently from e75946f to f4c313e Compare June 10, 2024 12:44
@jimmidyson
Copy link
Member Author

@dkoshkin

Maybe to make it even more obvious what this field is for is something like:

providers:
  defaultStorageClassConfig: local-path
  - name: local-path
    storageClassConfig:
      - name: local-path
    strategy: HelmAddon

I like this! I called the field defaultStorageClassConfigName to make it clear that is a name... referencing the name field i the storageClassConfigs.

@jimmidyson jimmidyson enabled auto-merge (squash) June 10, 2024 12:47
@jimmidyson
Copy link
Member Author

@faiq I replied to your concern re validation above - wdyt?

@faiq
Copy link
Contributor

faiq commented Jun 11, 2024

A simpler way of solving this is by simply prepending ${providerName}-${storageClassConfigName} when creating them on the workload to ensure uniqueness.

@jimmidyson
Copy link
Member Author

@faiq

A simpler way of solving this is by simply prepending ${providerName}-${storageClassConfigName} when creating them on the workload to ensure uniqueness.

I don't think that guarantees uniqueness, without the validation you can still specify multiple storage class configs for the same provider with the same name... Plus (less likely) conflicting combos of provider name and storage class config name. And potential name length issues.

Also with our current default cluster classes storage class configuration we would have storage classes created such as aws-ebs-aws-ebs or nutanix-nutanix-volume which feels a bit odd to me.

@faiq
Copy link
Contributor

faiq commented Jun 11, 2024

generateName can also be used.

@jimmidyson
Copy link
Member Author

@faiq

generateName can also be used.

True, but that becomes tricky to work with if you're not using default storage, especially in a gitops setup requesting storage from a particular storage class. We've had feedback before in previous projects that users prefer to be explicit in their naming so they know what they're using - we used generateName in a few places and this led to usability issues.

@jimmidyson jimmidyson force-pushed the jimmi/remove-csi-defaultstorageprovidername branch from f4c313e to 0a22ad5 Compare June 11, 2024 15:28
@github-actions github-actions bot added fix and removed fix labels Jun 11, 2024
@jimmidyson jimmidyson requested review from supershal and faiq June 12, 2024 09:48
@jimmidyson jimmidyson requested a review from dkoshkin June 12, 2024 09:49
@faiq
Copy link
Contributor

faiq commented Jun 12, 2024

True, but that becomes tricky to work with if you're not using default storage, especially in a gitops setup requesting storage from a particular storage class. We've had feedback before in previous projects that users prefer to be explicit in their naming so they know what they're using - we used generateName in a few places and this led to usability issues.

I'm not sure how any of the resources that CAREN makes are going to be tracked with gitops. They are dynamically generated based on input here.

The more I think about it, the more it makes sense to use the conventional way of creating unique names which is generateName as opposed to throwing an error after the cluster has been created.

@jimmidyson
Copy link
Member Author

@faiq

I'm not sure how any of the resources that CAREN makes are going to be tracked with gitop

Not the resources created by CAREN, but e.g. a deployment deployed via gitops that wants to use a specific storageclass - in that case you can't use generateName. Ultimately storageclass needs to be easily referenceable and the only way to do that is to use a well-defined, static name.

@dlipovetsky
Copy link
Contributor

dlipovetsky commented Jun 12, 2024

Please consider the following:

If

  1. For every StorageClassConfig, CAREN creates a StorageClass
    and
  2. a StorageClass can hold configuration (e.g. parameters) specific to a CSI provider,
    and
  3. The StorageClassConfig Name must be unique
    then, when I create a StorageClassConfig, I must have in mind (a) the provider.

In that case, I would want to name the StorageClassConfig something meaningful, i.e., something to indicate that it is associated with a provider, e.g. someprovider-somethingdescriptive, but that's really my choice.

If it helps me to include the provider in the name, then I am already associating a StorageClassConfig with a provider, so it's redundant to make me list this config under a provider in the API. We could then have a simpler API:

csi:
  defaultStorageClassConfig: bar
  storageClassConfigs:
    - local-path-foo           # Descriptive
    - anythingilike-local-path # Descriptive, but more free-form
    - bar                      # Not descriptive, but that might be what the user prefers!
  providers:
    - name: local-path
      strategy: HelmAddon
    - name: nutanix
      strategy: HelmAddon

@dlipovetsky
Copy link
Contributor

dlipovetsky commented Jun 12, 2024

I forgot that CAREN needs to derive the provider for the StorageConfig, and it does by looking at the parent provider field.

I think @faiq makes a good point about the config name uniqueness being implicit. I think deriving the StorageClass name makes sense. I'm convinced by the reasons against it.

Something like

csi:
  defaultStorageClassConfig: nutanix-volume
  providers:
    - name: local-path
      strategy: HelmAddon
      storageClassConfigs:
      - name: basic
        parameters:
          storageContainer: foo
    - name: nutanix
      strategy: HelmAddon
      storageClassConfigs:
      - name: fancy
        parameters:
          key: value

Results in two StorageClasses:

  1. local-path-basic
  2. nutanix-fancy

CAREN can validate that the derived name (a combination of provider name and StorageClassConfig name) is a valid Kubernetes resource name.

@dlipovetsky
Copy link
Contributor

Also with our current default cluster classes storage class configuration we would have storage classes created such as aws-ebs-aws-ebs or nutanix-nutanix-volume which feels a bit odd to me.

We can easily fix this stutter by renaming the storage class configurations.

@jimmidyson
Copy link
Member Author

CAREN can validate that the derived name (a combination of provider name and StorageClassConfig name) is a valid Kubernetes resource name.

And that the name is unique across all storage classes, the same as this pr currently does, but could move validation to ensure unique name for storage class configs under the same provider (definitely possible).

The proposed derivation will ensure uniqueness as long as we do not have potential for collision by virtue of the simple string concatenation, which with the current supported providers is not a problem right now.

@dlipovetsky
Copy link
Contributor

dlipovetsky commented Jun 12, 2024

I'm ok to merge this PR.

Once it merges, do we agree to follow up with two more PRs

  1. Derive the StorageClass resource name from the CSI provider and StorageClassConfig names (@faiq's suggestion)
    2. Move defaultStorageClassConfigName inside csi and rename it to defaultStorageClassConfig. (@dkoshkin's suggestion) [Update: My mistake--it is already under csi. I don't think it needs to move under providers.]

@jimmidyson
Copy link
Member Author

@dlipovetsky

Move defaultStorageClassConfigName inside csi

It is already inside CSI 🤔

@jimmidyson
Copy link
Member Author

Been thinking on this more and how to solve the potential name collision issue. One of the reasons for this is because CAPI does not support x-kubernetes-list-type which in CRDs could be set to map and use name as the map key, ensuring uniqueness (at least a unique storage class name per provider).

However, CAPI does support map type properties so we could use that to specify storageclasses as:

csi:
  defaultStorageClassConfig: nutanix-fancy
  providers:
    - name: local-path
      strategy: HelmAddon
      storageClassConfigs:
	    basic:
	      parameters:
	        storageContainer: foo
    - name: nutanix
      strategy: HelmAddon
      storageClassConfigs:
        fancy:
          parameters:
            key: value

This way we wouldn't need to validate the name uniqueness even within a provider's specified storageclassconfigs.

@dlipovetsky
Copy link
Contributor

Been thinking on this more and how to solve the potential name collision issue. One of the reasons for this is because CAPI does not support x-kubernetes-list-type which in CRDs could be set to map and use name as the map key, ensuring uniqueness (at least a unique storage class name per provider).

However, CAPI does support map type properties so we could use that to specify storageclasses as:

csi:
  defaultStorageClassConfig: nutanix-fancy
  providers:
    - name: local-path
      strategy: HelmAddon
      storageClassConfigs:
	    basic:
	      parameters:
	        storageContainer: foo
    - name: nutanix
      strategy: HelmAddon
      storageClassConfigs:
        fancy:
          parameters:
            key: value

This way we wouldn't need to validate the name uniqueness even within a provider's specified storageclassconfigs.

I like this!

If we use a map for storageClassConfigs, let's consider using one for providers, too. (Looking at the API, these are the only two types that use a list where elements must be unique.)

Here's the above example, changed to use a map for providers:

csi:
  defaultStorageClassConfig: nutanix-fancy
  providers:
    local-path:
      strategy: HelmAddon
      storageClassConfigs:
      basic:
        parameters:
          storageContainer: foo
    nutanix:
      strategy: HelmAddon
      storageClassConfigs:
        fancy:
          parameters:
            key: value

@jimmidyson
Copy link
Member Author

If we use a map for storageClassConfigs, let's consider using one for providers, too

Love it! I'll update this PR with those API changes.

@faiq
Copy link
Contributor

faiq commented Jun 13, 2024

the design seems much better great work hashing it out !

@jimmidyson
Copy link
Member Author

Looking at implementing the proposed API structure above, but hit a problem that we have previously restricted the set of CSI providers for each infra provider (e.g. only allow aws-ebs for CAPA clusters). To use a map would require additionalProperties in the OpenAPI schema, and unfortunately there is no way to restrict the set of property names in this case (OpenAPI allows propertyNames - effectively an enum of property names allowed in additionalProperties - for this but that is not supported in CAPI or upstream CRD validation that CAPI validation is based on).

I think we'll have to drop back to providers being a slice with name property that we can restrict as we're currently doing per provider. Any ideas? 🙏

So back to:

csi:
  defaultStorageClassConfig: nutanix-fancy
  providers:
    - name: local-path
      strategy: HelmAddon
      storageClassConfigs:
	    basic:
	      parameters:
	        storageContainer: foo
    - name: nutanix
      strategy: HelmAddon
      storageClassConfigs:
        fancy:
          parameters:
            key: value

Storage class config names can use additionalProperties with arbitrary keys still (so moving to a map for that).

@dlipovetsky
Copy link
Contributor

Will we be able to validate property names in the future, e.g. using CEL (kubernetes-sigs/cluster-api#9239)?

If so, I would prefer to change the API now, and add validation later.

I don't worry about users choosing an invalid CSI provider. But I do worry about changing the API later.

@jimmidyson jimmidyson force-pushed the jimmi/remove-csi-defaultstorageprovidername branch from 0a22ad5 to daa818e Compare June 17, 2024 16:20
@jimmidyson jimmidyson changed the title fix: Remove providerName from defaultStorage config refactor: Use maps for CSI providers and storage classes Jun 17, 2024
@jimmidyson
Copy link
Member Author

@dlipovetsky @faiq I've updated the API as discussed above. It became a little more involved by now having a per-infra provider CSI configuration, but with few changes in handler code by using the internal variables API as we do in other handlers. This allows us to have a strict API (using maps to avoid name collisions and only allowing supported CSI providers per infra provider) but also have generic handler code.

I have updated the PR description and title with relevant details so you can see the API changes this introduces.

This commit switches from arrays to maps for the CSI providers and
storage class configurations. This ensures unique naming and therefore
prevents name collisions.

The infra provider specific cluster configuration now includes specific
CSI configuration to ensure only compatible and supported CSI providers
are configurable for each infra provider.

Finally, this commit changes the final storage class that is created on
the workload cluster to be derived from the CSI provider name and
storage class config name, e.g. `<provider>-<storageClassConfig>`, once
again to avoid name collisions.

Using the API in this way reduces the chances of conflicting
configurations when a user and provides early feedback when creating the
cluster that the configurations incorrect via OpenAPI validation.

As an example, this commit updates the cluster configuration from:

```
defaultStorage:
  providerName: local-path
  storageClassConfigName: local-path
providers:
  - name: local-path
    storageClassConfig:
      - name: local-path
    strategy: HelmAddon
```

To:

```
defaultStorage:
  provider: local-path
  storageClassConfig: default
providers:
  local-path:
    storageClassConfigs:
      default: {}
    strategy: HelmAddon
```

While this may seem like only a small change, a single line removed, it
is always good to keep APIs as terse as possible while still retaining
strict behaviour. I think this commit achieves that.
@jimmidyson jimmidyson force-pushed the jimmi/remove-csi-defaultstorageprovidername branch from daa818e to aeef41c Compare June 18, 2024 11:17
Copy link
Contributor

@dlipovetsky dlipovetsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great improvement, both in terms of API, and implementation ergonomics! Thanks for patiently considering all our feedback.

@jimmidyson jimmidyson merged commit 2183f8f into main Jun 18, 2024
17 checks passed
@jimmidyson jimmidyson deleted the jimmi/remove-csi-defaultstorageprovidername branch June 18, 2024 18:21
@github-actions github-actions bot mentioned this pull request Jun 18, 2024
faiq pushed a commit that referenced this pull request Jun 24, 2024
🤖 I have created a release *beep* *boop*
---


## 0.10.0 (2024-06-24)

<!-- Release notes generated using configuration in .github/release.yaml
at main -->

## What's Changed
### Exciting New Features 🎉
* feat: Upgrade to Cilium v1.15.5 by @jimmidyson in
#689
* feat: Upgrade to Calico v3.28.0 by @jimmidyson in
#688
* feat: bumps caaph to v0.2.3 by @faiq in
#691
* feat: Add local-path-provisioner CSI by @jimmidyson in
#693
* feat: cluster-api v1.7.3 by @jimmidyson in
#714
* feat: bumps caaph to 0.2.4 by @faiq in
#718
* feat: Controller that copies ClusterClasses to namespaces by
@dlipovetsky in
#715
* feat: adds a mindthegap container and deployment by @faiq in
#637
* feat: implements BeforeClusterUpgrade hook by @faiq in
#682
### Fixes 🔧
* fix: use external Nutanix API types directly by @dkoshkin in
#698
* fix: Post-process clusterconfig CRDs for supported CSI providers by
@jimmidyson in
#695
* fix: nutanix credentials Secrets owner refs by @dkoshkin in
#711
* fix: credential provider response secret ownership by @dkoshkin in
#709
* fix: static credentials Secret generation by @dkoshkin in
#717
* fix: set ownerReference on imageRegistry and globalMirror Secrets by
@dkoshkin in
#720
* fix: Allow Nutanix CSI snapshot controller & webhook to run on CP
nodes by @dlipovetsky in
#723
* refactor: Use maps for CSI providers and storage classes by
@jimmidyson in
#696
* fix: CredentialProviderConfig matchImages to support registries with
port by @dkoshkin in
#724
* fix: Allow Node Feature Discovery garbage collector to run on
control-plane nodes by @dlipovetsky in
#722
* fix: RBAC role for namespace-sync controller to watch,list namespaces
by @dkoshkin in
#738
* fix: image registries not handling CA certificates by @dkoshkin in
#729
* fix: adds a docker buildx step before release-snapshot by @faiq in
#741
### Other Changes
* docs: Add released version to helm and clusterctl install by
@jimmidyson in
#683
* revert: Temporary lint config fix until next golangci-lint release
(#629) by @jimmidyson in
#686
* refactor: Delete unused code by @jimmidyson in
#687
* refactor: Reduce log verbosity for skipped handlers by @jimmidyson in
#692
* build: update Go to 1.22.4 by @dkoshkin in
#700
* build(deps): Upgrade CAPX version to v1.4.0 by @thunderboltsid in
#707
* build: Move CSI supported provider logic to script by @jimmidyson in
#703
* build: Add testifylint linter by @jimmidyson in
#706
* build: Update all tools by @jimmidyson in
#704
* refactor: rename credential provider response secret by @dkoshkin in
#710
* refactor: Simplify code by using slices.Clone by @jimmidyson in
#712
* refactor: consistently use the same SetOwnerReference function by
@dkoshkin in
#713
* refactor: kube-vip commands by @dkoshkin in
#699
* build: Fix an incorrect make variable passed to goreleaser by
@dlipovetsky in
#716
* build: Add 'chart-docs' make target by @dlipovetsky in
#727
* build: Make CAREN mindthegap reg multiarch by @jimmidyson in
#730
* Add helm values schema plugin by @dlipovetsky in
#728
* test(e2e): Use mesosphere fork with CRSBinding fix by @jimmidyson in
#736

## New Contributors
* @thunderboltsid made their first contribution in
#707

**Full Changelog**:
v0.9.0...v0.10.0

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants