Skip to content

⚠️ Allow explicitly empty volume AZ #2008

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 12, 2024

Conversation

mdbooth
Copy link
Contributor

@mdbooth mdbooth commented Apr 10, 2024

Replaces an AvailabilityZone string for volumes with a VolumeAvailabilityZone struct which allows more flexibility in defaulting behaviour. Specifically it enables us to express both the current default behaviour where we take the volume AZ from the Machine, and a new default behaviour where to don't specify a volume AZ at all.

In making this change to both RootVolume and AdditionalBlockDevices we use common code for both APIs. This has the result of updating RootVolume to be consistent with a volume specified via AdditionalBlockDevices.

Fixes #1649

/hold

@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 10, 2024
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 10, 2024
Copy link

netlify bot commented Apr 10, 2024

Deploy Preview for kubernetes-sigs-cluster-api-openstack ready!

Name Link
🔨 Latest commit 9aa449c
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-cluster-api-openstack/deploys/661818600df7c600087335c8
😎 Deploy Preview https://deploy-preview-2008--kubernetes-sigs-cluster-api-openstack.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

Comment on lines 548 to 557
// From specifies where we will obtain the availability zone for the
// volume. The options are "Name" and "Machine". If "Name" is specified
// then the Name field must also be specified. If "Machine" is specified
// the volume will use the value of FailureDomain, if any, from the
// associated Machine.
// +kubebuilder:default:=Name
// +optional
AvailabilityZone string `json:"availabilityZone,omitempty"`
From VolumeAZSource `json:"from,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks a bit like an union to me? We probably want validation that if from=Machine you cannot set Name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, it's absolutely a discriminated union, although one with a hopefully ergonomic optional default.

The latest version has validation and a complete suite of tests.

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 10, 2024
@mdbooth mdbooth marked this pull request as ready for review April 10, 2024 19:59
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 10, 2024
@k8s-ci-robot k8s-ci-robot requested a review from dulek April 10, 2024 19:59
@mdbooth mdbooth force-pushed the volumeaz branch 2 times, most recently from 69ab5e1 to 03999c3 Compare April 11, 2024 08:13
@mdbooth
Copy link
Contributor Author

mdbooth commented Apr 11, 2024

That test failure confirms that the new default behaviour works :) I've updated the e2e test to specify the from: Machine behaviour.

@mdbooth
Copy link
Contributor Author

mdbooth commented Apr 11, 2024

/test pull-cluster-api-provider-openstack-e2e-full-test

@mdbooth
Copy link
Contributor Author

mdbooth commented Apr 11, 2024

Running the e2e tests again because there's some non-determinism in there: if we weren't correctly setting the target AZ, they might land in the right place anyway by accident.

/test pull-cluster-api-provider-openstack-test

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 11, 2024
@mdbooth
Copy link
Contributor Author

mdbooth commented Apr 11, 2024

The merge conflict is only in instance_test.go. The e2e-full-test passed before it was detected, so I won't bother running it again.

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 11, 2024
@mdbooth
Copy link
Contributor Author

mdbooth commented Apr 11, 2024

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 11, 2024
@mdbooth mdbooth added this to the v0.10.0 milestone Apr 11, 2024
Copy link
Contributor

@dulek dulek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one question.

Comment on lines +117 to +137
/* FIXME: These tests are failing
It("should not allow additional volume with empty name", func() {
machine.Spec.AdditionalBlockDevices = []infrav1.AdditionalBlockDevice{
{
Name: "",
SizeGiB: 1,
},
}
Expect(k8sClient.Create(ctx, machine)).NotTo(Succeed(), "Creating a machine with an empty name additional block device should fail")
})

It("should not allow additional volume with name root", func() {
machine.Spec.AdditionalBlockDevices = []infrav1.AdditionalBlockDevice{
{
Name: "root",
SizeGiB: 1,
},
}
Expect(k8sClient.Create(ctx, machine)).NotTo(Succeed(), "Creating a machine with a root named additional block device should fail")
})
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We keep this on FIXME?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, yeah. Should probably create an issue to track it with a pointer to the FIXME code, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: #2013

Replaces an AvailabilityZone string for volumes with a
VolumeAvailabilityZone struct which allows more flexibility in
defaulting behaviour. Specifically it enables us to express both the
current default behaviour where we take the volume AZ from the Machine,
and a new default behaviour where to don't specify a volume AZ at all.

In making this change to both RootVolume and AdditionalBlockDevices we
use common code for both APIs. This has the result of updating
RootVolume to be consistent with AdditionalBlockDevices.
@mdbooth
Copy link
Contributor Author

mdbooth commented Apr 11, 2024

/test pull-cluster-api-provider-openstack-e2e-full-test

Copy link
Contributor

@stephenfin stephenfin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only commenting on the API (though the additional test coverage looks good and is appreciated). As noted inline, I don't love the idea of perpetuating this myth that Compute and Block Storage AZs can be considered equivalent. However, it's a widely spread myth that clearly some users rely on so 🤷

/lgtm

// if From is "Name". The volume availability zone name may not contain
// spaces.
// +optional
Name *VolumeAZName `json:"name,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't put my finger on why, but this feels clunky. I had envisioned the "inherit from compute AZ" behaviour to be either (a) an anti-feature or (b) if actually required something that would be configured either on a deployment- or machine-wide basis. I've no real reason for this though, other than a general dislike of the whole thing, nor do I have an alternate proposal. As such I'm mainly leaving this note here so that I said something.

Copy link
Contributor Author

@mdbooth mdbooth Apr 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't preclude adding a machine or cluster-wide default override in the future. However, like you I'm not expecting the from: Machine behaviour to be widely used. This is also why I made the discriminator (the From field) optional with an explicit default. This means that the anticipated use case goes from:

availabilityZone: foo

to

availabilityZone:
  name: foo

which isn't awful, and leaves us with a get-out-of-jail-free card to fix it later without breaking backwards compatibility when we understand it better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO the situation gets better now, when the default behavior becomes set no AZ at all.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 11, 2024
@huxcrux
Copy link
Contributor

huxcrux commented Apr 11, 2024

/lgtm

Awesome that size on discs is specified in the same way for root and extra disks :)

@mdbooth
Copy link
Contributor Author

mdbooth commented Apr 12, 2024

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mdbooth

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 12, 2024
@k8s-ci-robot
Copy link
Contributor

@mdbooth: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cluster-api-provider-openstack-e2e-test 9aa449c link unknown /test pull-cluster-api-provider-openstack-e2e-test

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot merged commit 8c7e661 into kubernetes-sigs:main Apr 12, 2024
9 of 10 checks passed
@mdbooth mdbooth deleted the volumeaz branch April 12, 2024 09:14
@mdbooth
Copy link
Contributor Author

mdbooth commented Apr 12, 2024

@mdbooth: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cluster-api-provider-openstack-e2e-test 9aa449c link unknown /test pull-cluster-api-provider-openstack-e2e-test
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

@lentzi90 This is worrying. This failure was a flake, and I'm reasonably confident we haven't broken it. However... I'm surprised it would merge a PR with a test failure. I'm assuming this additional test run is against the proposed merge commit, so I get why it runs. But why run it if it's going to merge anyway on failure?

@lentzi90
Copy link
Contributor

Oh yes this looks bad. I have no explanation for it. Could there be some higher level config in test-infra that runs it as a post-submit? I remember we removed our post-submits from here, but perhaps there is some config telling prow to rerun after merge? 🤔
The job is definitely marked as required so I don't see how it could merge except by some bug in prow

@liggitt
Copy link

liggitt commented Apr 13, 2024

https://prow.k8s.io/pr-history/?org=kubernetes-sigs&repo=cluster-api-provider-openstack&pr=2008 shows a pass of that test at that commit.

It got tested and merged in a batch merge

IMG_4725

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Compute AZs should not be used as Volume AZs
7 participants