
🌱 E2e: Use pre-build node images #1699


Merged

Conversation

lentzi90
Contributor

@lentzi90 lentzi90 commented Oct 2, 2023

What this PR does / why we need it:

We are currently using a plain Ubuntu cloud image for the nodes in most e2e tests. Cloud-init scripts are used to install the Kubernetes components on startup. This is convenient since we do not have to build specific images. However, the scripts are not well maintained, and they increase both the complexity and the running time of the tests.

This changes the tests to use a pre-built image. The image was built using image-builder, which is the standard tool for many CAPI providers.

As a consequence, the containerd patches are no longer needed. (In fact, they are probably not needed anyway, since the only reason we added them was to get a newer version of containerd that was not available in the Ubuntu repos at the time.)

A bit more on why I think we need this:
The script for injecting CI artifacts has not proven to be very reliable. We have had a number of issues caused by our reliance on it. See for example #1429 and #1458. We have disabled the k8s upgrade test because the use of CI artifacts makes it hard, if not impossible, to run. I also expect that we will run into issues the next time we try to bump the Kubernetes version because of the change to package repositories. The CI injection script has not been updated to handle the new repositories even though the old repositories are already frozen.
It seems like we are quite alone in using this script, so we need to decide whether we take ownership of it or just start building images.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

  1. Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • if necessary:
    • includes documentation
    • adds unit tests

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 2, 2023
@netlify

netlify bot commented Oct 2, 2023

Deploy Preview for kubernetes-sigs-cluster-api-openstack ready!

Name Link
🔨 Latest commit 4fb08c7
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-cluster-api-openstack/deploys/652671aae56ecb0008c35fdb
😎 Deploy Preview https://deploy-preview-1699--kubernetes-sigs-cluster-api-openstack.netlify.app

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 2, 2023
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lentzi90

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 2, 2023
@lentzi90 lentzi90 force-pushed the lentzi90/e2e-pre-built-node-images branch from 3868a82 to 6369da6 Compare October 2, 2023 12:05
Comment on lines +61 to +62
templatesDir := path.Join(e2eCtx.Settings.ArtifactFolder, "templates")
Expect(os.MkdirAll(templatesDir, 0o750)).To(Succeed(), "Can't create templates folder %q", templatesDir)
Contributor Author


This folder is created by the GenerateCIArtifactsInjectedTemplateForDebian function that is called for templates that should get CI artifacts injected. If there are no such templates, we still need to ensure that the folder exists since the templates without CI artifacts will also be written to it.

@lentzi90
Contributor Author

lentzi90 commented Oct 2, 2023

Unsure if flake or actual issue... 5 tests passed, 1 failed and 4 skipped.
/test pull-cluster-api-provider-openstack-e2e-test

@lentzi90 lentzi90 force-pushed the lentzi90/e2e-pre-built-node-images branch from 6369da6 to 2552e45 Compare October 3, 2023 06:20
@lentzi90
Contributor Author

lentzi90 commented Oct 3, 2023

Looks like this shaves off about 7-8 minutes from the tests. Not much but always something 🙂

@lentzi90
Contributor Author

lentzi90 commented Oct 3, 2023

/test pull-cluster-api-provider-openstack-e2e-full-test

@lentzi90
Contributor Author

lentzi90 commented Oct 3, 2023

/cc @mdbooth

@k8s-ci-robot k8s-ci-robot requested a review from mdbooth October 3, 2023 12:08
Contributor

@mdbooth mdbooth left a comment


Thanks! Is anything still using the templates with artifacts? If not, could we remove what we're not using?

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 4, 2023
@lentzi90 lentzi90 force-pushed the lentzi90/e2e-pre-built-node-images branch from 2552e45 to 2f8bc92 Compare October 4, 2023 11:36
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 4, 2023
@lentzi90 lentzi90 requested a review from mdbooth October 5, 2023 05:21
@mdbooth
Contributor

mdbooth commented Oct 9, 2023

@lentzi90 I completely agree that this approach is the best way to go. However, remember that we originally did this because we weren't doing a good job of maintaining the images. So the maintenance problem persists, we'd just be moving it from the e2e templates to an image builder pipeline that we don't have yet.

Which is the lesser evil? If we want to merge this, what do you think the chances are of getting the image builder pipeline effort going again?

@lentzi90
Contributor Author

lentzi90 commented Oct 9, 2023

> @lentzi90 I completely agree that this approach is the best way to go. However, remember that we originally did this because we weren't doing a good job of maintaining the images. So the maintenance problem persists, we'd just be moving it from the e2e templates to an image builder pipeline that we don't have yet.
>
> Which is the lesser evil? If we want to merge this, what do you think the chances are of getting the image builder pipeline effort going again?

That is a good point.
I would honestly rather work on the pipeline than try to patch together the on-the-fly building. It has been painful trying to debug the failures that happen in cloud-init on the workload cluster nodes. 🙁
The image-builder works perfectly if I just run it locally, and that is what I have done for this PR. I understand that it is not ideal to rely on me or other maintainers to do this manually, though. However, I find it slightly better to build those images manually and have things "just work" than to use plain cloud images and fight the pre-kubeadm scripts.

Contributor

@wwentland wwentland left a comment


I like it, with the downside of having to manually maintain/build/upload newer versions of the images in use here.

IMAGE_URLS="https://storage.googleapis.com/artifacts.k8s-staging-capi-openstack.appspot.com/test/amphora/2022-12-05/amphora-x64-haproxy.qcow2,"
IMAGE_URLS+="https://storage.googleapis.com/artifacts.k8s-staging-capi-openstack.appspot.com/test/cirros/2022-12-05/cirros-0.6.1-x86_64-disk.img,"
IMAGE_URLS+="https://storage.googleapis.com/artifacts.k8s-staging-capi-openstack.appspot.com/test/ubuntu/2023-09-29/ubuntu-2204-kube-v1.27.2.img,"
IMAGE_URLS+="https://storage.googleapis.com/artifacts.k8s-staging-capi-openstack.appspot.com/test/flatcar/flatcar-stable-3510.2.4-kube-v1.27.2.img"
Contributor


@mdbooth kindly uploaded a new version, so we probably want to use it here:

  • https://storage.googleapis.com/artifacts.k8s-staging-capi-openstack.appspot.com/test/flatcar/flatcar-stable-3602.2.0-kube-v1.27.2.img

This would get us all the security fixes in that release and the interim ones.

Contributor Author


Thanks for pointing this out. I have updated it 🙂

@lentzi90 lentzi90 force-pushed the lentzi90/e2e-pre-built-node-images branch from 2f8bc92 to 4fb08c7 Compare October 11, 2023 09:58
@mdbooth
Contributor

mdbooth commented Oct 12, 2023

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 12, 2023
@lentzi90
Contributor Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 12, 2023
@k8s-ci-robot k8s-ci-robot merged commit d4baeb1 into kubernetes-sigs:main Oct 12, 2023
@lentzi90 lentzi90 deleted the lentzi90/e2e-pre-built-node-images branch October 12, 2023 09:08