Skip to content

Update CI test configuration #551

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 17, 2020

Conversation

jingxu97
Copy link
Contributor

@jingxu97 jingxu97 commented Jul 14, 2020

update ci test configuration

Fix windows integration script

remove env virable GCE_PD_DO_DRIVER_BUILD direct setting

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespaces from that line:

/kind api-change
/kind bug
/kind cleanup
/kind design
/kind documentation
/kind failing-test
/kind feature
/kind flake

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot k8s-ci-robot added do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jul 14, 2020
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jul 14, 2020
@jingxu97
Copy link
Contributor Author

cc @msau42 @verult

@jingxu97
Copy link
Contributor Author

replace #549

err = runCommand("Checking driver pods", statusCmd)
if err != nil {
return fmt.Errorf("failed to check driver pods: %v", err)
}

time.Sleep(time.Minute)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this second check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, I was thinking to keep "default" and add "gce-pd-csi-driver" namespace one.
I am not sure why it was checking "default" before, or just a mistake.

Also this runCommand actually does not really check pod status. It will not return error if pods are not successfully running. In the second check I added, I print out the output for debugging.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we changed namespaces, we forgot to update the test to check in the correct namespace. So we can remove the check for default namespace.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it. I plan to have follow up PR to check pod status and print out debug log etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe @mattcary is going to look into it as part of #139

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this repo switched to use a Deployment instead of a StatefulSet, so importing it is not going to work as is.

Maybe we can switch k/k to use Deployment too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattcary so far your pr looks good to me. In case some pod not running correctly, do we collect logs on events which help debug?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

never mind, I think logs are collected #140

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, switched to deployment and moved the driver name to the exported function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually there already is a deployment-waiter in framework/deployment/wait that could be used. That's what my change is using now, so there's nothing to export to you.

}
nodes := strings.Fields(string(out))
for _, node := range nodes {
taintCmd := exec.Command("kubectl", "taint", "node", node, "node.kubernetes.io/os:NoSchedule-")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this required if we're not testing mixed os clusters?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is only needed for windows clusters.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should kubernetes-e2e.py be handling this as part of windows cluster bringup?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, it is not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why it shouldn't be part of standard cluster bringup tool? How come the end users need to do this manually?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will check with windows team to see whether we could remove the taint in bringup tool.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because there are linux pods scheduled on windows cluster, so avoid pods scheduled to windows node, we currently taint windows node. We could add node selector to pods, but I think not all pods have it. So taint windows node is the easiest way for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess my question is why doesn't the cluster installer that brings up the windows cluster do it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry, I see this is removing the taint that got added. So in order to run e2e tests, we need to remove the taint so the e2e pods will run on it. But is there a reason why kubernetes_e2e.py which is a e2e boostrapping script can't do it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, let's think about if we should enhance kubernetes_e2e in the future so that all test runners don't have to do this. For now, can you add a comment explaining why we need to remove the taint?

@@ -22,7 +21,7 @@ readonly teardown_driver=${GCE_CLUSTER_ZONE:-true}
make -C ${PKGDIR} test-k8s-integration

base_cmd="${PKGDIR}/bin/k8s-integration-test \
--platform=windows --local-k8s-dir=${LOCAL_K8S_DIR} --bringup-cluster=false --teardown-cluster=false --teardown-driver=${teardown_driver}\
--platform=windows --local-k8s-dir=${k8s_dir} --bringup-cluster=false --teardown-cluster=false --teardown-driver=${teardown_driver}\
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

local k8s dir is only for if you're running the test from your own local machine. Do we need it for CI? k8s-integration should download kubernetes and build in a tmp directory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in prow test using kubetest, kuberentes will be installed under current dir which is the pd csi driver repo. in the test, kubetest needs to be running in kubernetes dir. We don't need to download kuberentes locally.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kubetest should be run from testDir, not k8sDir: https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/blob/master/test/k8s-integration/main.go#L221

Maybe we need to set --test-version here.

Btw, does kubernetes_e2e support gke clusters?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see there are some confusion here. For current pd driver ci tests for linux, we build kubernetes and install it under a tmp dir before running the test. But this step is not needed for the windows test setup. It uses script /workspace/scenarios/kubernetes_e2e.py which will install kuberentes like kube_up, I think. https://github.com/kubernetes/test-infra/blob/master/config/jobs/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/gcp-compute-persistent-disk-csi-driver-windows.yaml#L65

This script will install kuberentes under current dir which is pd csi driver repo. The local-k8s-dir setting here does not mean the test will install kubernetes locally. I used this setting because the test use it change current dir here https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/blob/master/test/k8s-integration/main.go#L421

So because I didn't set kubeversion or testversion, so the test code will not install kubernetes locally because it is not needed.

I plan to have another PR to edit the config parameter so that it is easier to read. Also I think for linux test, we can also use /workspace/scenarios/kubernetes_e2e.py to setup kubernetes instead of building it locally. We can set to use ci/latest or ci/master depending which kubernetes version we want to test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should set "--test-version" instead. The reason is because we may use different Kubernetes versions to 1) bring up the cluster and 2) run e2e binary

We generally use "master" as the e2e version so that we can get new test cases and test fixes without relying on them to be backported to older Kubernetes versions.

"--local-k8s-dir" is intended for developer testing where you just want to run the e2es quickly without having to download and build kubernetes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my understanding is since we already have prebuild kubernetes in different versions https://github.com/kubernetes/test-infra/blob/a4b7e151278a26fb45ee186b7fdc96394deed016/docs/kubernetes-versions.md
we could just use it by setting extract=ci/k8s-beta , ci/k8s-master , ... etc.
It will save k8s build time. Is there some specific version we want to test?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think in release versions we can save a lot of time by using extract. But if we want to test head, we still need to build. We do have https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/blob/master/test/k8s-integration/main.go#L40 that I think can help with this.

But still, we have 3 distinct variables we use to distinguish different use cases:

  • kubeVersion to set the cluster version
  • testVersion to set the e2e version
  • local for local development (you're making a change in kubernetes or the e2es that you want to test)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

synced up offline. Will remove the need to locally download and build e2e binary in a separate PR. Instead we could set extract=ci/k8s-version.

readonly PKGDIR=${GOPATH}/src/sigs.k8s.io/gcp-compute-persistent-disk-csi-driver
readonly LOCAL_K8S_DIR=${GOPATH}/src/k8s.io/kubernetes
readonly k8s_dir=${PKGDIR}/kubernetes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PKGDIR points to the PD CSI driver repo

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, in prow test, kuberentes will be installed under current dir which is the pd csi driver repo. in the test, kubetest needs to be in kubernetes dir.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need this variable anymore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed it. thanks!

@@ -22,7 +21,7 @@ readonly teardown_driver=${GCE_CLUSTER_ZONE:-true}
make -C ${PKGDIR} test-k8s-integration

base_cmd="${PKGDIR}/bin/k8s-integration-test \
--platform=windows --local-k8s-dir=${LOCAL_K8S_DIR} --bringup-cluster=false --teardown-cluster=false --teardown-driver=${teardown_driver}\
--platform=windows --local-k8s-dir=${k8s_dir} --bringup-cluster=false --teardown-cluster=false --teardown-driver=${teardown_driver}\
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

teardown_driver references GCE_CLUSTER_ZONE. Is that intended?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed. Thanks1

readonly PKGDIR=${GOPATH}/src/sigs.k8s.io/gcp-compute-persistent-disk-csi-driver
readonly LOCAL_K8S_DIR=${GOPATH}/src/k8s.io/kubernetes
readonly k8s_dir=${PKGDIR}/kubernetes
readonly overlay_name="${GCE_PD_OVERLAY_NAME:-alpha}"
readonly do_driver_build="${GCE_PD_DO_DRIVER_BUILD:-true}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DO_DRIVER_BUILD no longer exists

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see DO_DRIVER_BUILD? It has GCE_PD_DO_DRIVER_BUILD?

@jingxu97
Copy link
Contributor Author

comments were addressed. PTAL.

}

klog.V(2).Infof("GCE PD driver pods: \n %v", string(out))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, this is fixed in #140

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack.

}
nodes := strings.Fields(string(out))
for _, node := range nodes {
taintCmd := exec.Command("kubectl", "taint", "node", node, "node.kubernetes.io/os:NoSchedule-")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should kubernetes-e2e.py be handling this as part of windows cluster bringup?

readonly PKGDIR=${GOPATH}/src/sigs.k8s.io/gcp-compute-persistent-disk-csi-driver
readonly LOCAL_K8S_DIR=${GOPATH}/src/k8s.io/kubernetes
readonly k8s_dir=${PKGDIR}/kubernetes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need this variable anymore?

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 16, 2020
@jingxu97
Copy link
Contributor Author

/test pull-gcp-compute-persistent-disk-csi-driver-e2e-windows

@jingxu97
Copy link
Contributor Author

/retest

@jingxu97
Copy link
Contributor Author

@verult @msau42 PTAL.

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jul 16, 2020
@jingxu97
Copy link
Contributor Author

/retest

@@ -71,9 +72,9 @@ func installDriver(goPath, pkgDir, stagingImage, stagingVersion, deployOverlayNa
statusCmd := exec.Command("kubectl", "describe", "pods", "-n", driverNamespace)
err = runCommand("Checking driver pods", statusCmd)
if err != nil {
return fmt.Errorf("failed to check driver pods: %v", err)
return fmt.Errorf("failed to get nodes: %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error message is not correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

}

klog.V(2).Infof("GCE PD driver pods: \n %v", string(out))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out is no longer relevant in this case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed. PTAL

@@ -71,9 +71,8 @@ func installDriver(goPath, pkgDir, stagingImage, stagingVersion, deployOverlayNa
statusCmd := exec.Command("kubectl", "describe", "pods", "-n", driverNamespace)
err = runCommand("Checking driver pods", statusCmd)
if err != nil {
return fmt.Errorf("failed to check driver pods: %v", err)
return fmt.Errorf("failed to describe nodes: %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be describe pods?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, not sure what I was thinking. fixed.

update ci test configuration

Fix windows integration script

remove env virable GCE_PD_DO_DRIVER_BUILD direct setting
@jingxu97
Copy link
Contributor Author

/test pull-gcp-compute-persistent-disk-csi-driver-e2e-windows

@msau42
Copy link
Contributor

msau42 commented Jul 17, 2020

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 17, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jingxu97, msau42

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 17, 2020
@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Jul 17, 2020

@jingxu97: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-gcp-compute-persistent-disk-csi-driver-e2e-windows 1b4d17f link /test pull-gcp-compute-persistent-disk-csi-driver-e2e-windows

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@jingxu97
Copy link
Contributor Author

/retest

@k8s-ci-robot k8s-ci-robot merged commit 03f0690 into kubernetes-sigs:master Jul 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants