Skip to content

All Windows e2e periodic tests are failing #880

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mauriciopoppe opened this issue Dec 22, 2021 · 10 comments
Closed

All Windows e2e periodic tests are failing #880

mauriciopoppe opened this issue Dec 22, 2021 · 10 comments

Comments

@mauriciopoppe
Copy link
Member

All the tests are failing for all the Windows versions in https://testgrid.k8s.io/provider-gcp-compute-persistent-disk-csi-driver#ci-win2019-provider-gcp-compute-persistent-disk-csi-driver

From the logs I see that the Windows nodes couldn't join the Kubernetes cluster.

I've also been trying to set up a k/k cluster with Windows nodes in local volume provisioner in this PR kubernetes-sigs/sig-storage-local-static-provisioner#279 which fails with the same error (e.g. nodes not joining the cluster), I couldn't reproduce this locally so I believe it might be something in the GCP project chosen by boskos where the Windows nodepool is created.

Logs from a successful node join:

2021/12/21 22:06:14 GCEMetadataScripts: windows-startup-script-ps1: Kubernetes components started successfully
2021/12/21 22:06:14 GCEMetadataScripts: windows-startup-script-ps1: Waiting 15 seconds for node to join cluster.
2021/12/21 22:06:31 GCEMetadataScripts: windows-startup-script-ps1: kubectl get nodes:
2021/12/21 22:06:31 GCEMetadataScripts: windows-startup-script-ps1: NAME                                             STATUS                     ROLES    AGE     VERSION
2021/12/21 22:06:32 GCEMetadataScripts: windows-startup-script-ps1: <control plane>                    Ready,SchedulingDisabled   <none>   8m8s    v1.23.0-alpha.4.579+e4952f32b79b69-dirty
2021/12/21 22:06:32 GCEMetadataScripts: windows-startup-script-ps1: <linux worker>         Ready                      <none>   6m55s   v1.23.0-alpha.4.579+e4952f32b79b69-dirty
2021/12/21 22:06:32 GCEMetadataScripts: windows-startup-script-ps1: <windows worker>  Ready                      <none>   28s     v1.23.0-alpha.4.579+e4952f32b79b69-dirty
2021/12/21 22:06:32 GCEMetadataScripts: windows-startup-script-ps1:
2021/12/21 22:06:32 GCEMetadataScripts: windows-startup-script-ps1: TODO: run more verification commands.

Logs from the project chosen for the LVP e2e test:

Kubernetes components started successfully
2021/12/22 18:38:16 GCEMetadataScripts: windows-startup-script-ps1: Waiting 15 seconds for node to join cluster.
2021/12/22 18:38:32 GCEMetadataScripts: windows-startup-script-ps1: kubectl get nodes:

# NOTE: no Windows nodes!
2021/12/22 18:38:32 GCEMetadataScripts: windows-startup-script-ps1: NAME                              STATUS                     ROLES    AGE     VERSION
2021/12/22 18:38:32 GCEMetadataScripts: windows-startup-script-ps1: <control plane>              Ready,SchedulingDisabled   <none>   8m26s   v1.24.0-alpha.1.216+c75d254beb662e
2021/12/22 18:38:32 GCEMetadataScripts: windows-startup-script-ps1: <linux worker>   Ready                      <none>   7m42s   v1.24.0-alpha.1.216+c75d254beb662e
2021/12/22 18:38:32 GCEMetadataScripts: windows-startup-script-ps1: <linux worker>   Ready                      <none>   7m42s   v1.24.0-alpha.1.216+c75d254beb662e

2021/12/22 18:38:32 GCEMetadataScripts: windows-startup-script-ps1:
2021/12/22 18:38:32 GCEMetadataScripts: windows-startup-script-ps1: TODO: run more verification commands.
2021/12/22 18:38:34 GCEMetadataScripts: windows-startup-script-ps1: Actions            : {MSFT_TaskExecAction}
2021/12/22 18:38:34 GCEMetadataScripts: windows-startup-script-ps1: Author             :
2021/12/22 18:38:34 GCEMetadataScripts: windows-startup-script-ps1: Date               :
2021/12/22 18:38:34 GCEMetadataScripts: windows-startup-script-ps1: Description        : Rotate Kubernetes logs
2021/12/22 18:38:34 GCEMetadataScripts: windows-startup-script-ps1: Documentation      :
2021/12/22 18:38:34 GCEMetadataScripts: windows-startup-script-ps1: Principal          : MSFT_TaskPrincipal2
2021/12/22 18:38:34 GCEMetadataScripts: windows-startup-script-ps1: SecurityDescriptor :
2021/12/22 18:38:34 GCEMetadataScripts: windows-startup-script-ps1: Settings           : MSFT_TaskSettings3
2021/12/22 18:38:34 GCEMetadataScripts: windows-startup-script-ps1: Source             :
2021/12/22 18:38:34 GCEMetadataScripts: windows-startup-script-ps1: State              : Ready
2021/12/22 18:38:34 GCEMetadataScripts: windows-startup-script-ps1: TaskName           : RotateKubeLogs
2021/12/22 18:38:34 GCEMetadataScripts: windows-startup-script-ps1: TaskPath           : \
2021/12/22 18:38:34 GCEMetadataScripts: windows-startup-script-ps1: Triggers           : {MSFT_TaskTimeTrigger}
2021/12/22 18:38:34 GCEMetadataScripts: windows-startup-script-ps1: URI                : \RotateKubeLogs
2021/12/22 18:38:34 GCEMetadataScripts: windows-startup-script-ps1: Version            :
2021/12/22 18:38:34 GCEMetadataScripts: windows-startup-script-ps1: PSComputerName     :
2021/12/22 18:38:34 GCEMetadataScripts: windows-startup-script-ps1:
2021/12/22 18:38:34 GCEMetadataScripts: windows-startup-script-ps1: time="2021-12-22T18:38:34Z" level=warning msg="image connect using default endpoints: [npipe:////./pipe/dockershim npipe:////./pipe/containerd npipe:////./pipe/crio]. As the default settings are now deprecated, you should set the endpoint instead."
2021/12/22 18:38:36 GCEMetadataScripts: windows-startup-script-ps1: time="2021-12-22T18:38:36Z" level=error msg="connect endpoint 'npipe:////./pipe/dockershim', make sure you are running as root and the endpoint has been started: context deadline exceeded"
2021/12/22 18:38:38 GCEMetadataScripts: windows-startup-script-ps1: time="2021-12-22T18:38:38Z" level=error msg="connect endpoint 'npipe:////./pipe/containerd', make sure you are running as root and the endpoint has been started: context deadline exceeded"
2021/12/22 18:38:41 GCEMetadataScripts: windows-startup-script-ps1: time="2021-12-22T18:38:41Z" level=fatal msg="connect: connect endpoint 'npipe:////./pipe/crio', make sure you are running as root and the endpoint has been started: context deadline exceeded"
2021/12/22 18:38:41 GCEMetadataScripts: windows-startup-script-ps1: time="2021-12-22T18:38:41Z" level=warning msg="image connect using default endpoints: [npipe:////./pipe/dockershim npipe:////./pipe/containerd npipe:////./pipe/crio]. As the default settings are now deprecated, you should set the endpoint instead."
2021/12/22 18:38:43 GCEMetadataScripts: windows-startup-script-ps1: time="2021-12-22T18:38:43Z" level=error msg="connect endpoint 'npipe:////./pipe/dockershim', make sure you are running as root and the endpoint has been started: context deadline exceeded"
2021/12/22 18:38:45 GCEMetadataScripts: windows-startup-script-ps1: time="2021-12-22T18:38:45Z" level=error msg="connect endpoint 'npipe:////./pipe/containerd', make sure you are running as root and the endpoint has been started: context deadline exceeded"
2021/12/22 18:38:47 GCEMetadataScripts: windows-startup-script-ps1: time="2021-12-22T18:38:47Z" level=fatal msg="connect: connect endpoint 'npipe:////./pipe/crio', make sure you are running as root and the endpoint has been started: context deadline exceeded"
2021/12/22 18:38:47 GCEMetadataScripts: windows-startup-script-ps1: Exception caught in script:
2021/12/22 18:38:47 GCEMetadataScripts: windows-startup-script-ps1: At C:\k8s-node-setup.psm1:1358 char:7
2021/12/22 18:38:47 GCEMetadataScripts: windows-startup-script-ps1: +       throw "Error running 'crictl pull ${env:INFRA_CONTAINER}'"
2021/12/22 18:38:47 GCEMetadataScripts: windows-startup-script-ps1: +       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2021/12/22 18:38:47 GCEMetadataScripts: windows-startup-script-ps1: Kubernetes Windows node setup failed: Error running 'crictl pull k8s.gcr.io/pause:3.6'
2021/12/22 18:38:47 GCEMetadataScripts: windows-startup-script-ps1: Cleaning up, Unregistering WorkerServices...
2021/12/22 18:38:47 GCEMetadataScripts: windows-startup-script-ps1: [SC] DeleteService SUCCESS
2021/12/22 18:38:47 GCEMetadataScripts: windows-startup-script-ps1: [SC] DeleteService SUCCESS

cc @jingxu97

@mauriciopoppe
Copy link
Member Author

Kept on digging and found out that the kubelet didn't start in the Windows node:

2021/12/22 18:37:57 GCEMetadataScripts: windows-startup-script-ps1: Starting kubelet service
2021/12/22 18:37:59 GCEMetadataScripts: windows-startup-script-ps1:
2021/12/22 18:37:59 GCEMetadataScripts: windows-startup-script-ps1: SERVICE_NAME: kubelet
2021/12/22 18:37:59 GCEMetadataScripts: windows-startup-script-ps1:         TYPE               : 10  WIN32_OWN_PROCESS
2021/12/22 18:37:59 GCEMetadataScripts: windows-startup-script-ps1:         STATE              : 2  START_PENDING
2021/12/22 18:37:59 GCEMetadataScripts: windows-startup-script-ps1:                                 (NOT_STOPPABLE, NOT_PAUSABLE, IGNORES_SHUTDOWN)
2021/12/22 18:37:59 GCEMetadataScripts: windows-startup-script-ps1:         WIN32_EXIT_CODE    : 0  (0x0)
2021/12/22 18:37:59 GCEMetadataScripts: windows-startup-script-ps1:         SERVICE_EXIT_CODE  : 0  (0x0)
2021/12/22 18:37:59 GCEMetadataScripts: windows-startup-script-ps1:         CHECKPOINT         : 0x0
2021/12/22 18:37:59 GCEMetadataScripts: windows-startup-script-ps1:         WAIT_HINT          : 0x7d0
2021/12/22 18:37:59 GCEMetadataScripts: windows-startup-script-ps1:         PID                : 4624
2021/12/22 18:37:59 GCEMetadataScripts: windows-startup-script-ps1:         FLAGS              :
2021/12/22 18:37:59 GCEMetadataScripts: windows-startup-script-ps1: Waiting 10 seconds for kubelet to stabilize
2021/12/22 18:38:09 GCEMetadataScripts: windows-startup-script-ps1:
2021/12/22 18:38:09 GCEMetadataScripts: windows-startup-script-ps1:
2021/12/22 18:38:09 GCEMetadataScripts: windows-startup-script-ps1: Name                : kubelet
2021/12/22 18:38:09 GCEMetadataScripts: windows-startup-script-ps1: RequiredServices    : {}
2021/12/22 18:38:09 GCEMetadataScripts: windows-startup-script-ps1: CanPauseAndContinue : False
2021/12/22 18:38:09 GCEMetadataScripts: windows-startup-script-ps1: CanShutdown         : False
2021/12/22 18:38:09 GCEMetadataScripts: windows-startup-script-ps1: CanStop             : False
2021/12/22 18:38:09 GCEMetadataScripts: windows-startup-script-ps1: DisplayName         : kubelet
2021/12/22 18:38:09 GCEMetadataScripts: windows-startup-script-ps1: DependentServices   : {}
2021/12/22 18:38:09 GCEMetadataScripts: windows-startup-script-ps1: MachineName         : .
2021/12/22 18:38:09 GCEMetadataScripts: windows-startup-script-ps1: ServiceName         : kubelet
2021/12/22 18:38:09 GCEMetadataScripts: windows-startup-script-ps1: ServicesDependedOn  : {}
2021/12/22 18:38:09 GCEMetadataScripts: windows-startup-script-ps1: ServiceHandle       : SafeServiceHandle
2021/12/22 18:38:09 GCEMetadataScripts: windows-startup-script-ps1: Status              : Stopped
2021/12/22 18:38:09 GCEMetadataScripts: windows-startup-script-ps1: ServiceType         : Win32OwnProcess
2021/12/22 18:38:09 GCEMetadataScripts: windows-startup-script-ps1: StartType           : Manual
2021/12/22 18:38:09 GCEMetadataScripts: windows-startup-script-ps1: Site                :
2021/12/22 18:38:09 GCEMetadataScripts: windows-startup-script-ps1: Container           :
2021/12/22 18:38:09 GCEMetadataScripts: windows-startup-script-ps1:

The good logs show that after 10s the kubelet is running however in the logs above it's stopped.

@mauriciopoppe
Copy link
Member Author

Looking at the kubelet logs I saw:

E1222 19:29:34.822512    5056 server.go:286] "kubelet running with insufficient permissions" err="error while checking admin group membership: error retrieving group ids: The user name could not be found."
I1222 19:29:34.822512    5056 server.go:445] "Kubelet version" kubeletVersion="v1.24.0-alpha.1.216+c75d254beb662e"
I1222 19:29:34.822512    5056 init_windows.go:56] "Setting the priority of kubelet process" windowsPriorityClass="NORMAL_PRIORITY_CLASS"
I1222 19:29:34.823311    5056 service.go:58] Running kubelet as a Windows service!
I1222 19:29:34.823311    5056 feature_gate.go:245] feature gates: &{map[KubeletPodResources:false]}
I1222 19:29:34.824352    5056 feature_gate.go:245] feature gates: &{map[KubeletPodResources:false]}
W1222 19:29:34.824352    5056 plugins.go:132] WARNING: gce built-in cloud provider is now deprecated. The GCE provider is deprecated and will be removed in a future release. Please use https://github.com/kubernetes/cloud-provider-gcp
I1222 19:29:34.825608    5056 service.go:68] Service running
I1222 19:29:34.831568    5056 gce.go:933] Using existing Token Source &oauth2.reuseTokenSource{new:google.computeSource{account:"", scopes:[]string(nil)}, mu:sync.Mutex{state:0, sema:0x0}, t:(*oauth2.Token)(nil)}
I1222 19:29:34.838541    5056 server.go:584] "Successfully initialized cloud provider" cloudProvider="gce" cloudConfigFile=""
I1222 19:29:34.838541    5056 server.go:1023] "Cloud provider determined current node" nodeName="<node name>"
I1222 19:29:34.838541    5056 bootstrap.go:119] "Using bootstrap kubeconfig to generate TLS client cert, key and kubeconfig file"
I1222 19:29:34.840848    5056 bootstrap.go:150] "No valid private key and/or certificate found, reusing existing private key or creating a new one"
I1222 19:29:34.842801    5056 cert_rotation.go:137] Starting client certificate rotation controller
I1222 19:29:34.881059    5056 bootstrap.go:355] "Waiting for client certificate to be issued"
I1222 19:29:34.892403    5056 reflector.go:219] Starting reflector *v1.CertificateSigningRequest (0s) from k8s.io/client-go/tools/watch/informerwatcher.go:146
I1222 19:29:34.892403    5056 reflector.go:255] Listing and watching *v1.CertificateSigningRequest from k8s.io/client-go/tools/watch/informerwatcher.go:146
I1222 19:29:34.901468    5056 csr.go:262] certificate signing request node-csr-UEm2CE51nIx7XZrnqShz72y14RkCZvsYnqgFwJaezHM is approved, waiting to be issued
I1222 19:29:34.910055    5056 csr.go:258] certificate signing request node-csr-UEm2CE51nIx7XZrnqShz72y14RkCZvsYnqgFwJaezHM is issued
I1222 19:29:34.910055    5056 reflector.go:225] Stopping reflector *v1.CertificateSigningRequest (0s) from k8s.io/client-go/tools/watch/informerwatcher.go:146
I1222 19:29:34.932230    5056 cert_rotation.go:137] Starting client certificate rotation controller
I1222 19:29:34.934199    5056 dynamic_cafile_content.go:118] "Loaded a new CA Bundle and Verifier" name="client-ca-bundle::C:\\etc\\kubernetes\\pki\\ca-certificates.crt"
I1222 19:29:34.934199    5056 dynamic_cafile_content.go:156] "Starting controller" name="client-ca-bundle::C:\\etc\\kubernetes\\pki\\ca-certificates.crt"
I1222 19:29:35.653798    5056 server.go:490] "Sending events to api server"
I1222 19:29:35.965319    5056 fake_topology_manager.go:32] "NewFakeManager"
I1222 19:29:35.965319    5056 container_manager_windows.go:115] "Creating device plugin manager" devicePluginEnabled=true
I1222 19:29:35.965319    5056 manager.go:141] "Creating Device Plugin manager" path="C:\\var\\lib\\kubelet\\device-plugins\\kubelet.sock"
I1222 19:29:35.967824    5056 fake_topology_manager.go:54] "AddHintProvider" hintProvider=&{socketname:kubelet.sock socketdir:C:\var\lib\kubelet\device-plugins\ endpoints:map[] mutex:{state:0 sema:0} server:<nil> wg:{noCopy:{} state1:[0 0 0]} activePods:0x1f49020 sourcesReady:0x7a59e00 callback:0x1f49080 allDevices:map[] healthyDevices:map[] unhealthyDevices:map[] allocatedDevices:map[] podDevices:0xc000a20a60 checkpointManager:0xc000a461e0 numaNodes:[] topologyAffinityStore:0xc0009d6370 devicesToReuse:map[] pendingAdmissionPod:<nil>}
I1222 19:29:35.967824    5056 server.go:796] "Failed to ApplyOOMScoreAdj" err="setting OOM scores is unsupported in this build"
E1222 19:29:35.967824    5056 server.go:301] "Failed to run kubelet" err="failed to run Kubelet: using dockershim is not supported, please consider using a full-fledged CRI implementation"

For "kubelet running with insufficient permissions" err="error while checking admin group membership: error retrieving group ids: The user name could not be found." it looks like it's an error that doesn't make the kubelet fail.

For "Failed to run kubelet" err="failed to run Kubelet: using dockershim is not supported, please consider using a full-fledged CRI implementation", this is a fatal error at https://github.com/kubernetes/kubernetes/blob/c75d254beb662ea73736d2f123d0382b7f91dfbd/cmd/kubelet/app/server.go?#L799, there's more info about dockershim getting removed here https://kubernetes.io/blog/2021/11/12/are-you-ready-for-dockershim-removal/#:~:text=dockershim%20will%20be%20removed%20in%20December%20at%20the%20beginning%20of%20the%201.24%20release%20development%20cycle

@mauriciopoppe
Copy link
Member Author

Found out that we have to migrate to containerd in our prow jobs too kubernetes/test-infra#24602

@mauriciopoppe
Copy link
Member Author

the containerd issue was fixed in kubernetes/kubernetes#107189, there's a new issue when the images are built addressed in #884

@mauriciopoppe
Copy link
Member Author

@k8s-ci-robot
Copy link
Contributor

@mauriciopoppe: Closing this issue.

In response to this:

the windows tests are now running https://testgrid.k8s.io/provider-gcp-compute-persistent-disk-csi-driver#ci-win20h2-provider-gcp-compute-persistent-disk-csi-driver, thanks @amacaskill!

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mauriciopoppe
Copy link
Member Author

/reopen

I saw that recent runs are failing again https://testgrid.k8s.io/provider-gcp-compute-persistent-disk-csi-driver#ci-windows-2019-provider-gcp-compute-persistent-disk-csi-driver, the logs say:

W0914 18:56:05.398] I0914 18:56:05.397655    7399 utils.go:18] cmd args=[kubetest2 gce --run-id=bdd9df00-3454-11ed-9c39-9af7702f072d --test=ginkgo --legacy-mode --repo-root=/tmp/gcp-pd-driver-tmp3694256540/kubernetes --artifacts=/workspace/_artifacts/sc-windows -- --use-built-binaries --focus-regex=External.Storage --skip-regex=\[Disruptive\]|\[Serial\]|\[LinuxOnly\] --parallel=4 --test-args=--storage.testdriver=/go/src/sigs.k8s.io/gcp-compute-persistent-disk-csi-driver/test/k8s-integration/config/test-config.yaml  --node-os-distro=windows --allowed-not-ready-nodes=2]
W0914 18:56:05.518] I0914 18:56:05.518408  174124 app.go:61] The files in RunDir shall not be part of Artifacts
W0914 18:56:05.519] I0914 18:56:05.518576  174124 app.go:62] pass rundir-in-artifacts flag True for RunDir to be part of Artifacts
W0914 18:56:05.519] I0914 18:56:05.518612  174124 app.go:64] RunDir for this run: "/go/src/sigs.k8s.io/gcp-compute-persistent-disk-csi-driver/deploy/kubernetes/overlays/stable-master/_rundir/bdd9df00-3454-11ed-9c39-9af7702f072d"
W0914 18:56:05.525] I0914 18:56:05.525030  174124 app.go:128] ID for this run: "bdd9df00-3454-11ed-9c39-9af7702f072d"
W0914 18:56:05.547] I0914 18:56:05.546806  174142 ginkgo.go:141] Using kubeconfig at /root/.kube/config
W0914 18:56:05.547] F0914 18:56:05.547521  174142 ginkgo.go:260] failed to run ginkgo tester: failed to validate pre-built binary kubectl (checked at "/go/src/sigs.k8s.io/gcp-compute-persistent-disk-csi-driver/deploy/kubernetes/overlays/stable-master/_rundir/bdd9df00-3454-11ed-9c39-9af7702f072d/kubectl"): stat /go/src/sigs.k8s.io/gcp-compute-persistent-disk-csi-driver/deploy/kubernetes/overlays/stable-master/_rundir/bdd9df00-3454-11ed-9c39-9af7702f072d/kubectl: no such file or directory
W0914 18:56:05.552] Error: exit status 255

Comparing with a successful run of Linux (pick a run in https://testgrid.k8s.io/provider-gcp-compute-persistent-disk-csi-driver#Kubernetes%20Master%20Driver%20Latest):

W0914 16:48:50.113] I0914 16:48:50.112903    7093 utils.go:18] cmd args=[kubetest2 gce --run-id=e3b563e8-3447-11ed-9c39-9af7702f072d --test=ginkgo --legacy-mode --repo-root=/tmp/gcp-pd-driver-tmp3531201838/kubernetes --artifacts=/workspace/_artifacts/sc-balanced -- --use-built-binaries --focus-regex=External.Storage --skip-regex=\[Disruptive\]|\[Serial\] --parallel=4 --test-args=--storage.testdriver=/go/src/sigs.k8s.io/gcp-compute-persistent-disk-csi-driver/test/k8s-integration/config/test-config.yaml ]
W0914 16:48:50.154] I0914 16:48:50.154710  104336 app.go:61] RunDir for this run: "/workspace/_artifacts/sc-balanced/e3b563e8-3447-11ed-9c39-9af7702f072d"
W0914 16:48:50.158] I0914 16:48:50.158548  104336 app.go:120] ID for this run: "e3b563e8-3447-11ed-9c39-9af7702f072d"
W0914 16:48:50.179] I0914 16:48:50.179408  104352 ginkgo.go:120] Using kubeconfig at /root/.kube/config
W0914 16:48:50.180] I0914 16:48:50.180007  104352 ginkgo.go:90] Running ginkgo test as /workspace/_artifacts/sc-balanced/e3b563e8-3447-11ed-9c39-9af7702f072d/ginkgo [--nodes=4 /workspace/_artifacts/sc-balanced/e3b563e8-3447-11ed-9c39-9af7702f072d/e2e.test -- --kubeconfig=/root/.kube/config --kubectl-path=/workspace/_artifacts/sc-balanced/e3b563e8-3447-11ed-9c39-9af7702f072d/kubectl --ginkgo.flakeAttempts=1 --ginkgo.skip=\[Disruptive\]|\[Serial\] --ginkgo.focus=External.Storage --report-dir=/workspace/_artifacts/sc-balanced/e3b563e8-3447-11ed-9c39-9af7702f072d --storage.testdriver=/go/src/sigs.k8s.io/gcp-compute-persistent-disk-csi-driver/test/k8s-integration/config/test-config.yaml]

The tests in Windows aren't finding the kubectl binary, I think there's a problem with the path in Windows. I haven't checked these links in some time and there might have been changes in test/k8s-integration/main.go which is the test runner for both Linux and Windows.

/cc @alexander-ding

@k8s-ci-robot
Copy link
Contributor

@mauriciopoppe: Reopened this issue.

In response to this:

/reopen

I saw that recent runs are failing again https://testgrid.k8s.io/provider-gcp-compute-persistent-disk-csi-driver#ci-windows-2019-provider-gcp-compute-persistent-disk-csi-driver, the logs say:

W0914 18:56:05.398] I0914 18:56:05.397655    7399 utils.go:18] cmd args=[kubetest2 gce --run-id=bdd9df00-3454-11ed-9c39-9af7702f072d --test=ginkgo --legacy-mode --repo-root=/tmp/gcp-pd-driver-tmp3694256540/kubernetes --artifacts=/workspace/_artifacts/sc-windows -- --use-built-binaries --focus-regex=External.Storage --skip-regex=\[Disruptive\]|\[Serial\]|\[LinuxOnly\] --parallel=4 --test-args=--storage.testdriver=/go/src/sigs.k8s.io/gcp-compute-persistent-disk-csi-driver/test/k8s-integration/config/test-config.yaml  --node-os-distro=windows --allowed-not-ready-nodes=2]
W0914 18:56:05.518] I0914 18:56:05.518408  174124 app.go:61] The files in RunDir shall not be part of Artifacts
W0914 18:56:05.519] I0914 18:56:05.518576  174124 app.go:62] pass rundir-in-artifacts flag True for RunDir to be part of Artifacts
W0914 18:56:05.519] I0914 18:56:05.518612  174124 app.go:64] RunDir for this run: "/go/src/sigs.k8s.io/gcp-compute-persistent-disk-csi-driver/deploy/kubernetes/overlays/stable-master/_rundir/bdd9df00-3454-11ed-9c39-9af7702f072d"
W0914 18:56:05.525] I0914 18:56:05.525030  174124 app.go:128] ID for this run: "bdd9df00-3454-11ed-9c39-9af7702f072d"
W0914 18:56:05.547] I0914 18:56:05.546806  174142 ginkgo.go:141] Using kubeconfig at /root/.kube/config
W0914 18:56:05.547] F0914 18:56:05.547521  174142 ginkgo.go:260] failed to run ginkgo tester: failed to validate pre-built binary kubectl (checked at "/go/src/sigs.k8s.io/gcp-compute-persistent-disk-csi-driver/deploy/kubernetes/overlays/stable-master/_rundir/bdd9df00-3454-11ed-9c39-9af7702f072d/kubectl"): stat /go/src/sigs.k8s.io/gcp-compute-persistent-disk-csi-driver/deploy/kubernetes/overlays/stable-master/_rundir/bdd9df00-3454-11ed-9c39-9af7702f072d/kubectl: no such file or directory
W0914 18:56:05.552] Error: exit status 255

Comparing with a successful run of Linux (pick a run in https://testgrid.k8s.io/provider-gcp-compute-persistent-disk-csi-driver#Kubernetes%20Master%20Driver%20Latest):

W0914 16:48:50.113] I0914 16:48:50.112903    7093 utils.go:18] cmd args=[kubetest2 gce --run-id=e3b563e8-3447-11ed-9c39-9af7702f072d --test=ginkgo --legacy-mode --repo-root=/tmp/gcp-pd-driver-tmp3531201838/kubernetes --artifacts=/workspace/_artifacts/sc-balanced -- --use-built-binaries --focus-regex=External.Storage --skip-regex=\[Disruptive\]|\[Serial\] --parallel=4 --test-args=--storage.testdriver=/go/src/sigs.k8s.io/gcp-compute-persistent-disk-csi-driver/test/k8s-integration/config/test-config.yaml ]
W0914 16:48:50.154] I0914 16:48:50.154710  104336 app.go:61] RunDir for this run: "/workspace/_artifacts/sc-balanced/e3b563e8-3447-11ed-9c39-9af7702f072d"
W0914 16:48:50.158] I0914 16:48:50.158548  104336 app.go:120] ID for this run: "e3b563e8-3447-11ed-9c39-9af7702f072d"
W0914 16:48:50.179] I0914 16:48:50.179408  104352 ginkgo.go:120] Using kubeconfig at /root/.kube/config
W0914 16:48:50.180] I0914 16:48:50.180007  104352 ginkgo.go:90] Running ginkgo test as /workspace/_artifacts/sc-balanced/e3b563e8-3447-11ed-9c39-9af7702f072d/ginkgo [--nodes=4 /workspace/_artifacts/sc-balanced/e3b563e8-3447-11ed-9c39-9af7702f072d/e2e.test -- --kubeconfig=/root/.kube/config --kubectl-path=/workspace/_artifacts/sc-balanced/e3b563e8-3447-11ed-9c39-9af7702f072d/kubectl --ginkgo.flakeAttempts=1 --ginkgo.skip=\[Disruptive\]|\[Serial\] --ginkgo.focus=External.Storage --report-dir=/workspace/_artifacts/sc-balanced/e3b563e8-3447-11ed-9c39-9af7702f072d --storage.testdriver=/go/src/sigs.k8s.io/gcp-compute-persistent-disk-csi-driver/test/k8s-integration/config/test-config.yaml]

The tests in Windows aren't finding the kubectl binary, I think there's a problem with the path in Windows. I haven't checked these links in some time and there might have been changes in test/k8s-integration/main.go which is the test runner for both Linux and Windows.

/cc @alexander-ding

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot reopened this Sep 14, 2022
@mauriciopoppe
Copy link
Member Author

Reading the logs in detail I saw that a different log message in the same line!

Windows:

W0914 18:56:05.518] I0914 18:56:05.518408  174124 app.go:61] The files in RunDir shall not be part of Artifacts

Linux:

W0914 16:48:50.154] I0914 16:48:50.154710  104336 app.go:61] RunDir for this run: "/workspace/_artifacts/sc-balanced/e3b563e8-3447-11ed-9c39-9af7702f072d"

It led me to believe that a different version of kubetest2 is used in Windows/Linux, I checked the top of the logs where the script runs and I see this line set only in Linux:

kt2_version=0e09086b60c122e1084edd2368d3d27fe36f384f

@mauriciopoppe
Copy link
Member Author

Fixed in #1046

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants