Bug
Resolution: Cannot Reproduce
Major
SDN Sprint 230
The following error occurs frequently in vSphere CI jobs. A recent failed job for reference: 1591151696636022784
error: timed out waiting for the condition on clusteroperators/network
{"component":"entrypoint","error":"wrapped process failed: exit status 1","file":"k8s.io/test-infra/prow/entrypoint/run.go:79","func":"k8s.io/test-infra/prow/entrypoint.Options.Run","level":"error","msg":"Error executing test process","severity":"error","time":"2022-11-11T21:57:34Z"}
error: failed to execute wrapped command: exit status 1
INFO[2022-11-11T21:57:36Z] Step vsphere-e2e-operator-ipi-install-vsphere-registry failed after 39m0s.
INFO[2022-11-11T21:57:36Z] Step phase pre failed after 1h29m10s.
After initial triage in a Slack thread with dcbw@redhat.com, this looks like a network-check daemonset issue, where one or more network-check-target pods are unavailable.
"status": {
    "currentNumberScheduled": 6,
    "desiredNumberScheduled": 6,
    "numberAvailable": 5,
    "numberMisscheduled": 0,
    "numberReady": 5,
    "numberUnavailable": 1,
    "observedGeneration": 1,
    "updatedNumberScheduled": 6
}
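For anyone triaging similar failures, the status above can be checked mechanically. The following is only a minimal sketch: it assumes the status object has already been saved as JSON (for example from the daemonset output gathered by the job), and the helper name `unavailable_pods` is hypothetical, not part of any existing tool.

```python
import json

# Status values copied from the daemonset status shown in this report.
STATUS_JSON = """
{
  "currentNumberScheduled": 6,
  "desiredNumberScheduled": 6,
  "numberAvailable": 5,
  "numberMisscheduled": 0,
  "numberReady": 5,
  "numberUnavailable": 1,
  "observedGeneration": 1,
  "updatedNumberScheduled": 6
}
"""


def unavailable_pods(status: dict) -> int:
    """Hypothetical helper: desired pods minus available pods, never negative."""
    desired = status.get("desiredNumberScheduled", 0)
    available = status.get("numberAvailable", 0)
    return max(desired - available, 0)


status = json.loads(STATUS_JSON)
missing = unavailable_pods(status)
print(f"{missing} of {status['desiredNumberScheduled']} pods unavailable")
```

With the values from this job, the computed count matches the reported `numberUnavailable`, which is consistent with a single target pod failing its readiness check.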
The node journal on the machine that is supposed to have that pod shows the following log:
Nov 01 16:45:26.044455 ci-op-h4g42fh5-23f97-4zn6r-master-0 kubenswrapper[2991]: I1101 16:45:26.044256 2991 prober.go:121] "Probe failed" probeType="Readiness" pod="openshift-network-diagnostics/network-check-target-rdrpd" podUID=c0b15643-55e8-450e-bb70-ad76391cd012 containerName="network-check-target-container" probeResult=failure output="Get \"http://10.128.0.3:8080/\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
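To locate the affected pod across many node journals, the key=value fields of a "Probe failed" entry can be pulled out with a small parser. This is a rough sketch that assumes the journal line format shown above; the function name `parse_probe_failure` and the regular expression are illustrative and may need adjusting for other kubelet versions.

```python
import re

# Journal line copied from this report, joined into a single string.
LINE = (
    'Nov 01 16:45:26.044455 ci-op-h4g42fh5-23f97-4zn6r-master-0 kubenswrapper[2991]: '
    'I1101 16:45:26.044256 2991 prober.go:121] "Probe failed" probeType="Readiness" '
    'pod="openshift-network-diagnostics/network-check-target-rdrpd" '
    'podUID=c0b15643-55e8-450e-bb70-ad76391cd012 '
    'containerName="network-check-target-container" probeResult=failure '
    'output="Get \\"http://10.128.0.3:8080/\\": context deadline exceeded '
    '(Client.Timeout exceeded while awaiting headers)"'
)

# key=value pairs; a value is either a quoted string (with \" escapes) or a bare token.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_probe_failure(line: str):
    """Hypothetical helper: return the fields of a "Probe failed" entry, or None."""
    if '"Probe failed"' not in line:
        return None
    fields = {}
    for key, raw in PAIR.findall(line):
        if raw.startswith('"'):
            # Strip the surrounding quotes and unescape embedded quotes.
            raw = raw[1:-1].replace('\\"', '"')
        fields[key] = raw
    return fields


fields = parse_probe_failure(LINE)
namespace, pod_name = fields["pod"].split("/", 1)
print(namespace, pod_name, fields["probeType"], fields["output"])
```

Grouping the parsed entries by `pod` and `output` should make it easier to tell whether the timeouts are confined to one node or spread across the cluster.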