-
Bug
-
Resolution: Cannot Reproduce
-
Major
-
Quality / Stability / Reliability
-
False
-
SDN Sprint 230
The following error occurs frequently in vSphere CI jobs. A recent failed job for reference: 1591151696636022784
error: timed out waiting for the condition on clusteroperators/network
{"component":"entrypoint","error":"wrapped process failed: exit status 1","file":"k8s.io/test-infra/prow/entrypoint/run.go:79","func":"k8s.io/test-infra/prow/entrypoint.Options.Run","level":"error","msg":"Error executing test process","severity":"error","time":"2022-11-11T21:57:34Z"}
error: failed to execute wrapped command: exit status 1
INFO[2022-11-11T21:57:36Z] Step vsphere-e2e-operator-ipi-install-vsphere-registry failed after 39m0s.
INFO[2022-11-11T21:57:36Z] Step phase pre failed after 1h29m10s.
After initial triage in a Slack thread with dcbw@redhat.com, this looks like a network-check daemonset issue: one or more network-check-target pods are unavailable.
"status": {
    "currentNumberScheduled": 6,
    "desiredNumberScheduled": 6,
    "numberAvailable": 5,
    "numberMisscheduled": 0,
    "numberReady": 5,
    "numberUnavailable": 1,
    "observedGeneration": 1,
    "updatedNumberScheduled": 6
}
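For anyone triaging similar runs, the check applied to the status above can be sketched in a few lines of Python. This is only an illustration: the helper name `unavailable_pods` is hypothetical, and the input is the status block copied from this report, not a live query against the cluster.

```python
import json


def unavailable_pods(status: dict) -> int:
    """Return how many desired daemonset pods are not yet available.

    Hypothetical helper: compares desiredNumberScheduled with
    numberAvailable, treating a missing field as 0.
    """
    desired = status.get("desiredNumberScheduled", 0)
    available = status.get("numberAvailable", 0)
    return max(desired - available, 0)


# Status values copied from the daemonset status in this report.
status = json.loads("""
{
    "currentNumberScheduled": 6,
    "desiredNumberScheduled": 6,
    "numberAvailable": 5,
    "numberMisscheduled": 0,
    "numberReady": 5,
    "numberUnavailable": 1,
    "observedGeneration": 1,
    "updatedNumberScheduled": 6
}
""")

missing = unavailable_pods(status)
print(f"{missing} pod(s) unavailable")
```

With the values from this report, the helper returns 1, which matches the reported numberUnavailable.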
The journal on the node that should be running that pod shows the following entry:
Nov 01 16:45:26.044455 ci-op-h4g42fh5-23f97-4zn6r-master-0 kubenswrapper[2991]: I1101 16:45:26.044256 2991 prober.go:121] "Probe failed" probeType="Readiness" pod="openshift-network-diagnostics/network-check-target-rdrpd" podUID=c0b15643-55e8-450e-bb70-ad76391cd012 containerName="network-check-target-container" probeResult=failure output="Get \"http://10.128.0.3:8080/\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
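To find the same failure in other node journals, the key=value fields of a "Probe failed" entry can be pulled out with a small regular expression. The sketch below is an assumption-laden illustration, not part of any tooling: the function name `parse_probe_failure` and the pattern are hypothetical, and it assumes the entry keeps the key="value" layout shown above.

```python
import re

# Matches key="quoted value with \" escapes" or key=bare_value.
FIELD_RE = re.compile(r'(\w+)=(?:"((?:[^"\\]|\\.)*)"|(\S+))')


def parse_probe_failure(line: str):
    """Return the key=value fields of a "Probe failed" journal line.

    Hypothetical helper: returns None for lines that are not
    probe failures.
    """
    if '"Probe failed"' not in line:
        return None
    fields = {}
    for key, quoted, bare in FIELD_RE.findall(line):
        # Quoted values may contain escaped quotes; unescape them.
        fields[key] = quoted.replace('\\"', '"') if quoted else bare
    return fields


# Journal entry copied from this report.
LINE = (
    r'Nov 01 16:45:26.044455 ci-op-h4g42fh5-23f97-4zn6r-master-0 '
    r'kubenswrapper[2991]: I1101 16:45:26.044256 2991 prober.go:121] '
    r'"Probe failed" probeType="Readiness" '
    r'pod="openshift-network-diagnostics/network-check-target-rdrpd" '
    r'podUID=c0b15643-55e8-450e-bb70-ad76391cd012 '
    r'containerName="network-check-target-container" '
    r'probeResult=failure '
    r'output="Get \"http://10.128.0.3:8080/\": context deadline exceeded '
    r'(Client.Timeout exceeded while awaiting headers)"'
)

fields = parse_probe_failure(LINE)
print(fields["pod"], fields["probeType"], fields["probeResult"])
print(fields["output"])
```

Filtering the parsed entries on `pod` and `probeType` makes it easier to see whether the readiness probe for network-check-target fails on one node only or across several.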