Story
Resolution: Unresolved
Normal
In this job, master-0 was observed to be unready for 4s during the conformance tests: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.14-e2e-azure-ovn-upgrade/1674247083348987904
Slack thread for context: https://redhat-internal.slack.com/archives/C01CQA76KMX/p1688049633796099
The initial aggregated failure was "clusteroperator/control-plane-machine-set should not change condition/Available": https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/aggregated-azure-ovn-upgrade-4.14-micro-release-openshift-release-analysis-aggregator/1674247086704431104
However, the clusteroperator changed state because the master-0 node changed state. The master-0 node conditions changed as follows at that time:
 conditions:
 - lastHeartbeatTime: "2023-06-29T05:11:28Z"
-  lastTransitionTime: "2023-06-29T04:35:36Z"
-  message: kubelet has sufficient memory available
-  reason: KubeletHasSufficientMemory
-  status: "False"
+  lastTransitionTime: "2023-06-29T05:16:49Z"
+  message: Kubelet stopped posting node status.
+  reason: NodeStatusUnknown
+  status: Unknown
   type: MemoryPressure
 - lastHeartbeatTime: "2023-06-29T05:11:28Z"
-  lastTransitionTime: "2023-06-29T04:35:36Z"
-  message: kubelet has no disk pressure
-  reason: KubeletHasNoDiskPressure
-  status: "False"
+  lastTransitionTime: "2023-06-29T05:16:49Z"
+  message: Kubelet stopped posting node status.
+  reason: NodeStatusUnknown
+  status: Unknown
   type: DiskPressure
 - lastHeartbeatTime: "2023-06-29T05:11:28Z"
-  lastTransitionTime: "2023-06-29T04:35:36Z"
-  message: kubelet has sufficient PID available
-  reason: KubeletHasSufficientPID
-  status: "False"
+  lastTransitionTime: "2023-06-29T05:16:49Z"
+  message: Kubelet stopped posting node status.
+  reason: NodeStatusUnknown
+  status: Unknown
   type: PIDPressure
 - lastHeartbeatTime: "2023-06-29T05:11:28Z"
-  lastTransitionTime: "2023-06-29T04:35:36Z"
-  message: kubelet is posting ready status
-  reason: KubeletReady
-  status: "True"
+  lastTransitionTime: "2023-06-29T05:16:49Z"
+  message: Kubelet stopped posting node status.
+  reason: NodeStatusUnknown
+  status: Unknown
   type: Ready
Around that time, the kubelet was logging errors reaching the API server:
Jun 29 05:16:24.250797 ci-op-cyqgzj4w-ed5cd-ll5md-master-0 kubenswrapper[2336]: E0629 05:16:24.250754 2336 kubelet_node_status.go:567] "Error updating node status, will retry" err="error getting node \"ci-op-cyqgzj4w-ed5cd-ll5md-master-0\": Get \"https://api-int.ci-op-cyqgzj4w-ed5cd.ci2.azure.devcluster.openshift.com:6443/api/v1/nodes/ci-op-cyqgzj4w-ed5cd-ll5md-master-0?resourceVersion=0&timeout=10s\": net/http: request canceled (Client.Timeout exceeded while awaiting headers)"