- Bug
- Resolution: Done
- Undefined
- None
- 4.18
- None
- None
- False
Looking at the job failures for metal upgrades, I see that "Unexpected Node Not Ready" is firing, and it seems to be a legitimate issue:
- https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.18-upgrade-from-stable-4.17-e2e-metal-ipi-ovn-upgrade/1858573202918215680
- https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.18-upgrade-from-stable-4.17-e2e-metal-ipi-ovn-upgrade/1858165066398961664
Drilling into the first job:
master-1 reports Unknown status, which translates to NotReady, at 20:00:
node/master-1 - reason/UnexpectedNotReady unexpected node not ready at from: 2024-11-18 20:00:14.466269295 +0000 UTC m=+743.463477268 - to: 2024-11-18 20:00:14.466269295 +0000 UTC m=+743.463477268}
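For anyone reproducing this against a live cluster, here is a minimal sketch (not from the job artifacts, assuming client-go and a reachable kubeconfig) that lists each node's Ready condition; a node whose Ready condition has status Unknown is what the monitor reports as "unexpected node not ready":

```go
// Sketch: print each node's Ready condition so Unknown/NotReady nodes stand out.
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	nodes, err := client.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, node := range nodes.Items {
		for _, cond := range node.Status.Conditions {
			if cond.Type == corev1.NodeReady {
				// Status becomes "Unknown" when the node controller stops seeing
				// kubelet heartbeats; kubectl renders that as NotReady.
				fmt.Printf("%s Ready=%s reason=%s lastHeartbeat=%s\n",
					node.Name, cond.Status, cond.Reason,
					cond.LastHeartbeatTime.Format("15:04:05"))
			}
		}
	}
}
```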
Looking at the node at this point in time, I can see that the kubelet sets the node to Unknown status because it cannot update its heartbeat. Reading the journal logs on master-1 around this time, I see lease failure errors, which seem to lead to the status being reported as NotReady.
Nov 18 20:00:10.089937 master-1 kubenswrapper[2948]: E1118 20:00:10.089826 2948 controller.go:195] "Failed to update lease" err="Put \"https://api-int.ostest.test.metalkube.org:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/master-1?timeout=10s\": net/http: request canceled (Client.Timeout exceeded while awaiting headers)"
Nov 18 20:00:11.214172 master-1 ovs-vswitchd[1180]: ovs|00966|connmgr|INFO|br-ex<->unix#2869: 2 flow_mods in the last 0 s (1 adds, 1 deletes)
Nov 18 20:00:13.038711 master-1 kubenswrapper[2948]: E1118 20:00:13.038660 2948 kubelet_node_status.go:594] "Error updating node status, will retry" err="error getting node \"master- ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
Nov 18 20:00:17.704218 master-1 kubenswrapper[2948]: W1118 20:00:17.704174 2948 reflector.go:470] object-"openshift-apiserver"/"image-import-ca": watch of *v1.ConfigMap ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
Nov 18 20:00:17.704251 master-1 kubenswrapper[2948]: W1118 20:00:17.704197 2948 reflector.go:470] object-"openshift-machine-api"/"openshift-service-ca.crt": watch of *v1.ConfigMap ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
Nov 18 20:00:17.704251 master-1 kubenswrapper[2948]: W1118 20:00:17.704205 2948 reflector.go:470] object-"openshift-authentication"/"v4-0-config-system-service-ca": watch of *v1.ConfigMap ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
Nov 18 20:00:17.704251 master-1 kubenswrapper[2948]: W1118 20:00:17.704241 2948 reflector.go:470] object-"openshift-machine-api"/"machine-api-operator-tls": watch of *v1.Secret ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
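The heartbeat the log lines above fail to send is the kubelet's renewal of its Lease in the kube-node-lease namespace. A minimal sketch (again an assumption, not part of the failing job, reusing the kubeconfig setup from the previous snippet) that checks how stale master-1's Lease is:

```go
// Sketch: report how long ago master-1's node Lease was last renewed. The node
// controller marks the node Unknown when these renewals stop arriving.
package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	lease, err := client.CoordinationV1().Leases("kube-node-lease").Get(
		context.TODO(), "master-1", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	if lease.Spec.RenewTime != nil {
		age := time.Since(lease.Spec.RenewTime.Time)
		fmt.Printf("lease %s last renewed %s ago\n", lease.Name, age.Round(time.Second))
	}
}
```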