OCPBUGS-44706: [sig-node] node-lifecycle detects unexpected not ready node firing on metal upgrade jobs

      Looking at the job failures for metal upgrades, I see that Unexpected Node Not Ready is firing and it seems to be a legit issue.

      Drilling into the first job:

      The node master-1 reports Unknown status, which translates to NotReady, at 20:00.

      node/master-1 - reason/UnexpectedNotReady unexpected node not ready at from: 2024-11-18 20:00:14.466269295 +0000 UTC m=+743.463477268 - to: 2024-11-18 20:00:14.466269295 +0000 UTC m=+743.463477268}
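
      For context on how this surfaces: the interval above comes from the monitor treating a node whose NodeReady condition is no longer True as unexpectedly not ready; when kubelet heartbeats stop arriving, the node controller flips that condition to Unknown, which is then reported as NotReady. Below is a minimal client-go sketch (my own illustration, not the actual origin monitor code) that surfaces the same signal:

      // Minimal sketch: list nodes and report any whose Ready condition is not True.
      // A Ready status of "Unknown" is what shows up above as "unexpected node not ready".
      package main

      import (
          "context"
          "fmt"

          corev1 "k8s.io/api/core/v1"
          metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
          "k8s.io/client-go/kubernetes"
          "k8s.io/client-go/tools/clientcmd"
      )

      func main() {
          // Assumes a kubeconfig at the default location (~/.kube/config).
          config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
          if err != nil {
              panic(err)
          }
          client := kubernetes.NewForConfigOrDie(config)

          nodes, err := client.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
          if err != nil {
              panic(err)
          }
          for _, node := range nodes.Items {
              for _, cond := range node.Status.Conditions {
                  if cond.Type == corev1.NodeReady && cond.Status != corev1.ConditionTrue {
                      fmt.Printf("node/%s Ready=%s reason=%s since=%s\n",
                          node.Name, cond.Status, cond.Reason, cond.LastTransitionTime)
                  }
              }
          }
      }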

      Looking at the node at this point in time, I can see that the node status is set to Unknown because the kubelet is unable to update its heartbeat. Reading the journal logs on master-1 around this time, I see lease update failures, which seem to be what leads to the status being reported as not ready (see the sketch after the log excerpt below).

      Nov 18 20:00:10.089937 master-1 kubenswrapper[2948]: E1118 20:00:10.089826    2948 controller.go:195] "Failed to update lease" err="Put \"https://api-int.ostest.test.metalkube.org:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/master-1?timeout=10s\": net/http: request canceled (Client.Timeout exceeded while awaiting headers)"
      Nov 18 20:00:11.214172 master-1 ovs-vswitchd[1180]: ovs|00966|connmgr|INFO|br-ex<->unix#2869: 2 flow_mods in the last 0 s (1 adds, 1 deletes)
      Nov 18 20:00:13.038711 master-1 kubenswrapper[2948]: E1118 20:00:13.038660    2948 kubelet_node_status.go:594] "Error updating node status, will retry" err="error getting node \"master-
      ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
      Nov 18 20:00:17.704218 master-1 kubenswrapper[2948]: W1118 20:00:17.704174    2948 reflector.go:470] object-"openshift-apiserver"/"image-import-ca": watch of *v1.ConfigMap ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
      Nov 18 20:00:17.704251 master-1 kubenswrapper[2948]: W1118 20:00:17.704197    2948 reflector.go:470] object-"openshift-machine-api"/"openshift-service-ca.crt": watch of *v1.ConfigMap ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
      Nov 18 20:00:17.704251 master-1 kubenswrapper[2948]: W1118 20:00:17.704205    2948 reflector.go:470] object-"openshift-authentication"/"v4-0-config-system-service-ca": watch of *v1.ConfigMap ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
      Nov 18 20:00:17.704251 master-1 kubenswrapper[2948]: W1118 20:00:17.704241    2948 reflector.go:470] object-"openshift-machine-api"/"machine-api-operator-tls": watch of *v1.Secret ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
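
      For reference on the lease mechanism mentioned above: the kubelet keeps its node heartbeat alive by renewing a Lease object named after the node in the kube-node-lease namespace, so repeated "Failed to update lease" errors mean that renewal is stalling, and once the lease stays stale past the node-monitor-grace-period the node controller marks the Ready condition Unknown. Here is a minimal client-go sketch (my own illustration, reusing the master-1 node name from the logs; not part of the test suite) for checking how stale that lease is:

      // Minimal sketch: read the node's Lease from kube-node-lease and report how long
      // ago it was last renewed.
      package main

      import (
          "context"
          "fmt"
          "time"

          metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
          "k8s.io/client-go/kubernetes"
          "k8s.io/client-go/tools/clientcmd"
      )

      func main() {
          // Assumes a kubeconfig at the default location (~/.kube/config).
          config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
          if err != nil {
              panic(err)
          }
          client := kubernetes.NewForConfigOrDie(config)

          lease, err := client.CoordinationV1().Leases("kube-node-lease").Get(
              context.TODO(), "master-1", metav1.GetOptions{})
          if err != nil {
              panic(err)
          }
          if lease.Spec.RenewTime != nil {
              age := time.Since(lease.Spec.RenewTime.Time)
              fmt.Printf("lease %s last renewed %s ago\n", lease.Name, age.Round(time.Second))
          }
      }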