OpenShift Bugs / OCPBUGS-61979

MCP degraded due to failed node annotation update triggers render reapplication


      Description of problem:

      The master MCP was marked Degraded with "failed to set annotations on node: unable to update node" after the machine-config-daemon lost its HTTP/2 connection to the API server. This triggered a full configuration reconciliation even though the MCP configuration had not changed, and caused pod restarts.

      Version-Release number of selected component (if applicable):
      OCP 4.16.37 / bare metal, 3 masters + 2 workers

      Additional info:

      The master MCP has used the same rendered configuration since 2025-06-24. Although no changes were applied to that rendered configuration, it was reconciled again on 2025-09-02, apparently because of an API connectivity problem. We need to understand whether this behavior is expected.

      omc get mcp
      NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
      master   rendered-master-015dd60bc71e16c60b395ee004216e4c   True      False      False      3              3                   3                     0                      85d
      worker   rendered-worker-b4ad4ae002ac30931206364429246797   True      False      False      2              2                   2                     0                      85d
       
      omc get mc -o json | jq -r '
        .items[]
        | select(.metadata.name|test("^rendered-master-"))
        | [.metadata.name, .metadata.creationTimestamp] | @tsv' | sort -k2
      rendered-master-c19e70676f3c17344ba77ed244fc8793	2025-06-24T13:49:46Z
      rendered-master-1001201a675b96d1cb226329c79317fa	2025-06-24T14:12:11Z
      rendered-master-015dd60bc71e16c60b395ee004216e4c	2025-06-24T14:12:42Z
       
      omc get nodes master1 -o yaml | grep desiredConfig
          machineconfiguration.openshift.io/desiredConfig: rendered-master-015dd60bc71e16c60b395ee004216e4c
       
      omc get mcp master -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.lastTransitionTime}{"\n"}{end}'
      RenderDegraded   False   2025-06-24T13:49:47Z
      Updated          True    2025-09-02T18:00:05Z
      Updating         False   2025-09-02T18:00:05Z
      NodeDegraded     False   2025-09-02T18:00:05Z
      Degraded         False   2025-09-02T18:00:05Z
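
The timestamps above can be cross-checked mechanically. The following is a minimal sketch, not part of the original report: it compares each pool condition's lastTransitionTime with the creation time of the newest rendered-master MachineConfig, using only the values pasted above. The helper names `parse_ts` and `transitions_without_new_render` are made up for illustration.

```python
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    """Parse a UTC timestamp such as 2025-06-24T14:12:42Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


# Values copied from the `omc get mc` output in this report.
rendered_configs = {
    "rendered-master-c19e70676f3c17344ba77ed244fc8793": "2025-06-24T13:49:46Z",
    "rendered-master-1001201a675b96d1cb226329c79317fa": "2025-06-24T14:12:11Z",
    "rendered-master-015dd60bc71e16c60b395ee004216e4c": "2025-06-24T14:12:42Z",
}

# Values copied from the `omc get mcp master` conditions output in this report.
conditions = {
    "RenderDegraded": "2025-06-24T13:49:47Z",
    "Updated": "2025-09-02T18:00:05Z",
    "Updating": "2025-09-02T18:00:05Z",
    "NodeDegraded": "2025-09-02T18:00:05Z",
    "Degraded": "2025-09-02T18:00:05Z",
}


def transitions_without_new_render(configs, conds):
    """Return conditions that changed after the newest rendered config was created.

    The result maps condition type to the time elapsed since that config's creation.
    """
    newest = max(parse_ts(ts) for ts in configs.values())
    return {
        name: parse_ts(ts) - newest
        for name, ts in conds.items()
        if parse_ts(ts) > newest
    }


gaps = transitions_without_new_render(rendered_configs, conditions)
for name, gap in sorted(gaps.items()):
    print(f"{name}: changed {gap.days} days after the newest rendered config")
```

Any condition listed by this check transitioned without a newer rendered-master config existing, which matches the observation that the pool state changed on 2025-09-02 with no render change.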
      
      
      omc logs machine-config-daemon-vfppr -n openshift-machine-config-operator -c machine-config-daemon
      ...
      2025-09-02T17:55:49.145320345Z I0902 17:55:49.145269   11038 certificate_writer.go:303] Certificate was synced from controllerconfig resourceVersion 46102033
      2025-09-02T17:58:34.378523828Z W0902 17:58:34.378487   11038 reflector.go:462] github.com/openshift/client-go/config/informers/externalversions/factory.go:125: watch of *v1.FeatureGate ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
      2025-09-02T17:58:34.378523828Z W0902 17:58:34.378487   11038 reflector.go:462] github.com/openshift/client-go/config/informers/externalversions/factory.go:125: watch of *v1.ClusterVersion ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
      2025-09-02T17:58:34.378553754Z W0902 17:58:34.378492   11038 reflector.go:462] k8s.io/client-go/informers/factory.go:159: watch of *v1.Node ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
      2025-09-02T17:58:34.378553754Z W0902 17:58:34.378523   11038 reflector.go:462] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125: watch of *v1.MachineConfig ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
      2025-09-02T17:58:34.378553754Z W0902 17:58:34.378492   11038 reflector.go:462] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125: watch of *v1.ControllerConfig ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
      2025-09-02T17:58:36.256191225Z W0902 17:58:36.256157   11038 reflector.go:462] k8s.io/client-go/informers/factory.go:159: watch of *v1.Node ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
      2025-09-02T17:58:36.256310943Z E0902 17:58:36.256295   11038 writer.go:226] Marking Degraded due to: failed to set annotations on node: unable to update node "&Node{ObjectMeta:{      0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] [] []},Spec:NodeSpec{PodCIDR:,DoNotUseExternalID:,ProviderID:,Unschedulable:false,Taints:[]Taint{},ConfigSource:nil,PodCIDRs:[],},Status:NodeStatus{Capacity:ResourceList{},Allocatable:ResourceList{},Phase:,Conditions:[]NodeCondition{},Addresses:[]NodeAddress{},DaemonEndpoints:NodeDaemonEndpoints{KubeletEndpoint:DaemonEndpoint{Port:0,},},NodeInfo:NodeSystemInfo{MachineID:,SystemUUID:,BootID:,KernelVersion:,OSImage:,ContainerRuntimeVersion:,KubeletVersion:,KubeProxyVersion:,OperatingSystem:,Architecture:,},Images:[]ContainerImage{},VolumesInUse:[],VolumesAttached:[]AttachedVolume{},Config:nil,},}": Patch "https://api-int.demai.cloudran.telefonica.net:6443/api/v1/nodes/master1": http2: client connection lost
      2025-09-02T17:58:36.291669375Z I0902 17:58:36.291625   11038 certificate_writer.go:303] Certificate was synced from controllerconfig resourceVersion 46102033
      2025-09-02T17:58:36.837510235Z I0902 17:58:36.837473   11038 daemon.go:739] Transitioned from state: Done -> Degraded
      2025-09-02T17:58:36.837510235Z I0902 17:58:36.837493   11038 daemon.go:742] Transitioned from degraded/unreconcilable reason  -> failed to set annotations on node: unable to update node "&Node{ObjectMeta:{      0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] [] []},Spec:NodeSpec{PodCIDR:,DoNotUseExternalID:,ProviderID:,Unschedulable:false,Taints:[]Taint{},ConfigSource:nil,PodCIDRs:[],},Status:NodeStatus{Capacity:ResourceList{},Allocatable:ResourceList{},Phase:,Conditions:[]NodeCondition{},Addresses:[]NodeAddress{},DaemonEndpoints:NodeDaemonEndpoints{KubeletEndpoint:DaemonEndpoint{Port:0,},},NodeInfo:NodeSystemInfo{MachineID:,SystemUUID:,BootID:,KernelVersion:,OSImage:,ContainerRuntimeVersion:,KubeletVersion:,KubeProxyVersion:,OperatingSystem:,Architecture:,},Images:[]ContainerImage{},VolumesInUse:[],VolumesAttached:[]AttachedVolume{},Config:nil,},}": Patch "https://api-int.demai.cloudran.telefonica.net:6443/api/v1/nodes/master1": http2: client connection lost
      2025-09-02T17:58:36.842880863Z W0902 17:58:36.842849   11038 daemon.go:2383] current+desiredConfig is rendered-master-015dd60bc71e16c60b395ee004216e4c but state is Degraded
      2025-09-02T17:58:36.905278530Z I0902 17:58:36.905253   11038 rpm-ostree.go:308] Running captured: rpm-ostree kargs
      2025-09-02T17:58:37.169799357Z I0902 17:58:37.169764   11038 daemon.go:935] Preflight config drift check successful (took 323.184747ms)
      2025-09-02T17:58:37.174339580Z I0902 17:58:37.174312   11038 config_drift_monitor.go:255] Config Drift Monitor has shut down
      2025-09-02T17:58:37.174339580Z I0902 17:58:37.174329   11038 update.go:2631] Adding SIGTERM protection
      2025-09-02T17:58:37.221332855Z I0902 17:58:37.221303   11038 update.go:1019] Checking Reconcilable for config rendered-master-015dd60bc71e16c60b395ee004216e4c to rendered-master-015dd60bc71e16c60b395ee004216e4c
      2025-09-02T17:58:37.322629406Z I0902 17:58:37.322598   11038 update.go:2609] Starting update from rendered-master-015dd60bc71e16c60b395ee004216e4c to rendered-master-015dd60bc71e16c60b395ee004216e4c: &{osUpdate:false kargs:false fips:false passwd:false files:false units:false kernelType:false 
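
The sequence in the log excerpt above (a state transition to Degraded followed by an update whose source and target configs are identical) can be extracted from a saved machine-config-daemon log. This is a rough sketch under assumptions: the regular expressions are based only on the message formats visible in this excerpt, the function name `summarize` is hypothetical, and the sample lines are shortened copies of the lines above (the last one is cut where the original is truncated).

```python
import re

# Patterns derived from the log lines shown in this report; other MCO
# versions may word these messages differently.
STATE_RE = re.compile(r"Transitioned from state: (\S+) -> (\S+)")
UPDATE_RE = re.compile(r"Starting update from (\S+) to (\S+?):")


def summarize(lines):
    """Collect state transitions and updates where source == target config."""
    transitions = []
    noop_updates = []
    for line in lines:
        state = STATE_RE.search(line)
        if state:
            transitions.append((state.group(1), state.group(2)))
        update = UPDATE_RE.search(line)
        if update and update.group(1) == update.group(2):
            noop_updates.append(update.group(1))
    return transitions, noop_updates


sample = [
    "I0902 17:58:36.837473   11038 daemon.go:739] Transitioned from state: Done -> Degraded",
    "W0902 17:58:36.842849   11038 daemon.go:2383] current+desiredConfig is "
    "rendered-master-015dd60bc71e16c60b395ee004216e4c but state is Degraded",
    "I0902 17:58:37.322598   11038 update.go:2609] Starting update from "
    "rendered-master-015dd60bc71e16c60b395ee004216e4c to "
    "rendered-master-015dd60bc71e16c60b395ee004216e4c: &{osUpdate:false kargs:false",
]

transitions, noop_updates = summarize(sample)
print("state transitions:", transitions)
print("updates with identical source and target:", noop_updates)
```

To run it against a full log, pass the lines of the saved `omc logs` output to `summarize` instead of `sample`.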

              team-mco Team MCO
              rhn-support-jclaretm Jorge Claret Membrado
              Sergio Regidor de la Rosa Sergio Regidor de la Rosa