OpenShift Bugs / OCPBUGS-59958

Autosizing causes control plane node to reboot twice during upgrade

    • Quality / Stability / Reliability
    • Severity: Important
    • Sprint: OCP Node Sprint 275 (blue)

      Description of problem:

      The cluster is configured with a KubeletConfig that enables autoSizingReserved for control plane nodes:
      apiVersion: machineconfiguration.openshift.io/v1
      kind: KubeletConfig
      metadata:
        name: autosizing-master
      spec:
        autoSizingReserved: true
        machineConfigPoolSelector:
          matchLabels:
            pools.operator.machineconfiguration.openshift.io/master: ""
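
      A minimal sketch of applying and verifying this config (assuming the manifest above is saved as autosizing-master.yaml):
      $ oc apply -f autosizing-master.yaml
      $ oc get kubeletconfig autosizing-master
      $ oc get mcp master    # wait for UPDATED=True before starting the upgrade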
      
      During an upgrade (z-stream 4.18.12 to 4.18.13 in this case), the first control plane node to be updated by the MCO goes through two reboots, adding ~10 minutes to the upgrade procedure (doubled for EUS-to-EUS upgrades). The other control plane nodes go through a single reboot, as expected.
      
      Monitoring the MCO "desiredConfig" shows that the first control plane node receives an updated rendered MachineConfig after the first reboot. Comparing the two rendered MachineConfigs shows a change in only the node sizing portion of the config:
      $ diff first second
      7c7
      <   creationTimestamp: "2025-07-29T21:54:21Z"
      ---
      >   creationTimestamp: "2025-07-29T21:54:29Z"
      9c9
      <   name: rendered-master-f5ec106f126d02ae2b6567bf5574b01c
      ---
      >   name: rendered-master-64caa74099098c0f20b62e075cfbe2b7
      17,18c17,18
      <   resourceVersion: "9707963"
      <   uid: 2a7571e2-d94e-4e27-96a5-7cdc02950f9c
      ---
      >   resourceVersion: "9708226"
      >   uid: 60a06388-9b34-4ff3-a6d6-eb665efb6367
      190c190,191
      <           source: data:,NODE_SIZING_ENABLED%3Dfalse%0ASYSTEM_RESERVED_MEMORY%3D1Gi%0ASYSTEM_RESERVED_CPU%3D500m%0ASYSTEM_RESERVED_ES%3D1Gi
      ---
      >           compression: ""
      >           source: data:text/plain;charset=utf-8;base64,Tk9ERV9TSVpJTkdfRU5BQkxFRD10cnVlClNZU1RFTV9SRVNFUlZFRF9NRU1PUlk9MUdpClNZU1RFTV9SRVNFUlZFRF9DUFU9NTAwbQpTWVNURU1fUkVTRVJWRURfRVM9MUdpCg==
      $ echo Tk9ERV9TSVpJTkdfRU5BQkxFRD10cnVlClNZU1RFTV9SRVNFUlZFRF9NRU1PUlk9MUdpClNZU1RFTV9SRVNFUlZFRF9DUFU9NTAwbQpTWVNURU1fUkVTRVJWRURfRVM9MUdpCg== | base64 -d
      NODE_SIZING_ENABLED=true
      SYSTEM_RESERVED_MEMORY=1Gi
      SYSTEM_RESERVED_CPU=500m
      SYSTEM_RESERVED_ES=1Gi
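
      The first (URL-encoded) source decodes to the same four values but with NODE_SIZING_ENABLED=false, so the only content change between the two rendered configs is that flag flipping from false to true. A sketch of pulling the node's desiredConfig annotation and the two rendered MachineConfigs for comparison (object names taken from the diff above):
      $ oc get node cnfdf02-control-1 -o jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/desiredConfig}{"\n"}'
      $ oc get machineconfig rendered-master-f5ec106f126d02ae2b6567bf5574b01c -o yaml > first
      $ oc get machineconfig rendered-master-64caa74099098c0f20b62e075cfbe2b7 -o yaml > second
      $ diff first second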
      
      Boot logs from the control plane nodes, showing a double reboot of cnfdf02-control-1 (the first node to upgrade) and a single reboot for the other two:
      $ snode cnfdf02-control-1 journalctl --list-boots
      Node: cnfdf02-control-1
      IDX BOOT ID                          FIRST ENTRY                 LAST ENTRY
       -8 114b0f8770a244628ce92704386b8970 Mon 2025-07-21 18:35:40 UTC Mon 2025-07-21 20:03:27 UTC
       -7 2725cf1755cb43afafad9ffce3048d4b Mon 2025-07-21 20:05:04 UTC Tue 2025-07-22 12:58:57 UTC
       -6 0396c666654642a3811099c8bc558db8 Tue 2025-07-22 13:00:30 UTC Tue 2025-07-22 13:06:01 UTC
       -5 ded5c2c80d9c4419a3ce7805b5d3d61c Tue 2025-07-22 13:07:39 UTC Tue 2025-07-22 14:47:53 UTC
       -4 10c8187d76a7498787091dc4734c96d3 Tue 2025-07-22 14:49:27 UTC Wed 2025-07-23 13:42:55 UTC
       -3 73873a037be541789496e0762e83817e Wed 2025-07-23 13:44:30 UTC Wed 2025-07-23 13:51:31 UTC
       -2 8d8bebf6d66044328f8d77c05e75f4b9 Wed 2025-07-23 13:53:05 UTC Tue 2025-07-29 21:59:51 UTC
       -1 9c9864695c7d481ab2e8e2cbc82fcb49 Tue 2025-07-29 22:01:26 UTC Tue 2025-07-29 22:08:38 UTC
        0 87b7a6ea2f434d5baf65b93283d78f9e Tue 2025-07-29 22:10:13 UTC Wed 2025-07-30 12:32:39 UTC
      $ snode cnfdf02-control-0 journalctl --list-boots
      Node: cnfdf02-control-0
      IDX BOOT ID                          FIRST ENTRY                 LAST ENTRY
       -5 6b17f9558b85469488022dd7adb10eec Mon 2025-07-21 19:16:30 UTC Mon 2025-07-21 19:52:10 UTC
       -4 525ee7526d91448795490de07d275af1 Mon 2025-07-21 19:54:48 UTC Tue 2025-07-22 13:22:19 UTC
       -3 43215a18dbaa410681e08c134ba23043 Tue 2025-07-22 13:24:57 UTC Tue 2025-07-22 14:30:56 UTC
       -2 6bba59fbf69140129dbb89e66d40cb09 Tue 2025-07-22 14:33:35 UTC Wed 2025-07-23 14:06:21 UTC
       -1 a06ed474ae294f9d92db35cf57defbda Wed 2025-07-23 14:08:59 UTC Tue 2025-07-29 22:24:56 UTC
        0 42ffd3f797684c508a75e62c14313983 Tue 2025-07-29 22:27:34 UTC Wed 2025-07-30 12:32:43 UTC
      $ snode cnfdf02-control-2 journalctl --list-boots
      Node: cnfdf02-control-2
      IDX BOOT ID                          FIRST ENTRY                 LAST ENTRY
       -5 5682944cf46f479a9c4724cf35b36449 Mon 2025-07-21 19:24:19 UTC Mon 2025-07-21 19:59:43 UTC
       -4 b26b38e69898492a8e338a43f83dd57b Mon 2025-07-21 19:59:54 UTC Tue 2025-07-22 13:15:06 UTC
       -3 4610d74d91e7470583f28b3af22b5996 Tue 2025-07-22 13:15:18 UTC Tue 2025-07-22 14:41:07 UTC
       -2 7fa59a763a844f8d9259605867718de3 Tue 2025-07-22 14:41:19 UTC Wed 2025-07-23 13:59:47 UTC
       -1 590514258f1d493d8d7c9b91c669665a Wed 2025-07-23 13:59:58 UTC Tue 2025-07-29 22:18:27 UTC
        0 fea5c2182d514176913a73c998053454 Tue 2025-07-29 22:18:38 UTC Wed 2025-07-30 12:32:47 UTC
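
      (snode above appears to be a local wrapper; the same listing can be obtained per node with plain oc debug, for example:)
      $ oc debug node/cnfdf02-control-1 -- chroot /host journalctl --list-boots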
      

      Version-Release number of selected component (if applicable):

      4.18.13

      How reproducible:

      Consistently reproducible. I have observed the double reboot of one control plane node on every upgrade, but only dug in to find the cause this time.

      Steps to Reproduce:

          1. Cluster configured with KubeletConfig as above, running 4.18.12
          2. Update to 4.18.13 (set the desired version in the ClusterVersion CR)
          3. Observe control plane node status during the MCO phase of the upgrade (the final phase); see the command sketch after this list
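
      A rough command sequence for steps 2 and 3 (version and node-role label from this report; adjust as needed):
      $ oc adm upgrade --to=4.18.13
      $ oc get clusterversion -w
      $ oc get mcp master -w
      $ oc get nodes -l node-role.kubernetes.io/master= -w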
          

      Actual results:

          Double reboot of first control plane node

      Expected results:

          All control plane nodes reboot only once during upgrade

      Additional info:

          

       

              Team: MCO
              Ian Miller (rhn-support-imiller)
              Min Li