OpenShift Bugs: OCPBUGS-76295

During OCP upgrade, MCO Controller generates two rendered master MachineConfigs, causing master nodes to reboot twice

    • Type: Bug
    • Resolution: Unresolved
    • Version: 4.17.z
    • Component: Node / Kubelet
    • Severity: Moderate

      Description of problem:

      During an OpenShift Container Platform (OCP) upgrade from 4.16.23 to 4.17.45, the master MachineConfigPool (MCP) unexpectedly received two different rendered master MachineConfigs, which caused the master nodes to reboot twice during the upgrade.

      This does not match the expected upgrade behavior, in which each node reboots only once per version upgrade.

      How reproducible:

      Observed on an upgrade from OCP 4.16.23 to 4.17.45.

      Actual results:

      During the master MCP upgrade to 4.17.45, OpenShift generated two rendered master MachineConfigs in sequence:

      1. First rendered MachineConfig: rendered-master-6766f48d4d2308abebee1ab4b945ad7f.yml
      2. Later rendered MachineConfig: rendered-master-b5b49f29de9450e2fd545072ad08239c.yml

      The only difference between these two rendered MachineConfigs is the kernelArguments field.

      First rendered MC (rendered-master-6766f48d4d2308abebee1ab4b945ad7f):

      kernelArguments: []

      Second rendered MC (rendered-master-b5b49f29de9450e2fd545072ad08239c):

      kernelArguments:
        - systemd.unified_cgroup_hierarchy=0
        - systemd.legacy_systemd_cgroup_controller=1

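To confirm that kernelArguments is the only field that differs, the two rendered configs can be exported as JSON (for example with `oc get machineconfig <name> -o json`) and compared field by field. The following is a minimal sketch, assuming the JSON has been saved to files or loaded into dicts; `load_mc`, `diff_spec`, and the abridged `first`/`second` dicts are hypothetical, and the `osImageURL` value is a placeholder rather than the real image reference.

```python
import json
from pathlib import Path


def load_mc(path: str) -> dict:
    """Load a MachineConfig saved with `oc get machineconfig <name> -o json > <path>`."""
    return json.loads(Path(path).read_text())


def diff_spec(old_mc: dict, new_mc: dict) -> dict:
    """Return {field: (old_value, new_value)} for each top-level spec field that differs.

    metadata (name, uid, annotations) always differs between two rendered
    MachineConfigs, so only spec is compared.
    """
    old_spec = old_mc.get("spec", {})
    new_spec = new_mc.get("spec", {})
    diffs = {}
    for field in sorted(set(old_spec) | set(new_spec)):
        if old_spec.get(field) != new_spec.get(field):
            diffs[field] = (old_spec.get(field), new_spec.get(field))
    return diffs


# Hypothetical, abridged stand-ins for the two rendered configs in this report.
# Real rendered configs also carry spec.config and other fields;
# "placeholder" is not the real osImageURL.
first = {
    "metadata": {"name": "rendered-master-6766f48d4d2308abebee1ab4b945ad7f"},
    "spec": {"osImageURL": "placeholder", "kernelArguments": []},
}
second = {
    "metadata": {"name": "rendered-master-b5b49f29de9450e2fd545072ad08239c"},
    "spec": {
        "osImageURL": "placeholder",
        "kernelArguments": [
            "systemd.unified_cgroup_hierarchy=0",
            "systemd.legacy_systemd_cgroup_controller=1",
        ],
    },
}

# Print each differing spec field with its old and new value.
for field, (old, new) in diff_spec(first, second).items():
    print(f"{field}: {old} -> {new}")
```

With real exports, `diff_spec(load_mc("first.json"), load_mc("second.json"))` (hypothetical file names) should report only `kernelArguments` if the observation above holds. Nested fields such as `spec.config` are compared as whole values, which is enough to localize a difference to a top-level field.
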
      Because these two rendered MachineConfigs were applied sequentially, the master nodes rebooted twice.
      Before the upgrade, on OCP 4.16.23, the rendered master MachineConfig in use was:

      rendered-master-e2ccc61a2e8fb441754c532aae99caac

      This MachineConfig already contained the same kernel arguments:

      kernelArguments:
        - systemd.unified_cgroup_hierarchy=0
        - systemd.legacy_systemd_cgroup_controller=1

      In other words:
      • Prior to the upgrade, the master nodes were already configured with these kernel arguments.
      • During the upgrade to 4.17.45, OpenShift unexpectedly generated an intermediate rendered MachineConfig without any kernelArguments.
      • A subsequent rendered MachineConfig then reintroduced the same kernel arguments, even though they were already present before the upgrade.
      This caused an unnecessary extra MachineConfig rollout and an extra reboot of each master node.
       
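The sequence above can be checked mechanically. The following is a minimal sketch, assuming the kernelArguments list of each rendered master config has already been extracted in the order the configs were rolled out; `karg_churn`, `observed`, and `expected` are hypothetical names, and the `rendered-master-<new>` entry is a placeholder for whatever single config a corrected upgrade would render. Counting each config transition as one reboot reflects what this report observed, not a general MCO rule.

```python
KARGS = [
    "systemd.unified_cgroup_hierarchy=0",
    "systemd.legacy_systemd_cgroup_controller=1",
]


def karg_churn(rollout):
    """Summarize kernel-argument churn across a pool's rendered configs.

    rollout: list of (rendered_name, kernel_args) in the order they were applied.
    Returns (config_transitions, args_removed_then_readded).
    Assumption: one transition corresponds to one reboot, as observed in this report.
    """
    transitions = 0
    removed = set()  # every argument dropped at some earlier step
    flapped = set()  # arguments that were dropped and later re-added
    for (_, prev), (_, cur) in zip(rollout, rollout[1:]):
        transitions += 1
        prev_set, cur_set = set(prev), set(cur)
        removed |= prev_set - cur_set
        flapped |= removed & (cur_set - prev_set)
    return transitions, sorted(flapped)


# Rollout as described in this report.
observed = [
    ("rendered-master-e2ccc61a2e8fb441754c532aae99caac", KARGS),  # 4.16.23
    ("rendered-master-6766f48d4d2308abebee1ab4b945ad7f", []),     # intermediate
    ("rendered-master-b5b49f29de9450e2fd545072ad08239c", KARGS),  # final
]

# Hypothetical rollout matching the expected results: one new config, args kept.
expected = [
    ("rendered-master-e2ccc61a2e8fb441754c532aae99caac", KARGS),
    ("rendered-master-<new>", KARGS),  # placeholder name
]

print(karg_churn(observed))
print(karg_churn(expected))
```

For the observed rollout this reports two transitions with both cgroup arguments removed and re-added; for the expected rollout it reports one transition and no churn.
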

      Expected results:

      During an upgrade from 4.16.23 to 4.17.45:

      • The master MCP should generate only one new rendered MachineConfig, provided no configuration changes beyond the upgrade itself are required.
      • Master nodes should reboot only once during the upgrade.
      • Kernel arguments already present before the upgrade should not be temporarily removed and then re-applied.

              Neeraj Krishna Gopalakrishna (rh-ee-ngopalak)
              Jace Liang (rhn-support-jaliang)
              Sergio Regidor de la Rosa
              Votes: 0
              Watchers: 4