Bug
Resolution: Duplicate
Major
4.16.z
Quality / Stability / Reliability
Moderate
This is a clone of issue OCPBUGS-48116. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-47802. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-46460. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-42636. The following is the description of the original issue:
—
Description of problem:
During the upgrade of an OpenShift cluster from 4.16.37 to 4.16.38 on baremetal, we observed that master nodes are rebooted multiple times (three times instead of the expected single reboot). This exactly matches the behavior described in OCPBUGS-48116, which was supposedly fixed in 4.16.32 via RHSA-2025:0650. The issue occurs in environments with custom container runtime configurations, where a second MachineConfig render is unnecessarily generated during the upgrade process.
How reproducible:
Upgrade a multi-node OCP cluster that has a custom configuration such as a performance profile or a container runtime configuration (for example, forcing cgroups v1 or changing the default runtime from runc to crun).
Steps to Reproduce:
1. Deploy an OCP 4.16.37 cluster on baremetal with a ContainerRuntimeConfig that sets crun as the default runtime (a sketch of such a config is shown after this list)
2. Upgrade the cluster to 4.16.38
3. Monitor the upgrade process (cluster operators, Machine Configs, Machine Config Pools and nodes)
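For reference, a minimal example of the custom configuration from step 1 and the checks used for step 3 is sketched below, assuming the standard MachineConfigPool label; the resource name is illustrative and not taken from the affected cluster:
```
# Sketch only: an example ContainerRuntimeConfig that sets crun as the
# default runtime for the master pool (name and selector are illustrative).
cat <<'EOF' | oc apply -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: enable-crun-master
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/master: ""
  containerRuntimeConfig:
    defaultRuntime: crun
EOF

# Checks used while monitoring the upgrade (step 3): cluster operators,
# rendered MachineConfigs, MachineConfigPools and nodes.
oc get clusteroperators
oc get machineconfigs --sort-by=.metadata.creationTimestamp
oc get machineconfigpools
oc get nodes -o wide
```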
Actual results:
After most of the Cluster Operators are updated to 4.16.38 (except the Machine Config Operator), the following was observed:
1. A rendered machine config (e.g., rendered-master-21[]eee) is generated for the master MCP
2. The first master node begins rebooting
3. While that node is rebooting, another rendered machine config (e.g., rendered-master-6b[]af7) is generated containing an unnecessary container runtime configuration
4. The first master node then has to reboot a second time to apply this new config
5. This significantly extends application downtime and increases resource usage, which has a financial impact for the customer; the duplicate render and the pending second reboot can be observed with the commands sketched after this list.
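As a rough sketch (assuming the standard MCO node annotations; the rendered config names will of course differ), the duplicate render and the pending second reboot can be observed like this:
```
# Two new rendered-master-* configs appear during this upgrade instead of one.
oc get machineconfigs --sort-by=.metadata.creationTimestamp | grep rendered-master

# Rendered config the master pool is currently targeting.
oc get machineconfigpool master -o jsonpath='{.spec.configuration.name}{"\n"}'

# Desired vs. current config per master node; a renewed mismatch after the
# first reboot indicates that a second reboot is still pending.
oc get nodes -l node-role.kubernetes.io/master= -o jsonpath='{range .items[*]}{.metadata.name}{" desired="}{.metadata.annotations.machineconfiguration\.openshift\.io/desiredConfig}{" current="}{.metadata.annotations.machineconfiguration\.openshift\.io/currentConfig}{"\n"}{end}'
```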
Expected results:
The expected result is that during an upgrade only one Machine Config render is generated per Machine Config Pool, and each node reboots only once to complete the upgrade.
Additional info:
This issue was supposedly fixed in 4.16.32 (OCPBUGS-48116, RHSA-2025:0650) but is still occurring in 4.16.38.
The diff between the two rendered configs shows the unnecessary container runtime configuration being added:
```
$ diff 04129367/0020-rendered-master-6b[]af7.yaml 04129367/0060-rendered-master-21[]eee.yaml
7c7
< creationTimestamp: "2025-04-23T08:35:09Z"
---
> creationTimestamp: "2025-04-23T08:15:08Z"
9c9
< name: rendered-master-6b[]af7
---
> name: rendered-master-21[]eee
17,18c17,18
< resourceVersion: "1344163"
< uid: 2b4603cb-001e-4353-98ca-81988ecd1a99
---
> resourceVersion: "1338139"
> uid: 43117626-92a8-4204-9fbd-3129e5815ebb
407,412d406
< - contents:
< compression: ""
< source: data:text/plain;charset=utf-8;base64,W2NyaW9dCiAgW2NyaW8ucnVudGltZV0KICAgIGRlZmF1bHRfcnVudGltZSA9ICJjcnVuIgo=
< mode: 420
< overwrite: true
< path: /etc/crio/crio.conf.d/01-ctrcfg-defaultRuntime
```
When decoded, it is
```
[crio]
  [crio.runtime]
    default_runtime = "crun"
```
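The decoded content above can be reproduced directly from the base64 payload embedded in the diff:
```
# Decode the file contents carried by the extra rendered config; the payload
# is everything after "base64," in the data URL shown in the diff above.
echo 'W2NyaW9dCiAgW2NyaW8ucnVudGltZV0KICAgIGRlZmF1bHRfcnVudGltZSA9ICJjcnVuIgo=' | base64 -d
```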
OCPBUGS-42636 Multiple reboots during EUS upgrade on Control Plane nodes - Closed
OCPBUGS-48116 Multiple reboots during EUS upgrade on Control Plane nodes - Closed