-
Bug
-
Resolution: Done
-
Normal
-
None
-
4.19
-
Quality / Stability / Reliability
-
False
-
-
5
-
Moderate
-
No
-
None
-
Rejected
-
CoreOS East - 272, CoreOS East - 273, CoreOS East - 274, CoreOS East - 275
-
4
-
Done
-
Bug Fix
-
Previously, nodes that were originally created using OpenShift 4.1 or 4.2 boot images would fail to boot after upgrading to 4.19. This includes new nodes scaled up with 4.1 or 4.2 boot images. With this fix, those nodes now boot properly.
-
None
-
None
-
None
-
None
The original problem described here was that 4.1/4.2 boot images no longer work with composefs. Those boot images predate static GRUB configs, so nodes born from them still regenerate the GRUB config with grub2-mkconfig, which in turn runs grub2-probe (and grub2-probe breaks on composefs: https://github.com/ostreedev/ostree/issues/3198#issuecomment-2828935716). We can require boot image updates for this. But the point remains that there are fully updated nodes out there still using grub2-mkconfig that will fail when they upgrade to 4.19. See https://issues.redhat.com/browse/OCPBUGS-52485?focusedId=27145454&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-27145454.
Original bug description follows.
—
Description of problem:
When we scale up a node using a 4.1 boot image on 4.19, the node is added to the cluster but the pool is degraded, showing this error:
    - lastTransitionTime: "2025-03-06T12:59:31Z"
      message: 'Node ip-10-0-16-142.ec2.internal is reporting: "failed to remove rollback: error running rpm-ostree cleanup -r: error: cleanup: GDBus.Error:org.projectatomic.rpmostreed.Error.Failed: Bootloader write config: grub2-mkconfig: Child process exited with code 1\n: exit status 1"'
      reason: 1 nodes are reporting degraded status on sync
      status: "True"
      type: NodeDegraded
    - lastTransitionTime: "2025-03-06T12:59:31Z"
      message: 'Node ip-10-0-16-142.ec2.internal is reporting: "failed to remove rollback: error running rpm-ostree cleanup -r: error: cleanup: GDBus.Error:org.projectatomic.rpmostreed.Error.Failed: Bootloader write config: grub2-mkconfig: Child process exited with code 1\n: exit status 1"'
      reason: ""
      status: "True"
      type: Degraded
Version-Release number of selected component (if applicable):
IPI on AWS version 4.19.0-0.nightly-2025-03-05-160850
How reproducible:
Always
Steps to Reproduce:
1. Create a machineset using a 4.1 boot image.
2. Scale up the machineset to create a new node.
If more details are needed, see this test case: https://polarion.engineering.redhat.com/polarion/redirect/project/OSE/workitem?id=OCP-63894
Actual results:
A new node is created and joins the cluster, but the MCP is degraded, reporting the error mentioned above.
Expected results:
No degradation should happen
Additional info:
We were not able to reproduce it using 4.3 boot images, but we could reproduce it with 4.2 boot images. The error also appears in the node's journal logs and seems to be related to the new composefs change:
    Thu 2025-03-06 13:21:28 UTC localhost.localdomain rpm-ostreed.service[3455]: Process [pid: 11278 uid: 0 unit: crio-b01113eb0030cac9918424bbb829041651d6d4c354274836778652e2edb06b02.scope] connected to transaction progress
    Thu 2025-03-06 13:21:28 UTC localhost.localdomain rpm-ostreed.service[3455]: bootfs is sufficient for calculated new size: 0 bytes
    Thu 2025-03-06 13:21:28 UTC localhost.localdomain rpm-ostreed.service[11291]: /usr/sbin/grub2-probe: error: failed to get canonical path of `composefs'.
    Thu 2025-03-06 13:21:28 UTC localhost.localdomain rpm-ostreed.service[3455]: Txn Cleanup on /org/projectatomic/rpmostree1/rhcos failed: Bootloader write config: grub2-mkconfig: Child process exited with code 1
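When triaging other degraded nodes, it can help to check a journal dump for the same failure signature. The following is a minimal sketch, assuming the journal text has already been collected (for example with `journalctl -u rpm-ostreed`) and that the error wording matches the excerpt in this report; the helper names `scan_journal` and `is_composefs_grub_failure` are hypothetical and not part of any OpenShift or rpm-ostree tooling.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Signatures copied from the journal excerpt in this report. Other
# rpm-ostree/grub2 versions may word these differently (assumption).
PROBE_RE = re.compile(
    re.escape("grub2-probe: error: failed to get canonical path of `composefs'")
)
MKCONFIG_RE = re.compile(
    r"Bootloader write config: grub2-mkconfig: Child process exited with code (\d+)"
)


@dataclass
class Finding:
    probe_failed: bool                   # grub2-probe choked on the composefs root
    mkconfig_exit_code: Optional[int]    # exit code reported for grub2-mkconfig, if any


def scan_journal(text: str) -> Finding:
    """Hypothetical triage helper: look for the composefs + grub2-probe
    failure signature in a journal dump."""
    probe_failed = PROBE_RE.search(text) is not None
    match = MKCONFIG_RE.search(text)
    exit_code = int(match.group(1)) if match else None
    return Finding(probe_failed, exit_code)


def is_composefs_grub_failure(text: str) -> bool:
    """True only if grub2-probe failed on composefs AND grub2-mkconfig exited non-zero."""
    finding = scan_journal(text)
    return (
        finding.probe_failed
        and finding.mkconfig_exit_code is not None
        and finding.mkconfig_exit_code != 0
    )
```

Requiring both signatures keeps the check narrow: a grub2-mkconfig failure without the composefs grub2-probe error is likely a different problem.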
- is blocked by
-
RHEL-100702 [9.6.z] Backport patch about `adopt: add tag to install the static GRUB config from tree`
-
- Closed
-
-
COS-3357 Impact statement request for OCPBUGS-52485 Nodes born on 4.1/4.2 will not be able to upgrade to 4.19 due to composefs + os-prober incompatibility
-
- Closed
-
- is cloned by
-
OCPBUGS-59201 [4.20] Nodes born on 4.1/4.2 will not be able to upgrade to 4.19 due to composefs + grub2-probe incompatibility
-
- Verified
-
- relates to
-
RHEL-59866 grub2-probe: error: failed to get canonical path of `overlay' in bootc image
-
- Closed
-
- links to