-
Bug
-
Resolution: Done
-
Undefined
-
None
-
4.13.0
-
None
-
Important
-
No
-
MCO Sprint 232
-
1
-
Rejected
-
False
-
-
NA
Description of problem:
Whenever an MC that requires a reboot is applied to a MachineConfigPool, the pool becomes degraded while the node is rebooting.
Version-Release number of selected component (if applicable):
Baremetal IPI dual stack cluster
FLEXY TEMPLATE: private-templates/functionality-testing/aos-4_13/ipi-on-baremetal/versioned-installer-packet_libvirt-dual_stack-ci
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.nightly-2023-02-21-014524   True        False         3h2m    Cluster version is 4.13.0-0.nightly-2023-02-21-014524
How reproducible:
Very often
Steps to Reproduce:
1. Create an MC that requires a node reboot.
2. Eventually (quite often) the MCP becomes degraded and reports this error:
{
  "lastTransitionTime": "2023-02-22T15:44:34Z",
  "message": "Node worker-0.rioliu-0222c.qe.devcluster.openshift.com is reporting: \"error running rpm-ostree kargs: signal: terminated\\n\"",
  "reason": "1 nodes are reporting degraded status on sync",
  "status": "True",
  "type": "NodeDegraded"
}
3. After some minutes (once the node has completely rebooted) the pool stops reporting a degraded status.
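A minimal sketch of how the degraded condition from step 2 could be detected while reproducing the issue. The helper name `degraded_conditions` is hypothetical, the sample data is the condition quoted above, and it is assumed that the condition list comes from `.status.conditions` of the pool (for example via `oc get mcp worker -o json`).

```python
import json

# Condition copied from the report. In a live check this list is assumed
# to come from .status.conditions of the MachineConfigPool object.
SAMPLE_CONDITIONS = json.loads(r'''
[
  {
    "lastTransitionTime": "2023-02-22T15:44:34Z",
    "message": "Node worker-0.rioliu-0222c.qe.devcluster.openshift.com is reporting: \"error running rpm-ostree kargs: signal: terminated\\n\"",
    "reason": "1 nodes are reporting degraded status on sync",
    "status": "True",
    "type": "NodeDegraded"
  }
]
''')


def degraded_conditions(conditions):
    """Return every condition whose type ends in 'Degraded' and whose status is 'True'."""
    return [
        c for c in conditions
        if c.get("type", "").endswith("Degraded") and c.get("status") == "True"
    ]


for cond in degraded_conditions(SAMPLE_CONDITIONS):
    print(cond["lastTransitionTime"], cond["type"], cond["reason"])
```

Polling this check during the reboot window would show when the pool enters and leaves the degraded state.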
Actual results:
The MachineConfigPool is degraded
Expected results:
MachineConfigPools should never report a degraded status with a valid MC
Additional info:
It looks like we are executing the "rpm-ostree kargs" command right after we execute the "systemctl reboot" command.
17:20:51.570629 4658 update.go:1897] Removing SIGTERM protection
17:20:51.570646 4658 update.go:1867] initiating reboot: Node will reboot into config rendered-worker-923735505fa2d7a5811b9c5866c5ad12
17:20:51.579923 4658 update.go:1867] reboot successful
17:20:51.582415 4658 daemon.go:518] Transitioned from state: Done -> Working
17:20:51.582426 4658 daemon.go:523] State and Reason: Working
17:20:51.609420 4658 rpm-ostree.go:400] Running captured: rpm-ostree kargs
17:20:51.612228 4658 daemon.go:600] Preflight config drift check failed: error running rpm-ostree kargs: signal: terminated
17:20:51.612244 4658 writer.go:200] Marking Degraded due to: error running rpm-ostree kargs: signal: terminated
17:20:51.614830 4658 daemon.go:1030] Shutting down MachineConfigDaemon
We have not seen this problem on any platform other than baremetal. You can find the links to the logs before and after the reboot in the comments.
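The ordering described above can be checked mechanically. The following is a rough sketch, not part of the MCO: it scans the daemon log excerpt from this report and lists every captured command that was started after the reboot was initiated. The function names and the regular expression are assumptions based only on the line layout visible in the excerpt (time, PID, source location, message).

```python
import re
from datetime import datetime

# Log lines copied from this report.
LOG = """\
17:20:51.570629 4658 update.go:1897] Removing SIGTERM protection
17:20:51.570646 4658 update.go:1867] initiating reboot: Node will reboot into config rendered-worker-923735505fa2d7a5811b9c5866c5ad12
17:20:51.579923 4658 update.go:1867] reboot successful
17:20:51.582415 4658 daemon.go:518] Transitioned from state: Done -> Working
17:20:51.582426 4658 daemon.go:523] State and Reason: Working
17:20:51.609420 4658 rpm-ostree.go:400] Running captured: rpm-ostree kargs
17:20:51.612228 4658 daemon.go:600] Preflight config drift check failed: error running rpm-ostree kargs: signal: terminated
17:20:51.612244 4658 writer.go:200] Marking Degraded due to: error running rpm-ostree kargs: signal: terminated
17:20:51.614830 4658 daemon.go:1030] Shutting down MachineConfigDaemon
"""

# Assumed layout: "<HH:MM:SS.ffffff> <pid> <file:line>] <message>"
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d+)\s+\d+\s+(\S+)\]\s+(.*)$")


def parse(log_text):
    """Yield (timestamp, source, message) for every line that matches LINE_RE."""
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group(1), "%H:%M:%S.%f")
            yield ts, m.group(2), m.group(3)


def commands_after_reboot(log_text):
    """Return (delay, message) for each captured command logged after the reboot request."""
    reboot_at = None
    late = []
    for ts, _source, message in parse(log_text):
        if reboot_at is None:
            if message.startswith("initiating reboot"):
                reboot_at = ts
            continue
        if message.startswith("Running captured:"):
            late.append((ts - reboot_at, message))
    return late


for delay, message in commands_after_reboot(LOG):
    print(f"{delay.total_seconds():.6f}s after reboot request: {message}")
```

On the excerpt above this flags the "rpm-ostree kargs" call, which was started roughly 39 ms after the reboot was requested, consistent with the command being terminated by the shutdown.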