Type: Bug
Resolution: Done
Priority: Undefined
Fix Version/s: None
Affects Version/s: 4.13.0
Component/s: Machine Config Operator
Labels:
None

Severity:
Important
Regression:
No
Sprint:
MCO Sprint 232
sprint_count:
1
Release Blocker:
Rejected
Blocked:
False
Blocked Reason:
None
Release Note Text:
NA
Target Version:

4.13.0


Description of problem:

Whenever an MC (MachineConfig) that requires a reboot is applied to a MachineConfigPool, the pool becomes degraded while the node is rebooting.

Version-Release number of selected component (if applicable):

Baremetal IPI dual stack cluster

FLEXY TEMPLATE: private-templates/functionality-testing/aos-4_13/ipi-on-baremetal/versioned-installer-packet_libvirt-dual_stack-ci

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.nightly-2023-02-21-014524   True        False         3h2m    Cluster version is 4.13.0-0.nightly-2023-02-21-014524

How reproducible:

Very often

Steps to Reproduce:

1. Create an MC that requires a node reboot
2. Eventually (quite often) the MCP becomes degraded and reports this error:
            {
                "lastTransitionTime": "2023-02-22T15:44:34Z",
                "message": "Node worker-0.rioliu-0222c.qe.devcluster.openshift.com is reporting: \"error running rpm-ostree kargs: signal: terminated\\n\"",
                "reason": "1 nodes are reporting degraded status on sync",
                "status": "True",
                "type": "NodeDegraded"
            },
3. After some minutes (once the node has completely rebooted) the pool stops reporting a degraded status
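
The condition from step 2 can also be detected programmatically. The following is a minimal Python sketch, assuming the pool has been fetched as JSON (for example with `oc get mcp worker -o json`, where the pool name `worker` is an assumption). The helper name `degraded_messages` is hypothetical and not part of any OpenShift tooling.

```python
import json

# Sample pool status, trimmed to the condition shown in step 2.
SAMPLE_POOL = json.loads(r'''
{
  "status": {
    "conditions": [
      {
        "lastTransitionTime": "2023-02-22T15:44:34Z",
        "message": "Node worker-0.rioliu-0222c.qe.devcluster.openshift.com is reporting: \"error running rpm-ostree kargs: signal: terminated\\n\"",
        "reason": "1 nodes are reporting degraded status on sync",
        "status": "True",
        "type": "NodeDegraded"
      }
    ]
  }
}
''')


def degraded_messages(pool):
    """Return the messages of all NodeDegraded conditions with status "True"."""
    conditions = pool.get("status", {}).get("conditions", [])
    return [
        c.get("message", "")
        for c in conditions
        if c.get("type") == "NodeDegraded" and c.get("status") == "True"
    ]


if __name__ == "__main__":
    for message in degraded_messages(SAMPLE_POOL):
        print(message)
```

Polling this check while the node reboots would show the pool entering and then leaving the degraded state, as described in steps 2 and 3.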

Actual results:

The MachineConfigPool is degraded

Expected results:

MachineConfigPools should never report a degraded status with a valid MC

Additional info:

It looks like we are executing the "rpm-ostree kargs" command right after we execute the "systemctl reboot" command.
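
The "signal: terminated" text in the error is consistent with the child process receiving SIGTERM while the node shuts down. The sketch below only illustrates that general failure mode. It is written in Python rather than the daemon's Go, it uses `sleep` as a stand-in for `rpm-ostree kargs`, and the helper names are hypothetical. It is not MCO code.

```python
import signal
import subprocess


def run_and_terminate(cmd):
    """Start cmd, send it SIGTERM right away, and return its exit status."""
    proc = subprocess.Popen(cmd)
    proc.send_signal(signal.SIGTERM)
    return proc.wait(timeout=10)


def describe(returncode):
    """Describe a subprocess return code; negative values mean 'killed by signal'."""
    if returncode < 0:
        return "killed by signal " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


if __name__ == "__main__":
    # "sleep 30" stands in for a command that is still running at shutdown.
    print(describe(run_and_terminate(["sleep", "30"])))
```

A caller that treats every non-zero result as a real failure would report an error here, even though the command itself did nothing wrong.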

17:20:51.570629    4658 update.go:1897] Removing SIGTERM protection   
17:20:51.570646    4658 update.go:1867] initiating reboot: Node will reboot into config rendered-worker-923735505fa2d7a5811b9c5866c5ad12
17:20:51.579923    4658 update.go:1867] reboot successful
17:20:51.582415    4658 daemon.go:518] Transitioned from state: Done -> Working
17:20:51.582426    4658 daemon.go:523] State and Reason: Working
17:20:51.609420    4658 rpm-ostree.go:400] Running captured: rpm-ostree kargs
17:20:51.612228    4658 daemon.go:600] Preflight config drift check failed: error running rpm-ostree kargs: signal: terminated 
17:20:51.612244    4658 writer.go:200] Marking Degraded due to: error running rpm-ostree kargs: signal: terminated 
17:20:51.614830    4658 daemon.go:1030] Shutting down MachineConfigDaemon
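
To make the ordering in this log concrete, the timestamps can be compared directly. This is a small sketch, assuming log lines keep the `HH:MM:SS.microseconds` prefix shown above. The helper names `timestamp_of` and `seconds_between` are hypothetical.

```python
from datetime import datetime

# Three lines copied from the log excerpt above.
LOG = """\
17:20:51.570646    4658 update.go:1867] initiating reboot: Node will reboot into config rendered-worker-923735505fa2d7a5811b9c5866c5ad12
17:20:51.609420    4658 rpm-ostree.go:400] Running captured: rpm-ostree kargs
17:20:51.612228    4658 daemon.go:600] Preflight config drift check failed: error running rpm-ostree kargs: signal: terminated
"""


def timestamp_of(lines, needle):
    """Return the timestamp of the first line that contains needle."""
    for line in lines:
        if needle in line:
            return datetime.strptime(line.split()[0], "%H:%M:%S.%f")
    raise ValueError("no log line contains %r" % needle)


def seconds_between(log, first, second):
    """Return the seconds elapsed from the 'first' line to the 'second' line."""
    lines = log.splitlines()
    delta = timestamp_of(lines, second) - timestamp_of(lines, first)
    return delta.total_seconds()


if __name__ == "__main__":
    gap = seconds_between(LOG, "initiating reboot", "Running captured: rpm-ostree kargs")
    print("kargs started %.3f ms after the reboot was initiated" % (gap * 1000))
```

For the excerpt above, the `rpm-ostree kargs` call starts roughly 39 ms after the reboot is initiated and fails about 3 ms later.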


We have not seen this problem on platforms other than baremetal.

Links to the logs from before and after the reboot are in the comments.

links to

openshift/machine-config-operator#3572: OCPBUGS-7903: Pool degraded with error: rpm-ostree kargs: signal: terminated

Assignee: David Joshy

Reporter: Sergio Regidor de la Rosa

QA Contact: Sergio Regidor de la Rosa

Votes: 0

Watchers: 4

Created: 2023/02/22 6:12 PM

Updated: 2023/05/17 10:39 PM

Resolved: 2023/05/17 10:39 PM
