OpenShift Bugs / OCPBUGS-7903

Pool degraded with error: rpm-ostree kargs: signal: terminated


    • Type: Bug
    • Resolution: Done
    • Priority: Undefined
    • None
    • 4.13.0
    • None
    • Important
    • No
    • MCO Sprint 232
    • 1
    • Rejected
    • False


    • NA

      Description of problem:

      Whenever an MC that requires a reboot is applied to a MachineConfigPool, the pool becomes degraded while the node is rebooting.

      Version-Release number of selected component (if applicable):

      Baremetal IPI dual stack cluster
      FLEXY TEMPLATE: private-templates/functionality-testing/aos-4_13/ipi-on-baremetal/versioned-installer-packet_libvirt-dual_stack-ci
      $ oc get clusterversion
      NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.13.0-0.nightly-2023-02-21-014524   True        False         3h2m    Cluster version is 4.13.0-0.nightly-2023-02-21-014524

      How reproducible:

      Very often

      Steps to Reproduce:

      1. Create an MC that requires a node reboot
      2. Eventually (quite often) the MCP becomes degraded and reports this error:
                      "lastTransitionTime": "2023-02-22T15:44:34Z",
                      "message": "Node worker-0.rioliu-0222c.qe.devcluster.openshift.com is reporting: \"error running rpm-ostree kargs: signal: terminated\\n\"",
                      "reason": "1 nodes are reporting degraded status on sync",
                      "status": "True",
                      "type": "NodeDegraded"
      3. After some minutes (once the node has completely rebooted) the pool stops reporting a degraded status
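
To check for the condition from step 2 programmatically, a minimal sketch along these lines may help. It assumes the conditions list has already been taken from the pool's `.status.conditions` (for example from the JSON output of `oc get mcp`); the helper name `degraded_node_messages` is hypothetical and not part of any OpenShift tooling. The sample data is the condition quoted in this report.

```python
import json


def degraded_node_messages(conditions):
    """Return the messages of NodeDegraded conditions whose status is "True"."""
    return [
        c.get("message", "")
        for c in conditions
        if c.get("type") == "NodeDegraded" and c.get("status") == "True"
    ]


# Condition copied from the report, wrapped in a list as it would appear
# under .status.conditions (the wrapping is an assumption for illustration).
sample = json.loads(r'''
[
  {
    "lastTransitionTime": "2023-02-22T15:44:34Z",
    "message": "Node worker-0.rioliu-0222c.qe.devcluster.openshift.com is reporting: \"error running rpm-ostree kargs: signal: terminated\\n\"",
    "reason": "1 nodes are reporting degraded status on sync",
    "status": "True",
    "type": "NodeDegraded"
  }
]
''')

for msg in degraded_node_messages(sample):
    print(msg)
```

Polling this check while the node reboots should show the condition appearing and then clearing once the node is back, which matches the transient behaviour described in step 3.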

      Actual results:

      The MachineConfigPool is degraded

      Expected results:

      MachineConfigPools should never report a degraded status with a valid MC

      Additional info:

      It looks like the "rpm-ostree kargs" command is executed right after the "systemctl reboot" command:
      17:20:51.570629    4658 update.go:1897] Removing SIGTERM protection   
      17:20:51.570646    4658 update.go:1867] initiating reboot: Node will reboot into config rendered-worker-923735505fa2d7a5811b9c5866c5ad12
      17:20:51.579923    4658 update.go:1867] reboot successful
      17:20:51.582415    4658 daemon.go:518] Transitioned from state: Done -> Working
      17:20:51.582426    4658 daemon.go:523] State and Reason: Working
      17:20:51.609420    4658 rpm-ostree.go:400] Running captured: rpm-ostree kargs
      17:20:51.612228    4658 daemon.go:600] Preflight config drift check failed: error running rpm-ostree kargs: signal: terminated 
      17:20:51.612244    4658 writer.go:200] Marking Degraded due to: error running rpm-ostree kargs: signal: terminated 
      17:20:51.614830    4658 daemon.go:1030] Shutting down MachineConfigDaemon
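
To make the ordering in the excerpt above explicit, the following sketch computes the gap between the reboot being initiated and `rpm-ostree kargs` being started. The log lines are copied from this report; the helper `first_timestamp` is a hypothetical name, and the parsing assumes the timestamp is the first whitespace-separated field of each line, as it is here.

```python
from datetime import datetime

# Lines copied from the MachineConfigDaemon log excerpt in this report.
LOG = """\
17:20:51.570646    4658 update.go:1867] initiating reboot: Node will reboot into config rendered-worker-923735505fa2d7a5811b9c5866c5ad12
17:20:51.579923    4658 update.go:1867] reboot successful
17:20:51.609420    4658 rpm-ostree.go:400] Running captured: rpm-ostree kargs
17:20:51.612228    4658 daemon.go:600] Preflight config drift check failed: error running rpm-ostree kargs: signal: terminated
"""


def first_timestamp(log, needle):
    """Return the timestamp of the first log line containing `needle`, or None."""
    for line in log.splitlines():
        if needle in line:
            return datetime.strptime(line.split()[0], "%H:%M:%S.%f")
    return None


reboot = first_timestamp(LOG, "initiating reboot")
kargs = first_timestamp(LOG, "Running captured: rpm-ostree kargs")
gap_ms = (kargs - reboot).total_seconds() * 1000
print(f"rpm-ostree kargs started {gap_ms:.1f} ms after the reboot was initiated")
```

In this excerpt the command starts well under a second after the reboot request, which is consistent with it being terminated as the system shuts down.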
      We have not seen this problem on platforms other than baremetal.
      You can find the links to the logs before and after the reboot in the comments.
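
The "signal: terminated" text in the error is what one would expect when a child process is killed by SIGTERM. The sketch below reproduces that situation in isolation on a POSIX system, under stated assumptions: a sleeping Python process stands in for `rpm-ostree kargs`, and `terminate()` stands in for whatever sends SIGTERM during shutdown. It does not use or model the MachineConfigDaemon code.

```python
import signal
import subprocess
import sys


def run_and_terminate():
    """Start a long-running child, send it SIGTERM, and return its exit status."""
    # Stand-in child process (assumption): sleeps long enough to be interrupted.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    child.terminate()  # sends SIGTERM on POSIX
    return child.wait(timeout=10)


rc = run_and_terminate()
if rc < 0:
    # On POSIX, a negative return code -N means the child was killed by signal N.
    print("child killed by", signal.Signals(-rc).name)
```

A caller that treats any such failure as a real error would report a fault even though the command was only interrupted by the shutdown, which is the pattern the log above suggests.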

            djoshy David Joshy
            sregidor@redhat.com Sergio Regidor de la Rosa
            Sergio Regidor de la Rosa Sergio Regidor de la Rosa