OpenShift Bugs / OCPBUGS-7903

Pool degraded with error: rpm-ostree kargs: signal: terminated


Details

    • Type: Bug
    • Resolution: Done
    • Priority: Undefined
    • Affects Version: 4.13.0
    • Severity: Important
    • Sprint: MCO Sprint 232

    Description

      Description of problem:

Whenever a MachineConfig (MC) that requires a reboot is applied to a MachineConfigPool, the pool becomes degraded while the node is rebooting.
      
      
      

      Version-Release number of selected component (if applicable):

Baremetal IPI dual-stack cluster
      
      FLEXY TEMPLATE: private-templates/functionality-testing/aos-4_13/ipi-on-baremetal/versioned-installer-packet_libvirt-dual_stack-ci
      
      $ oc get clusterversion
      NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.13.0-0.nightly-2023-02-21-014524   True        False         3h2m    Cluster version is 4.13.0-0.nightly-2023-02-21-014524
      
      

      How reproducible:

      Very often
      
      

      Steps to Reproduce:

1. Create a MC that needs to reboot the nodes (a minimal example is sketched after these steps)
2. Quite often, the MCP will become degraded while the node reboots, reporting this error:
                  {
                      "lastTransitionTime": "2023-02-22T15:44:34Z",
                      "message": "Node worker-0.rioliu-0222c.qe.devcluster.openshift.com is reporting: \"error running rpm-ostree kargs: signal: terminated\\n\"",
                      "reason": "1 nodes are reporting degraded status on sync",
                      "status": "True",
                      "type": "NodeDegraded"
                  },
3. After a few minutes (once the node has completely rebooted), the pool stops reporting the degraded status
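
      For reference, a minimal MC of this kind is sketched below. The name and kernel argument are illustrative (not taken from this report); any change to spec.kernelArguments forces the MCD to reboot the node:

      $ cat <<'EOF' | oc apply -f -
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        name: 99-worker-test-kargs    # illustrative name
        labels:
          machineconfiguration.openshift.io/role: worker
      spec:
        kernelArguments:
          - loglevel=7                # illustrative kernel argument
      EOF

      $ oc get mcp worker -w          # watch for the transient NodeDegraded condition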
      
      

      Actual results:

      The MachineConfigPool is degraded
      

      Expected results:

MachineConfigPools should never report a degraded status when applying a valid MC
      
      

      Additional info:

It looks like the "rpm-ostree kargs" command is being executed right after the "systemctl reboot" command.
      
      17:20:51.570629    4658 update.go:1897] Removing SIGTERM protection   
      17:20:51.570646    4658 update.go:1867] initiating reboot: Node will reboot into config rendered-worker-923735505fa2d7a5811b9c5866c5ad12
      17:20:51.579923    4658 update.go:1867] reboot successful
      17:20:51.582415    4658 daemon.go:518] Transitioned from state: Done -> Working
      17:20:51.582426    4658 daemon.go:523] State and Reason: Working
      17:20:51.609420    4658 rpm-ostree.go:400] Running captured: rpm-ostree kargs
      17:20:51.612228    4658 daemon.go:600] Preflight config drift check failed: error running rpm-ostree kargs: signal: terminated 
      17:20:51.612244    4658 writer.go:200] Marking Degraded due to: error running rpm-ostree kargs: signal: terminated 
      17:20:51.614830    4658 daemon.go:1030] Shutting down MachineConfigDaemon
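
      For context, "signal: terminated" is the message that Go's os/exec attaches to a child process killed by SIGTERM, and systemd sends SIGTERM to remaining processes once "systemctl reboot" starts. An illustrative shell sketch of the same failure mode (not taken from the MCO code):

      $ sleep 30 &
      $ kill -TERM $!                   # what systemd does to stray processes at shutdown
      $ wait $!; echo "exit status: $?" # prints 143 = 128 + SIGTERM(15)

      So the preflight config drift check appears to race with the reboot: if it launches after systemd has begun shutting the node down, its child process is terminated and the sync is marked degraded.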
      
      
We have not seen this problem on any platform other than baremetal.
      
You can find the links to the logs from before and after the reboot in the comments.
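
      For anyone reproducing this, the same logs can be gathered from the machine-config-daemon container on the affected node (the pod name below is a placeholder):

      $ oc -n openshift-machine-config-operator get pods -o wide | grep worker-0
      $ oc -n openshift-machine-config-operator logs machine-config-daemon-<pod> -c machine-config-daemon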
      
      


    People

      Assignee: David Joshy
      Reporter: Sergio Regidor de la Rosa
