OCPBUGS-62984 (OpenShift Bugs)

MCP is not correctly degraded when a pivotError happens


    • Quality / Stability / Reliability
    • Moderate

      Description of problem:

      When a pivot error happens (MCO cannot apply a new osImage), the MCP should be degraded and a PivotError alert should be triggered.
      
      We can see the alert being triggered, but the MCP is not correctly degraded; it remains in the "Working" state instead.
      
      The degraded state is only reported after a considerable delay (about 15 minutes in our cluster), as the timestamps below show.
      
      {noformat}
      E1013 11:57:13.938404    2596 writer.go:231] Marking Degraded due to: "Failed to update OS to quay.io/mcoqe/layering@sha256:5177a092968e50b2be8d98c15a68bc65016de18dacfc693f99187d2a1457ac85 after retries: timed out waiting for the condition"
      I1013 11:57:13.950238    2596 daemon.go:784] Transitioned from state: Done -> Working
      
      
      I1013 12:12:16.822035    2596 daemon.go:784] Transitioned from state: Working -> Degraded
      {noformat}
      
      
          

      Version-Release number of selected component (if applicable):

      4.20.0-0.nightly-2025-10-13-053645
          

      How reproducible:

      Frequently
          

      Steps to Reproduce:

          1. Break "rpm-ostree upgrade" by following the steps defined in https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-63866
      
          2. Apply a new osImage to the worker pool
          
          

      Actual results:

      The PivotError alert is properly raised, but the MCO does not properly degrade the worker MCP.
      
      
          

      Expected results:

      The worker MCP should be properly degraded with the right message.
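One way to express the expected result as a check is to look for a `Degraded` condition with status `True` on the worker pool whose message mentions the pivot failure. This is a minimal sketch, not MCO code: `Condition` is a hypothetical, simplified stand-in for one entry of the pool's `status.conditions`, and the condition type name and message substring are assumptions to verify against a real cluster.

```go
package main

import (
	"fmt"
	"strings"
)

// Condition is a hypothetical, simplified stand-in for one entry of a
// MachineConfigPool's status.conditions list.
type Condition struct {
	Type    string
	Status  string // "True", "False" or "Unknown"
	Message string
}

// degradedWithMessage reports whether the pool has Degraded=True and the
// condition message contains the expected text (for example, the pivot error).
func degradedWithMessage(conds []Condition, want string) bool {
	for _, c := range conds {
		if c.Type == "Degraded" && c.Status == "True" && strings.Contains(c.Message, want) {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical snapshot of the pool while it is still "Working" after the pivot failure.
	stuck := []Condition{
		{Type: "Updating", Status: "True"},
		{Type: "Degraded", Status: "False"},
	}
	fmt.Println(degradedWithMessage(stuck, "Failed to update OS")) // false

	// Hypothetical snapshot once the pool reports the pivot error.
	degraded := []Condition{
		{Type: "Degraded", Status: "True", Message: "Failed to update OS to <image> after retries"},
	}
	fmt.Println(degradedWithMessage(degraded, "Failed to update OS")) // true
}
```

In this bug, the first snapshot describes the pool for roughly 15 minutes after the pivot error, when the expected result is the second one. On a live cluster, the real conditions can be inspected with `oc get mcp worker -o yaml`.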
          

      Additional info:

      
          

              Team MCO
              Sergio Regidor de la Rosa (sregidor@redhat.com)
              Votes: 0
              Watchers: 2