Bug
Resolution: Unresolved
Normal
4.20
Quality / Stability / Reliability
Moderate
Description of problem:
When a pivot error happens (MCO cannot apply a new osImage), the MCP should be degraded and a PivotError alert should be triggered. The alert is triggered, but the MCP is not degraded; it remains in "working" status instead. The MCP does eventually become degraded, but only after a long delay (15 minutes in our cluster).
{noformat}
E1013 11:57:13.938404 2596 writer.go:231] Marking Degraded due to: "Failed to update OS to quay.io/mcoqe/layering@sha256:5177a092968e50b2be8d98c15a68bc65016de18dacfc693f99187d2a1457ac85 after retries: timed out waiting for the condition"
I1013 11:57:13.950238 2596 daemon.go:784] Transitioned from state: Done -> Working
I1013 12:12:16.822035 2596 daemon.go:784] Transitioned from state: Working -> Degraded
{noformat}
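For reference, the delay between the "Marking Degraded" message and the actual "Working -> Degraded" transition can be measured directly from the daemon log lines above. The following is only a minimal sketch, assuming the standard klog header layout (severity letter, MMDD, then HH:MM:SS.microseconds). klog does not record the year, so 2025 is assumed from the nightly build date, and the helper names `klog_time` and `degrade_delay` are hypothetical.

```python
import re
from datetime import datetime, timedelta

# klog header: severity letter + MMDD, then HH:MM:SS.microseconds.
KLOG_TS = re.compile(r"^[IWEF](\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})")


def klog_time(line: str, year: int = 2025) -> datetime:
    """Parse the timestamp of one klog line (the year is an assumption)."""
    match = KLOG_TS.match(line)
    if match is None:
        raise ValueError(f"not a klog line: {line!r}")
    mmdd, clock = match.groups()
    return datetime.strptime(f"{year}{mmdd} {clock}", "%Y%m%d %H:%M:%S.%f")


def degrade_delay(lines: list[str]) -> timedelta:
    """Time between 'Marking Degraded' and the Working -> Degraded transition."""
    marked = next(l for l in lines if "Marking Degraded due to" in l)
    degraded = next(l for l in lines if "Working -> Degraded" in l)
    return klog_time(degraded) - klog_time(marked)


# Log lines copied from this report.
LOG = [
    'E1013 11:57:13.938404 2596 writer.go:231] Marking Degraded due to: '
    '"Failed to update OS to quay.io/mcoqe/layering@sha256:'
    '5177a092968e50b2be8d98c15a68bc65016de18dacfc693f99187d2a1457ac85 '
    'after retries: timed out waiting for the condition"',
    "I1013 11:57:13.950238 2596 daemon.go:784] Transitioned from state: Done -> Working",
    "I1013 12:12:16.822035 2596 daemon.go:784] Transitioned from state: Working -> Degraded",
]

print(degrade_delay(LOG))  # 0:15:02.883631
```

On the log excerpt from this report, the computed delay is a little over 15 minutes, which matches the delay observed in our cluster.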
Version-Release number of selected component (if applicable):
4.20.0-0.nightly-2025-10-13-053645
How reproducible:
Frequently
Steps to Reproduce:
1. Break the "rpm-ostree upgrade" using the steps defined in https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-63866
2. Apply a new osImage to the worker pool
Actual results:
The pivot alert is properly raised, but MCO does not degrade the worker MCP; the pool remains in "working" status.
Expected results:
The worker MCP should be degraded when the pivot error occurs, with the right message.
Additional info: