Description of problem:
A component must not report Available=False during the course of a normal upgrade.
ClusterOperator olm goes Available=False with reason=CatalogdDeploymentCatalogdControllerManager_Deploying or reason=OperatorcontrollerDeploymentOperatorControllerControllerManager_Deploying during updates, for example:
Sep 29 04:35:47.504 E clusteroperator/olm condition/Available reason/CatalogdDeploymentCatalogdControllerManager_Deploying status/False CatalogdDeploymentCatalogdControllerManagerAvailable: Waiting for Deployment
Sep 29 04:35:47.504 - 52s E clusteroperator/olm condition/Available reason/CatalogdDeploymentCatalogdControllerManager_Deploying status/False CatalogdDeploymentCatalogdControllerManagerAvailable: Waiting for Deployment
Sep 29 04:42:35.127 E clusteroperator/olm condition/Available reason/OperatorcontrollerDeploymentOperatorControllerControllerManager_Deploying status/False OperatorcontrollerDeploymentOperatorControllerControllerManagerAvailable: Waiting for Deployment
Sep 29 04:42:35.127 - 12s E clusteroperator/olm condition/Available reason/OperatorcontrollerDeploymentOperatorControllerControllerManager_Deploying status/False OperatorcontrollerDeploymentOperatorControllerControllerManagerAvailable: Waiting for Deployment
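To quantify how long co/olm was unavailable, only the interval lines (those carrying ` - 52s` / ` - 12s`) matter; the bare event lines with the same timestamp mark the transition itself. The following is a minimal sketch, assuming the whitespace-separated field layout shown in these monitor lines. The function name `summarize_olm_unavailable` is hypothetical and not part of any OpenShift tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize Available=False intervals for clusteroperator/olm
# from monitor lines read on stdin, assuming the layout:
#   Sep 29 04:35:47.504 - 52s E clusteroperator/olm condition/Available reason/<R> status/False ...
# Only interval lines (field 4 is "-", field 5 is "<N>s") are counted, so the
# bare event lines with the same timestamp are not double-counted.
summarize_olm_unavailable() {
  awk '
    $4 == "-" && $5 ~ /^[0-9]+s$/ && $7 == "clusteroperator/olm" &&
    / condition\/Available / && / status\/False / {
      reason = ""
      for (i = 8; i <= NF; i++) {
        # Strip the 7-character "reason/" prefix to keep just the reason.
        if ($i ~ /^reason\//) { reason = substr($i, 8); break }
      }
      secs = substr($5, 1, length($5) - 1) + 0   # "52s" -> 52
      total += secs
      printf "%s %s %ds\n", $3, reason, secs     # start time, reason, duration
    }
    END { printf "total %ds\n", total }
  '
}
```

Fed the four olm lines from the failure output, it reports the 52s catalogd interval and the 12s operator-controller interval, 64s in total.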
Version-Release number of selected component (if applicable):
The issue was spotted in a 4.21-to-4.21 upgrade test:
INFO[2025-09-29T02:33:17Z] Using explicitly provided pull-spec for release initial (registry.ci.openshift.org/ocp/release:4.21.0-0.ci-2025-09-28-082535)
INFO[2025-09-29T02:33:17Z] Using explicitly provided pull-spec for release latest (registry.ci.openshift.org/ocp/release:4.21.0-0.ci-2025-09-29-022535)
How reproducible:
It appears to happen consistently in the aggregated job, although a similar test also has a passing run (see the success output below).
### failure
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.21-e2e-gcp-ovn-upgrade/1972489796022439936/artifacts/e2e-gcp-ovn-upgrade/openshift-e2e-test/artifacts/junit/e2e-monitor-tests__20250929-034333.xml | grep 'clusteroperator/olm should not change condition/Available' -A1
<testcase name="[Monitor:legacy-cvo-invariants][bz-OLM] clusteroperator/olm should not change condition/Available" time="7014.05639286">
<failure message="">4 unexpected clusteroperator state transitions during e2e test run. These did not match any known exceptions, so they cause this test-case to fail:

Sep 29 04:35:47.504 E clusteroperator/olm condition/Available reason/CatalogdDeploymentCatalogdControllerManager_Deploying status/False CatalogdDeploymentCatalogdControllerManagerAvailable: Waiting for Deployment
Sep 29 04:35:47.504 - 52s E clusteroperator/olm condition/Available reason/CatalogdDeploymentCatalogdControllerManager_Deploying status/False CatalogdDeploymentCatalogdControllerManagerAvailable: Waiting for Deployment
Sep 29 04:42:35.127 E clusteroperator/olm condition/Available reason/OperatorcontrollerDeploymentOperatorControllerControllerManager_Deploying status/False OperatorcontrollerDeploymentOperatorControllerControllerManagerAvailable: Waiting for Deployment
Sep 29 04:42:35.127 - 12s E clusteroperator/olm condition/Available reason/OperatorcontrollerDeploymentOperatorControllerControllerManager_Deploying status/False OperatorcontrollerDeploymentOperatorControllerControllerManagerAvailable: Waiting for Deployment

2 unwelcome but acceptable clusteroperator state transitions during e2e test run. These should not happen, but because they are tied to exceptions, the fact that they did happen is not sufficient to cause this test-case to fail:

Sep 29 04:36:39.932 W clusteroperator/olm condition/Available reason/AsExpected status/True CatalogdDeploymentCatalogdControllerManagerAvailable: Deployment is available\nOperatorcontrollerDeploymentOperatorControllerControllerManagerAvailable: Deployment is available (exception: Available=True is the happy case)
Sep 29 04:42:48.072 W clusteroperator/olm condition/Available reason/AsExpected status/True CatalogdDeploymentCatalogdControllerManagerAvailable: Deployment is available\nOperatorcontrollerDeploymentOperatorControllerControllerManagerAvailable: Deployment is available (exception: Available=True is the happy case)
</failure>

### success
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/30308/pull-ci-openshift-origin-main-e2e-gcp-ovn-upgrade/1971564973029068800/artifacts/e2e-gcp-ovn-upgrade/openshift-e2e-test/artifacts/junit/e2e-monitor-tests__20250926-142805.xml | grep 'clusteroperator/olm should not change condition/Available' -A1
<testcase name="[Monitor:legacy-cvo-invariants][bz-OLM] clusteroperator/olm should not change condition/Available" time="0"></testcase>
<testcase name="[Monitor:legacy-cvo-invariants][bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available" time="0"></testcase>
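The failure and success outputs above differ in shape: the passing run shows the olm testcase self-closed on one line, while the failing run opens the testcase and puts a `<failure ...>` element on the next line (which is why `grep -A1` is enough to spot it). A minimal sketch that turns that observation into a verdict, assuming exactly this one-element-per-line junit layout; the function name `olm_available_verdict` is hypothetical, and the check is a heuristic rather than a real XML parse.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify the olm Available monitor testcase in a junit
# XML stream on stdin as "passed", "failed", or "missing".
# Assumption (from the outputs above): a passing case is "<testcase ...></testcase>"
# on one line; a failing case opens "<testcase ...>" and the next line starts
# the "<failure ...>" element. This is a line-based heuristic, not an XML parser.
olm_available_verdict() {
  awk '
    index($0, "clusteroperator/olm should not change condition/Available") {
      # Self-closed on the same line: no failure element, so it passed.
      if (index($0, "</testcase>")) { verdict = "passed"; exit }
      pending = 1; next
    }
    pending {
      # Look only at the line right after the opening tag.
      verdict = (index($0, "<failure") ? "failed" : "passed"); exit
    }
    END { print (verdict == "" ? "missing" : verdict) }
  '
}
```

For example, piping the downloaded junit file into the function, e.g. `curl -s <junit-url> | olm_available_verdict`, would print `failed` for the first run above and `passed` for the second.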
Steps to Reproduce:
1. Run the aggregated job above.
Actual results:
co/olm goes Available=False during the upgrade test.
Expected results:
co/olm stays Available=True during the upgrade test.
Additional info:
The failures were taken from a 4.21-to-4.21 upgrade test. Upgrades from earlier versions may be affected as well.
- relates to OTA-362 CI: fail update suite if any ClusterOperator go Available=False (Closed)