Bug
Resolution: Done-Errata
Normal
4.15.0
None
Moderate
No
False
NA
Release Note Not Required
In Progress
We had this CI job failing because clusteroperator/storage kept flip-flopping between Progressing=True and Progressing=False:

[sig-arch] events should not repeat pathologically for ns/openshift-cluster-storage-operator

{ 1 events happened too frequently
event happened 21 times, something is wrong: namespace/openshift-cluster-storage-operator deployment/cluster-storage-operator hmsg/cfc7e5cdbe - reason/OperatorStatusChanged Status for clusteroperator/storage changed: Progressing changed from True to False ("AWSEBSCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well") From: 14:13:20Z To: 14:13:21Z result=reject }
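To see which events repeat most in a given run, one rough approach (assuming events.txt is the `oc get events`-style dump from the job artifacts, with the event age in the second column) is to dedupe the messages while ignoring that age column:

$ awk '{$2=""; print}' events.txt | sort | uniq -c | sort -rn | head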
That flip-flopping exposed OCPBUGS-24027, which is now fixed.
However, there is still an excessive number of Progressing events from this job:
$ grep 'clusteroperator/storage changed: Progressing' events.txt > progressing.txt
$ wc -l progressing.txt
28 progressing.txt
A small subset of those actually change between True and False:
$ grep 'clusteroperator/storage changed: Progressing' events.txt | grep True
openshift-cluster-storage-operator 143m Normal OperatorStatusChanged deployment/cluster-storage-operator Status for clusteroperator/storage changed: Progressing changed from Unknown to False ("All is well"),Available changed from Unknown to True ("DefaultStorageClassControllerAvailable: StorageClass provided by supplied CSI Driver instead of the cluster-storage-operator")
openshift-cluster-storage-operator 143m Normal OperatorStatusChanged deployment/cluster-storage-operator Status for clusteroperator/storage changed: Progressing changed from False to True ("AWSEBSProgressing: Waiting for Deployment to act on changes")
openshift-cluster-storage-operator 143m Normal OperatorStatusChanged deployment/cluster-storage-operator Status for clusteroperator/storage changed: Progressing message changed from "AWSEBSProgressing: Waiting for Deployment to deploy pods" to "AWSEBSCSIDriverOperatorCRProgressing: Waiting for AWSEBS operator to report status\nAWSEBSProgressing: Waiting for Deployment to deploy pods",Available changed from True to False ("AWSEBSCSIDriverOperatorCRAvailable: Waiting for AWSEBS operator to report status"),Upgradeable changed from Unknown to True ("All is well")
openshift-cluster-storage-operator 136m Normal OperatorStatusChanged deployment/cluster-storage-operator Status for clusteroperator/storage changed: Progressing changed from True to False ("AWSEBSCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")
openshift-cluster-storage-operator 45m Normal OperatorStatusChanged deployment/cluster-storage-operator Status for clusteroperator/storage changed: Progressing changed from False to True ("AWSEBSCSIDriverOperatorCRProgressing: AWSEBSDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods")
openshift-cluster-storage-operator 2m11s Normal OperatorStatusChanged deployment/cluster-storage-operator Status for clusteroperator/storage changed: Progressing changed from True to False ("AWSEBSCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")
openshift-cluster-storage-operator 8m6s Normal OperatorStatusChanged deployment/cluster-storage-operator Status for clusteroperator/storage changed: Progressing changed from False to True ("SHARESCSIDriverOperatorCRProgressing: SharedResourcesDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods")
openshift-cluster-storage-operator 2m12s Normal OperatorStatusChanged deployment/cluster-storage-operator Status for clusteroperator/storage changed: Progressing changed from False to True ("SHARESProgressing: Waiting for Deployment to deploy pods")
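Most of the remaining entries are message-only updates to the Progressing condition. A quick (illustrative) way to count those against the same progressing.txt:

$ grep -c 'Progressing message changed' progressing.txt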
But then we also end up with events like this one, where CSO has simply appended more noise to the status message as competing controllers update it:
openshift-cluster-storage-operator 142m Normal OperatorStatusChanged deployment/cluster-storage-operator Status for clusteroperator/storage changed: Progressing message changed from "AWSEBSCSIDriverOperatorCRProgressing: AWSEBSDriverControllerServiceControllerProgressing: Waiting for Deployment to act on changes\nAWSEBSCSIDriverOperatorCRProgressing: AWSEBSDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods\nSHARESCSIDriverOperatorCRProgressing: SharedResourceCSIDriverWebhookControllerProgressing: Waiting for Deployment to deploy pods" to "AWSEBSCSIDriverOperatorCRProgressing: AWSEBSDriverControllerServiceControllerProgressing: Waiting for Deployment to deploy pods\nAWSEBSCSIDriverOperatorCRProgressing: AWSEBSDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods\nSHARESCSIDriverOperatorCRProgressing: SharedResourceCSIDriverWebhookControllerProgressing: Waiting for Deployment to deploy pods",Available message changed from "AWSEBSCSIDriverOperatorCRAvailable: AWSEBSDriverControllerServiceControllerAvailable: Waiting for Deployment\nAWSEBSCSIDriverOperatorCRAvailable: AWSEBSDriverNodeServiceControllerAvailable: Waiting for the DaemonSet to deploy the CSI Node Service\nSHARESCSIDriverOperatorCRAvailable: SharedResourceCSIDriverWebhookControllerAvailable: Waiting for Deployment\nSHARESCSIDriverOperatorCRAvailable: SharedResourcesDriverNodeServiceControllerAvailable: Waiting for the DaemonSet to deploy the CSI Node Service" to "AWSEBSCSIDriverOperatorCRAvailable: AWSEBSDriverControllerServiceControllerAvailable: Waiting for Deployment\nSHARESCSIDriverOperatorCRAvailable: SharedResourceCSIDriverWebhookControllerAvailable: Waiting for Deployment\nSHARESCSIDriverOperatorCRAvailable: SharedResourcesDriverNodeServiceControllerAvailable: Waiting for the DaemonSet to deploy the CSI Node Service"
There are multiple controllers for multiple operators updating the Progressing condition, which generates an excessive number of events. This would be (at least) annoying on a live cluster, but it also leaves CSO susceptible to `events should not repeat pathologically` test flakes in CI.
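The per-controller prefixes in the messages make that fan-out visible. A rough sketch to enumerate which controllers are writing to the condition, again run against the progressing.txt extracted above:

$ grep -o '[A-Za-z]*Progressing:' progressing.txt | sort | uniq -c | sort -rn

This should surface prefixes like AWSEBSCSIDriverOperatorCRProgressing: and SHARESCSIDriverOperatorCRProgressing:, roughly one per controller contributing to the condition.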
links to:
RHEA-2024:0041 OpenShift Container Platform 4.16.z bug fix update