OpenShift Bugs / OCPBUGS-24057

storage ClusterOperator flops Progressing reason in tech-preview serial jobs


    • Type: Bug
    • Resolution: Duplicate
    • Priority: Undefined
    • Affects Version: 4.15
    • Component: Storage / Operators
    • Severity: Moderate

      Seen in a 4.15 tech-preview serial CI run:

      : [sig-arch] events should not repeat pathologically for ns/openshift-cluster-storage-operator 	0s
      {  1 events happened too frequently
      
      event happened 22 times, something is wrong: namespace/openshift-cluster-storage-operator deployment/cluster-storage-operator hmsg/cfc7e5cdbe - reason/OperatorStatusChanged Status for clusteroperator/storage changed: Progressing changed from True to False ("AWSEBSCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well") From: 17:20:04Z To: 17:20:05Z result=reject }
      

      Seems pretty common:

      $ w3m -dump -cols 200 'https://search.ci.openshift.org/?search=SHARESCSIDriverOperatorCRProgressing&maxAge=24h&type=junit&name=periodic' | grep 'failures match'
      periodic-ci-openshift-release-master-ci-4.15-e2e-aws-sdn-techpreview-serial (all) - 4 runs, 100% failed, 100% of failures match = 100% impact
      periodic-ci-openshift-release-master-nightly-4.15-e2e-vsphere-ovn-techpreview-serial (all) - 4 runs, 75% failed, 100% of failures match = 75% impact
      periodic-ci-openshift-release-master-ci-4.15-e2e-azure-sdn-techpreview-serial (all) - 4 runs, 75% failed, 33% of failures match = 25% impact
      periodic-ci-openshift-release-master-ci-4.15-e2e-gcp-sdn-techpreview-serial (all) - 4 runs, 50% failed, 50% of failures match = 25% impact
      

      Looking at PromeCIeus for the run I dug into, the reason flipping seems to be between AWSEBSCSIDriverOperatorCR_AWSEBSDriverNodeServiceController_Deploying and AWSEBSCSIDriverOperatorCR_AWSEBSDriverNodeServiceController_Deploying::SHARESCSIDriverOperatorCR_SharedResourcesDriverNodeServiceController_Deploying. Possibly the SHARESCSIDriverOperatorCR_SharedResourcesDriverNodeServiceController_Deploying side needs some inertia to avoid flapping in and out?
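
      If inertia is the way to go, below is a minimal sketch of the idea, assuming a hypothetical reasonDebouncer helper (nothing by that name exists in cluster-storage-operator): a new Progressing reason is only written once it has been observed continuously for a grace period, so one-second blips never reach the ClusterOperator.

      package main

      import (
          "fmt"
          "time"
      )

      // reasonDebouncer is a hypothetical helper: it only lets a new Progressing
      // reason through once that reason has been observed continuously for a
      // grace period.
      type reasonDebouncer struct {
          grace     time.Duration
          current   string    // reason last written to the ClusterOperator
          candidate string    // reason we are considering switching to
          since     time.Time // when the candidate was first observed
      }

      // Observe returns the reason that should be written now.
      func (d *reasonDebouncer) Observe(reason string, now time.Time) string {
          if reason == d.current {
              d.candidate, d.since = "", time.Time{}
              return d.current
          }
          if reason != d.candidate {
              d.candidate, d.since = reason, now
          }
          if now.Sub(d.since) >= d.grace {
              d.current = reason
              d.candidate, d.since = "", time.Time{}
          }
          return d.current
      }

      func main() {
          ebsOnly := "AWSEBSCSIDriverOperatorCR_AWSEBSDriverNodeServiceController_Deploying"
          both := ebsOnly + "::SHARESCSIDriverOperatorCR_SharedResourcesDriverNodeServiceController_Deploying"
          d := &reasonDebouncer{grace: 30 * time.Second, current: ebsOnly}
          t := time.Now()
          fmt.Println(d.Observe(both, t))                     // still the old reason
          fmt.Println(d.Observe(both, t.Add(time.Second)))    // one-second blip suppressed
          fmt.Println(d.Observe(both, t.Add(40*time.Second))) // stable past the grace period, so it flips
      }

      The tradeoff is that genuine reason changes would only be reported after the grace period elapses.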

      Alternatively, if the reason churn seems appropriate, maybe the origin test suite can be taught that this churn is expected, and not pathological?
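
      If the churn is considered acceptable, the exception on the origin side could boil down to a message pattern along these lines (illustrative only; the actual allowlist mechanism and file layout in openshift/origin may differ):

      package main

      import (
          "fmt"
          "regexp"
      )

      // storageProgressingChurn is a hypothetical pattern for the flapping
      // Progressing transitions on clusteroperator/storage seen above; the real
      // exception list in openshift/origin may key on different fields.
      var storageProgressingChurn = regexp.MustCompile(
          `(?s)reason/OperatorStatusChanged Status for clusteroperator/storage changed: Progressing changed from (True to False|False to True) \(".*SHARESCSIDriverOperatorCRProgressing.*"\)`,
      )

      func main() {
          msg := `reason/OperatorStatusChanged Status for clusteroperator/storage changed: Progressing changed from True to False ("AWSEBSCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")`
          fmt.Println(storageProgressingChurn.MatchString(msg)) // true: this message would be treated as expected churn
      }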

            Assignee: Unassigned
            Reporter: W. Trevor King
            QA Contact: Wei Duan
            Votes: 0
            Watchers: 3