Resolution: Done
BU Product Work
OCPSTRAT-835 - Improve upgrades - Reduce False Positives status from operators
OTA 243, OTA 244, OTA 245
These are alarming conditions that may frighten customers, and we don't want to see them in our own controlled, repeatable update CI. This example job had logs like:
Feb 18 21:11:25.799 E clusteroperator/openshift-apiserver changed Degraded to True: APIServerDeployment_UnavailablePod: APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-apiserver ()
And the job failed, but none of the failures were "something made openshift-apiserver mad enough to go Degraded".
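As a rough illustration of how such log lines could be scanned for alarming transitions, here is a minimal Python sketch. It assumes the line layout of the single example above ("clusteroperator/NAME changed CONDITION to STATUS: REASON: MESSAGE"). The names `ConditionChange`, `parse_condition_change`, and `is_alarming` are hypothetical helpers invented for this example, not part of any existing CI tooling.

```python
import re
from typing import NamedTuple, Optional


class ConditionChange(NamedTuple):
    """One ClusterOperator condition transition (hypothetical helper type)."""
    operator: str
    condition: str
    status: str
    reason: str
    message: str


# Assumed layout, inferred only from the example log line above.
_PATTERN = re.compile(
    r"clusteroperator/(?P<operator>\S+) changed "
    r"(?P<condition>\w+) to (?P<status>\w+)"
    r"(?:: (?P<reason>[^:\s]+): (?P<message>.*))?"
)


def parse_condition_change(line: str) -> Optional[ConditionChange]:
    """Return the transition described by a log line, or None if it has none."""
    match = _PATTERN.search(line)
    if match is None:
        return None
    return ConditionChange(
        operator=match["operator"],
        condition=match["condition"],
        status=match["status"],
        reason=match["reason"] or "",
        message=(match["message"] or "").strip(),
    )


def is_alarming(change: ConditionChange) -> bool:
    """Flag the two transitions this ticket is concerned with."""
    return (change.condition, change.status) in {
        ("Degraded", "True"),
        ("Available", "False"),
    }


EXAMPLE_LINE = (
    "Feb 18 21:11:25.799 E clusteroperator/openshift-apiserver changed "
    "Degraded to True: APIServerDeployment_UnavailablePod: "
    "APIServerDeploymentDegraded: 1 of 3 requested instances are "
    "unavailable for apiserver.openshift-apiserver ()"
)

change = parse_condition_change(EXAMPLE_LINE)
if change is not None and is_alarming(change):
    print(f"{change.operator}: {change.condition}={change.status} ({change.reason})")
```

A check like this could fail a job whenever `is_alarming` returns true for any line, but the exact policy (for example, which operators or reasons to exempt) is left open here.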
- blocks
OTA-980 Is the Failing=True status condition a good indicator for admins?
- To Do
- is blocked by
OTA-701 Communicate available and degraded condition definition to teams
- Closed
- is depended on by
TRT-1576 CI: fail update suite if any ClusterOperator goes Available=False outside of updates
- Closed
- is related to
OCPBUGS-20056 Single short-lived operand blip shouldn't cause authentication operator Available=False
- New
OCPBUGS-23746 openshift-apiserver ClusterOperator should not blip Available=False on brief missing HTTP content-type
- New
OCPBUGS-20062 kube-storage-version-migrator goes Available=False with reason=KubeStorageVersionMigrator_Deploying during updates
- Verified
OCPBUGS-9108 openshift-tests-upgrade.[bz-Machine Config Operator] clusteroperator/machine-config should not change condition/Available
- Closed
OCPBUGS-20061 control-plane-machine-set goes Available=False with UnavailableReplicas during updates
- Closed
OCPBUGS-23744 operator-lifecycle-manager-packageserver ClusterOperator should not blip Available=False on 4.14 to 4.15 updates
- Closed
OCPBUGS-24041 Console blips Available=False with RouteHealth_FailedGet and such
- Closed
OCPBUGS-24228 machine-config ClusterOperator should not blip Available=False on brief missing HTTP content-type
- Closed
OCPBUGS-32089 Authentication blips Available=False with WellKnown_NotReady
- Closed
OCPBUGS-36462 control-plane-machine-set goes Available=False with UnavailableReplicas during etcd scale testing
- Closed
TRT-1235 Work with Service Delivery On Reducing Problematic Alerts
- Closed
TRT-1575 CI: fail update suite if any ClusterOperator goes Degraded=True
- Closed
OCPBUGS-825 Available=False with no reason
- Closed
OCPBUGS-22364 ControllerCertificate struct validation failed during upgrade from 4.14 to 4.15
- Closed
OCPBUGS-23745 monitoring ClusterOperator should not blip Available=False on quick etcd leader changes
- Closed
OCPBUGS-35892 monitoring ClusterOperator should not blip Available=Unknown on client rate limiter
- Closed
I've set all related bugs to priority Major and left comments indicating we'd like to have these addressed by 4.16.