Bug
Resolution: Unresolved
Undefined
None
4.18.0
Critical
None
5
ETCD Sprint 263
1
Rejected
False
While designing a solution to have these rarely-run jobs included in component readiness, I discovered that the etcd-scaling job has been quite broken for some time. The problem appears to be invariant tests flagging "unexpected" things happening in the cluster.
Some or all of these failures may boil down to "this is expected during an etcd scaling operation", provided a strong case can be made.
[bz-kube-storage-version-migrator] clusteroperator/kube-storage-version-migrator should not change condition/Available
This one seems very common. Examples:
https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-etcd-scaling/1844042416286339072
https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-etcd-scaling/1841505588492636160
https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.18-e2e-gcp-ovn-etcd-scaling/1831358169155112960
[bz-OLM] clusteroperator/operator-lifecycle-manager-packageserver should not change condition/Available
Examples:
[bz-etcd][invariant] alert/etcdMembersDown should not be at or above info
Examples:
[sig-node] node-lifecycle detects unexpected not ready node
[sig-node] node-lifecycle detects unreachable state on node
It's likely more examples could be found here.
There is a lot to unravel here, but the core question is: is it acceptable for operators (seemingly several) to go Available=False, a serious condition that would often result in someone getting alerted, during an etcd scaling operation?
The same question applies to unreachable nodes and to etcd member down alerts.
relates to
- OCPBUGS-45672 vertical scaling: should not allow scale-down with only 3 healthy members (New)
- OCPBUGS-43565 etcd platform pod exist test failing on etcd-scaling jobs (New)
- OCPBUGS-20062 kube-storage-version-migrator goes Available=False with reason=KubeStorageVersionMigrator_Deploying during updates (Verified)
- OCPBUGS-44244 Unexpected Node Not Ready Regression (Verified)
- OCPBUGS-44887 [CI] [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available (Closed)
- OCPBUGS-44892 [CI] [bz-kube-storage-version-migrator] clusteroperator/kube-storage-version-migrator should not change condition/Available (Closed)