-
Bug
-
Resolution: Unresolved
-
Normal
-
None
-
4.14.z, 4.15.0, 4.16
-
Quality / Stability / Reliability
Currently, the logic is that if the infrastructure status cannot be retrieved, the HA values are used for leader election. Code: https://github.com/openshift/operator-framework-olm/blob/master/pkg/leaderelection/leaderelection.go#L59-L63
MacBook-Pro:must-gather-sno2 jianzhang$ omg logs package-server-manager-dc4dd8c64-7jw5z |grep "unable to get cluster infrastructure status"
2023-12-30T22:03:30.690296328Z 2023-12-30T22:03:30Z ERROR setup unable to get cluster infrastructure status, using HA cluster values for leader election {"error": "Get \"https://172.30.0.1:443/apis/config.openshift.io/v1/infrastructures/cluster\": context deadline exceeded"}
However, it can get the infrastructure status successfully later, so I'm wondering whether we could add a retry for it. Thanks!
Dec 30 22:47:56.116: INFO: Running 'oc --kubeconfig=/tmp/kubeconfig-3268075730 get lease packageserver-controller-lock -n openshift-operator-lifecycle-manager -o=jsonpath={.spec.leaseDurationSeconds}'
...
Dec 30 22:47:56.222: INFO: This is a SNO cluster
...
fail [github.com/openshift/openshift-tests-private/test/extended/operators/olm.go:868]: The lease duration is not as expected: 137
The test case: https://github.com/openshift/openshift-tests-private/blob/master/test/extended/operators/olm.go#L803-L822
g.It("NonHyperShiftHOST-Author:jiazha-Medium-49352-SNO Leader election conventions for cluster topology", func() {
	exutil.By("1) get the cluster topology")
	infra, err := oc.AsAdmin().WithoutNamespace().Run("get").Args("infrastructures", "cluster", "-o=jsonpath={.status.controlPlaneTopology}").Output()
	if err != nil {
		e2e.Failf("Fail to get the cluster infra: %s, error:%v", infra, err)
	}
	exutil.By("2) get the leaseDurationSeconds of the packageserver-controller-lock")
	leaseDurationSeconds, err := oc.AsAdmin().WithoutNamespace().Run("get").Args("lease", "packageserver-controller-lock", "-n", "openshift-operator-lifecycle-manager", "-o=jsonpath={.spec.leaseDurationSeconds}").Output()
	if err != nil {
		e2e.Failf("Fail to get the leaseDurationSeconds: %s, error:%v", leaseDurationSeconds, err)
	}
	if infra == "SingleReplica" {
		e2e.Logf("This is a SNO cluster")
		if !strings.Contains(leaseDurationSeconds, "270") {
			e2e.Failf("The lease duration is not as expected: %s", leaseDurationSeconds)
		}
	} else {
		g.Skip("This is a HA cluster, skip.")
	}
})
MacBook-Pro:~ jianzhang$ omg get infrastructures cluster -o yaml
...
spec:
  cloudConfig:
    name: ''
  platformSpec:
    type: None
status:
  apiServerInternalURI: https://api-int.ci-op-h2xyljb0.XXXXXXXXXXXXXXXXXXXXXXXXXXXXX:6443
  apiServerURL: https://api.ci-op-h2xyljb0.XXXXXXXXXXXXXXXXXXXXXXXXXXXXX:6443
  controlPlaneTopology: SingleReplica
  cpuPartitioning: None
  etcdDiscoveryDomain: ''
  infrastructureName: ci-op-h2xyljb0-qshsl
  infrastructureTopology: SingleReplica
  platform: None
  platformStatus:
    type: None

NAME        STATUS   ROLES                               AGE     VERSION
master-00   Ready    control-plane,master,worker,wscan   4h29m   v1.27.8+4fab27b