Bug
Resolution: Won't Do
Normal
None
4.13.0
Moderate
No
Rejected
False
Description of problem:
Cluster operator operator-lifecycle-manager is not available.
The cluster kubeconfig: http://lacrosse.corp.redhat.com/~msimka/kubeconfig
Must gather: http://lacrosse.corp.redhat.com/~msimka/must-gather.local.8964562753497039482.zip
MacBook-Pro:~ jianzhang$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0    True        False         12d     Error while reconciling 4.13.0: the cluster operator operator-lifecycle-manager is not available

MacBook-Pro:~ jianzhang$ oc get clusterversion version -o yaml
apiVersion: config.openshift.io/v1
kind: ClusterVersion
metadata:
  creationTimestamp: "2023-05-18T10:37:13Z"
  generation: 2
  name: version
  resourceVersion: "10512962"
  uid: 77cd00f5-efd4-41f7-9e64-10c90a75b16d
spec:
  channel: stable-4.13
  clusterID: 5d845f4f-a80c-43db-810d-60a3f0bad847
status:
  availableUpdates: null
  capabilities:
    enabledCapabilities:
    - CSISnapshot
    - Console
    - Insights
    - NodeTuning
    - Storage
    - baremetal
    - marketplace
    - openshift-samples
    knownCapabilities:
    - CSISnapshot
    - Console
    - Insights
    - NodeTuning
    - Storage
    - baremetal
    - marketplace
    - openshift-samples
  conditions:
  - lastTransitionTime: "2023-05-18T10:37:16Z"
    status: "True"
    type: RetrievedUpdates
  - lastTransitionTime: "2023-05-18T10:37:16Z"
    message: Capabilities match configured spec
    reason: AsExpected
    status: "False"
    type: ImplicitlyEnabledCapabilities
  - lastTransitionTime: "2023-05-18T10:37:16Z"
    message: Payload loaded version="4.13.0" image="quay.io/openshift-release-dev/ocp-release@sha256:74b23ed4bbb593195a721373ed6693687a9b444c97065ce8ac653ba464375711" architecture="amd64"
    reason: PayloadLoaded
    status: "True"
    type: ReleaseAccepted
  - lastTransitionTime: "2023-05-18T11:01:08Z"
    message: Done applying 4.13.0
    status: "True"
    type: Available
  - lastTransitionTime: "2023-05-31T07:20:17Z"
    message: Cluster operator operator-lifecycle-manager is not available
    reason: ClusterOperatorNotAvailable
    status: "True"
    type: Failing
  - lastTransitionTime: "2023-05-18T11:01:08Z"
    message: 'Error while reconciling 4.13.0: the cluster operator operator-lifecycle-manager is not available'
    reason: ClusterOperatorNotAvailable
    status: "False"
    type: Progressing
  - lastTransitionTime: "2023-05-30T09:05:45Z"
    message: 'Cluster operator operator-lifecycle-manager should not be upgraded between minor versions: Waiting for updates to take effect'
    status: "False"
    type: Upgradeable
  desired:
    channels:
    - candidate-4.13
    - candidate-4.14
    - fast-4.13
    - stable-4.13
    image: quay.io/openshift-release-dev/ocp-release@sha256:74b23ed4bbb593195a721373ed6693687a9b444c97065ce8ac653ba464375711
    url: https://access.redhat.com/errata/RHSA-2023:1326
    version: 4.13.0
  history:
  - completionTime: "2023-05-18T11:01:08Z"
    image: quay.io/openshift-release-dev/ocp-release@sha256:74b23ed4bbb593195a721373ed6693687a9b444c97065ce8ac653ba464375711
    startedTime: "2023-05-18T10:37:16Z"
    state: Completed
    verified: false
    version: 4.13.0
  observedGeneration: 2
  versionHash: RlBJDtEl6wk=

MacBook-Pro:~ jianzhang$ oc get co
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.13.0    True        False         False      151m
baremetal                                  4.13.0    True        False         False      12d
cloud-controller-manager                   4.13.0    True        False         False      12d
cloud-credential                           4.13.0    True        False         False      12d
cluster-autoscaler                         4.13.0    True        False         False      12d
config-operator                            4.13.0    True        False         False      12d
console                                    4.13.0    True        False         False      15h
control-plane-machine-set                  4.13.0    True        False         False      12d
csi-snapshot-controller                    4.13.0    True        False         False      15h
dns                                        4.13.0    True        False         False      15h
etcd                                       4.13.0    True        False         False      12d
image-registry                             4.13.0    True        False         False      15h
ingress                                    4.13.0    True        False         False      15h
insights                                   4.13.0    True        False         False      12d
kube-apiserver                             4.13.0    True        False         False      12d
kube-controller-manager                    4.13.0    True        False         False      12d
kube-scheduler                             4.13.0    True        False         False      12d
kube-storage-version-migrator              4.13.0    True        False         False      15h
machine-api                                4.13.0    True        False         False      12d
machine-approver                           4.13.0    True        False         False      12d
machine-config                             4.13.0    True        False         False      151m
marketplace                                4.13.0    True        False         False      12d
monitoring                                 4.13.0    True        False         False      15h
network                                    4.13.0    True        False         False      12d
node-tuning                                4.13.0    True        False         False      12d
openshift-apiserver                        4.13.0    True        False         False      152m
openshift-controller-manager               4.13.0    True        False         False      15h
openshift-samples                          4.13.0    True        False         False      12d
operator-lifecycle-manager                 4.13.0    False       True          True       23h
operator-lifecycle-manager-catalog         4.13.0    True        False         False      12d
operator-lifecycle-manager-packageserver   4.13.0    True        False         False      12d
service-ca                                 4.13.0    True        False         False      12d
storage                                    4.13.0    True        False         False      12d
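For triage, the unhealthy operator can be picked out of a saved `oc get co` listing automatically. The following is only a minimal sketch: the helper name `unavailable_operators` is hypothetical, and it assumes the plain-text table layout shown above, with one operator per line and no spaces inside the first five columns.

```python
def unavailable_operators(table_text):
    """Return the names of cluster operators whose AVAILABLE column is not 'True'.

    Assumes whitespace-separated columns as printed by `oc get co`,
    with a header row containing NAME and AVAILABLE.
    """
    lines = [ln for ln in table_text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    name_idx = header.index("NAME")
    avail_idx = header.index("AVAILABLE")
    bad = []
    for ln in lines[1:]:
        cols = ln.split()
        # Skip malformed rows that are too short to carry an AVAILABLE value.
        if len(cols) > avail_idx and cols[avail_idx] != "True":
            bad.append(cols[name_idx])
    return bad
```

Applied to the listing above, this would be expected to flag only operator-lifecycle-manager, the one row whose AVAILABLE column is False.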
Version-Release number of selected component (if applicable):
Cluster version is 4.13.0
How reproducible:
not always
Steps to Reproduce:
There are no manual reproduction steps; refer to: https://redhat-internal.slack.com/archives/CH76YSYSC/p1685457880394859
1. Install OCP 4.13.0
Actual results:
OLM cannot recover and report the ready status to MCO.
Expected results:
OLM can recover and report the ready status to MCO after failing to connect to master
Additional info:
I checked the previous catalog-operator log and found that OLM could not connect to the master. After a while, the catalog-operator pod was recreated and reconnected to the master successfully.
time="2023-05-30T09:03:01Z" level=info msg="log level info"
time="2023-05-30T09:03:01Z" level=info msg="TLS keys set, using https for metrics"
W0530 09:03:01.307617 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
time="2023-05-30T09:03:01Z" level=info msg="Using in-cluster kube client config"
time="2023-05-30T09:03:01Z" level=info msg="Using in-cluster kube client config"
W0530 09:03:01.308467 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
Error: error configuring catalog operator: Get "https://172.122.0.1:443/api?timeout=32s": dial tcp 172.122.0.1:443: connect: connection refused
error configuring catalog operator: Get "https://172.122.0.1:443/api?timeout=32s": dial tcp 172.122.0.1:443: connect: connection refused
My guess as to why it didn't report the ready status to MCO is that the error "no operator group found that is managing this namespace" is reported too frequently, almost twice every second.
time="2023-05-31T05:58:47Z" level=info msg=syncing id=4o6rT ip=install-gq2x7 namespace=appsint-8x6y phase=Installing
time="2023-05-31T05:58:47Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2023-05-31T05:58:47Z" level=info msg="attenuated service account query failed - no operator group found that is managing this namespace" id=4o6rT ip=install-gq2x7 namespace=appsint-8x6y phase=Installing
E0531 05:58:47.665719 1 queueinformer_operator.go:298] sync {"update" "appsint-g27n/install-svkrf"} failed: attenuated service account query failed - no operator group found that is managing this namespace
time="2023-05-31T05:58:47Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2023-05-31T05:58:47Z" level=info msg=syncing id=HCPoY ip=install-b98z8 namespace=mchoma-appsint phase=Installing
time="2023-05-31T05:58:47Z" level=warning msg="skipping operator group since it is not managing any namespace og=mchoma-appsint-sph4z" mode=scoped namespace=mchoma-appsint
time="2023-05-31T05:58:47Z" level=info msg="attenuated service account query failed - no operator group found that is managing this namespace" id=HCPoY ip=install-b98z8 namespace=mchoma-appsint phase=Installing
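The "almost twice every second" estimate can be checked against a saved copy of the catalog-operator log. The sketch below is an assumption-laden illustration, not part of OLM: the function name `count_per_second` is hypothetical, it relies on the `time="..."` prefix seen in the excerpt above, and it needs Python 3.7 or later because it splits on a zero-width match.

```python
import re
from collections import Counter

# Matches the timestamp that starts each logrus-style entry,
# e.g. time="2023-05-31T05:58:47Z" (one-second resolution).
TIME_RE = re.compile(r'time="([^"]+)"')

def count_per_second(log_text, needle):
    """Count, per timestamp, the log entries that contain `needle`."""
    counts = Counter()
    # Split just before every time=" marker so the function also works
    # when several entries have been run together on a single line.
    for entry in re.split(r'(?=time=")', log_text):
        match = TIME_RE.match(entry)
        if match and needle in entry:
            counts[match.group(1)] += 1
    return counts
```

Calling it with a needle such as `msg="attenuated service account query failed` counts only the info-level entries; klog lines (those starting with `E0531`) carry no `time=` prefix and are folded into the preceding entry rather than counted separately.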
Workaround: OLM reported the ready status after its pods were recreated manually.
MacBook-Pro:~ jianzhang$ oc delete pods --all -n openshift-operator-lifecycle-manager

MacBook-Pro:~ jianzhang$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0    True        False         12d     Cluster version is 4.13.0

MacBook-Pro:~ jianzhang$ oc get co
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.13.0    True        False         False      3h20m
baremetal                                  4.13.0    True        False         False      12d
cloud-controller-manager                   4.13.0    True        False         False      12d
cloud-credential                           4.13.0    True        False         False      12d
cluster-autoscaler                         4.13.0    True        False         False      12d
config-operator                            4.13.0    True        False         False      12d
console                                    4.13.0    True        False         False      16h
control-plane-machine-set                  4.13.0    True        False         False      12d
csi-snapshot-controller                    4.13.0    True        False         False      16h
dns                                        4.13.0    True        False         False      16h
etcd                                       4.13.0    True        False         False      12d
image-registry                             4.13.0    True        False         False      16h
ingress                                    4.13.0    True        False         False      16h
insights                                   4.13.0    True        False         False      12d
kube-apiserver                             4.13.0    True        False         False      12d
kube-controller-manager                    4.13.0    True        False         False      12d
kube-scheduler                             4.13.0    True        False         False      12d
kube-storage-version-migrator              4.13.0    True        False         False      16h
machine-api                                4.13.0    True        False         False      12d
machine-approver                           4.13.0    True        False         False      12d
machine-config                             4.13.0    True        False         False      3h20m
marketplace                                4.13.0    True        False         False      12d
monitoring                                 4.13.0    True        False         False      16h
network                                    4.13.0    True        False         False      12d
node-tuning                                4.13.0    True        False         False      12d
openshift-apiserver                        4.13.0    True        False         False      3h20m
openshift-controller-manager               4.13.0    True        False         False      15h
openshift-samples                          4.13.0    True        False         False      12d
operator-lifecycle-manager                 4.13.0    True        False         False      46m
operator-lifecycle-manager-catalog         4.13.0    True        False         False      12d
operator-lifecycle-manager-packageserver   4.13.0    True        False         False      12d
service-ca                                 4.13.0    True        False         False      12d
storage                                    4.13.0    True        False         False      12d