-
Bug
-
Resolution: Duplicate
-
Major
-
None
-
4.15
-
None
-
Moderate
-
No
-
Rejected
-
False
-
Description of problem:
Some Operator CSVs are continuously reconciled after being detected as NeedsReinstall. Reconciliation fails because OLM attempts to create, rather than apply, resources that already exist. This causes rapidly flapping status in the dashboard, high CPU load on the OLM pod, and the CsvAbnormalOver30Min and CsvAbnormalFailedOver2Min alerts to fire for all affected operators, with reason NeedsReinstall or InstallComponentFailed (semi-random, depending on which phase of the continuous loop OLM was in when the alert fired).
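To illustrate the create-versus-apply distinction described above, here is a minimal Python sketch. It is not OLM code: `FakeCluster`, `AlreadyExists`, `ensure`, and `reconcile_with_create` are hypothetical names, and the API server is modeled as an in-memory dict. It only shows why a create-only reconcile fails on every retry once the object exists, while a create-or-update reconcile is idempotent.

```python
class AlreadyExists(Exception):
    """Raised when a create targets a name that is already present."""


class FakeCluster:
    """Hypothetical in-memory stand-in for an API server (not OLM code)."""

    def __init__(self):
        self.objects = {}

    def create(self, name, spec):
        # Create semantics: refuse to overwrite an existing object.
        if name in self.objects:
            raise AlreadyExists(f'rolebindings "{name}" already exists')
        self.objects[name] = dict(spec)

    def update(self, name, spec):
        # Update semantics: replace whatever is stored under this name.
        self.objects[name] = dict(spec)


def reconcile_with_create(cluster, name, spec):
    """Create-only reconcile: succeeds once, then fails on each retry."""
    try:
        cluster.create(name, spec)
        return "Succeeded"
    except AlreadyExists:
        return "InstallComponentFailed"


def ensure(cluster, name, spec):
    """Create-or-update reconcile: safe to repeat any number of times."""
    try:
        cluster.create(name, spec)
        return "created"
    except AlreadyExists:
        cluster.update(name, spec)
        return "updated"


if __name__ == "__main__":
    cluster = FakeCluster()
    name = "che-operator-service-auth-reader"
    spec = {"placeholder": "illustrative spec, not the real RoleBinding"}
    print([reconcile_with_create(cluster, name, spec) for _ in range(3)])
    print([ensure(cluster, name, spec) for _ in range(3)])
```

In this toy model the create-only path reports a failure on the second and every later pass, which resembles the loop seen in the logs, though the actual cause inside OLM may differ.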
Version-Release number of selected component (if applicable):
OKD 4.15.0-0.okd-2024-01-27-070424
How reproducible:
Always, with certain operators. This being on OKD, I've got the following results for OLM-managed operators:
- ArgoCD (OperatorHub.io catalog) **Not exhibiting**
- DevWorkspace Operator (custom devworkspace catalog) **Exhibiting**
- Eclipse Che (OKD Community Operators catalog) **Exhibiting**
- Grafana Operator (OperatorHub.io catalog) **Not exhibiting**
- KubeVirt Hyperconverged (OperatorHub.io catalog) **Exhibiting**
- Crunchy Postgres (OperatorHub.io catalog) **Not exhibiting**
Steps to Reproduce:
1. Install OKD or OpenShift 4.15
2. Install operators from OperatorHub or using the OLM APIs
3. Install a mix of operators that have webhook definitions in their CSV and those that don't
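For step 3, one way to tell which installed CSVs declare webhooks is to look at `spec.webhookdefinitions` in each ClusterServiceVersion. The sketch below is a rough helper, assuming its input is the JSON list produced by something like `oc get csv -A -o json`; the function names and the sample CSV names are hypothetical and not taken from this cluster.

```python
import json


def has_webhooks(csv):
    """Return True if the CSV declares at least one webhook definition."""
    spec = csv.get("spec") or {}
    return bool(spec.get("webhookdefinitions"))


def split_by_webhooks(csv_list_json):
    """Split CSV names into (with webhooks, without webhooks)."""
    items = json.loads(csv_list_json).get("items", [])
    with_hooks, without_hooks = [], []
    for csv in items:
        name = csv["metadata"]["name"]
        if has_webhooks(csv):
            with_hooks.append(name)
        else:
            without_hooks.append(name)
    return with_hooks, without_hooks


if __name__ == "__main__":
    # Hypothetical sample; real input would come from the cluster.
    sample = json.dumps({
        "items": [
            {"metadata": {"name": "with-webhook.v1"},
             "spec": {"webhookdefinitions": [{"generateName": "example"}]}},
            {"metadata": {"name": "no-webhook.v1"}, "spec": {}},
        ]
    })
    print(split_by_webhooks(sample))
```

Comparing the two lists against the operators that loop would show whether webhook definitions really are the common factor, which is only a hypothesis at this point.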
Actual results:
Some operators transition to Succeeded status and stay there while others loop continuously through Failed, Pending, and InstallReady.
Expected results:
All operators install and OLM stops reconciling.
Additional info:
Since this is OKD, I'll just attach a must-gather here:
https://s3.jharmison.com/public/must-gather-cleaned.tgz
This cluster is public and I can let anyone log in and poke around if you think it would be helpful, or can go collect any extra logs or anything you like. I only know enough about OLM internals to be dangerous.
Here is a snippet of the OLM logs during this time period:
2024-01-31T21:07:33.737245535Z {"level":"error","ts":"2024-01-31T21:07:33Z","logger":"controllers.operator","msg":"Could not update Operator status","request":{"name":"eclipse-che.openshift-operators"},"error":"Operation cannot be fulfilled on operators.operators.coreos.com \"eclipse-che.openshift-operators\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators.(*OperatorReconciler).Reconcile\n\t/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/operator_controller.go:157\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:118\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:314\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:226"}
2024-01-31T21:07:33.946396443Z E0131 21:07:33.946362 1 queueinformer_operator.go:319] sync {"update" "openshift-operators/eclipse-che.v7.80.0"} failed: rolebindings.rbac.authorization.k8s.io "che-operator-service-auth-reader" already exists
2024-01-31T21:07:34.219414659Z time="2024-01-31T21:07:34Z" level=info msg="scheduling ClusterServiceVersion for install" csv=devworkspace-operator.v0.25.0 id=ENH8l namespace=openshift-operators phase=Pending
2024-01-31T21:07:34.219573625Z I0131 21:07:34.219544 1 event.go:298] Event(v1.ObjectReference{Kind:"ClusterServiceVersion", Namespace:"openshift-operators", Name:"devworkspace-operator.v0.25.0", UID:"4d42c03f-837b-4008-ad59-00fbb6f13c87", APIVersion:"operators.coreos.com/v1alpha1", ResourceVersion:"863852735", FieldPath:""}): type: 'Normal' reason: 'AllRequirementsMet' all requirements found, attempting install
2024-01-31T21:07:35.062259438Z time="2024-01-31T21:07:35Z" level=warning msg="needs reinstall: webhooks not installed" csv=kubevirt-hyperconverged-operator.v1.10.1 id=ZGSEA namespace=kubevirt-hyperconverged phase=Failed strategy=deployment
2024-01-31T21:07:35.062363864Z I0131 21:07:35.062286 1 event.go:298] Event(v1.ObjectReference{Kind:"ClusterServiceVersion", Namespace:"kubevirt-hyperconverged", Name:"kubevirt-hyperconverged-operator.v1.10.1", UID:"7d9ddf57-8d63-4a8d-a20f-86a1884709aa", APIVersion:"operators.coreos.com/v1alpha1", ResourceVersion:"863852737", FieldPath:""}): type: 'Normal' reason: 'NeedsReinstall' webhooks not installed
2024-01-31T21:07:35.121364568Z I0131 21:07:35.121328 1 event.go:298] Event(v1.ObjectReference{Kind:"ClusterServiceVersion", Namespace:"openshift-operators", Name:"eclipse-che.v7.80.0", UID:"efdefaa8-1ba4-4fb5-ae6e-05fc6c9a051a", APIVersion:"operators.coreos.com/v1alpha1", ResourceVersion:"863852772", FieldPath:""}): type: 'Normal' reason: 'NeedsReinstall' calculated deployment install is bad
2024-01-31T21:07:35.683718848Z time="2024-01-31T21:07:35Z" level=warning msg="reusing existing cert devworkspace-controller-manager-service-cert"
2024-01-31T21:07:35.793138602Z time="2024-01-31T21:07:35Z" level=warning msg="could not create auth reader role binding devworkspace-controller-manager-service-auth-reader"
2024-01-31T21:07:35.793329438Z I0131 21:07:35.793304 1 event.go:298] Event(v1.ObjectReference{Kind:"ClusterServiceVersion", Namespace:"openshift-operators", Name:"devworkspace-operator.v0.25.0", UID:"4d42c03f-837b-4008-ad59-00fbb6f13c87", APIVersion:"operators.coreos.com/v1alpha1", ResourceVersion:"863852784", FieldPath:""}): type: 'Warning' reason: 'InstallComponentFailed' install strategy failed: rolebindings.rbac.authorization.k8s.io "devworkspace-controller-manager-service-auth-reader" already exists
2024-01-31T21:07:35.793669069Z {"level":"error","ts":"2024-01-31T21:07:35Z","logger":"controllers.operator","msg":"Could not update Operator status","request":{"name":"devworkspace-operator.openshift-operators"},"error":"Operation cannot be fulfilled on operators.operators.coreos.com \"devworkspace-operator.openshift-operators\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators.(*OperatorReconciler).Reconcile\n\t/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/operator_controller.go:157\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:118\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:314\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:226"}
2024-01-31T21:07:35.961587314Z E0131 21:07:35.961556 1 queueinformer_operator.go:319] sync {"update" "openshift-operators/devworkspace-operator.v0.25.0"} failed: rolebindings.rbac.authorization.k8s.io "devworkspace-controller-manager-service-auth-reader" already exists
2024-01-31T21:07:36.418866061Z time="2024-01-31T21:07:36Z" level=info msg="scheduling ClusterServiceVersion for install" csv=eclipse-che.v7.80.0 id=OijjL namespace=openshift-operators phase=Pending
2024-01-31T21:07:36.418932412Z I0131 21:07:36.418903 1 event.go:298] Event(v1.ObjectReference{Kind:"ClusterServiceVersion", Namespace:"openshift-operators", Name:"eclipse-che.v7.80.0", UID:"efdefaa8-1ba4-4fb5-ae6e-05fc6c9a051a", APIVersion:"operators.coreos.com/v1alpha1", ResourceVersion:"863852808", FieldPath:""}): type: 'Normal' reason: 'AllRequirementsMet' all requirements found, attempting install
2024-01-31T21:07:37.222110856Z time="2024-01-31T21:07:37Z" level=info msg="scheduling ClusterServiceVersion for install" csv=kubevirt-hyperconverged-operator.v1.10.1 id=K1PkY namespace=kubevirt-hyperconverged phase=Pending
2024-01-31T21:07:37.222218355Z I0131 21:07:37.222184 1 event.go:298] Event(v1.ObjectReference{Kind:"ClusterServiceVersion", Namespace:"kubevirt-hyperconverged", Name:"kubevirt-hyperconverged-operator.v1.10.1", UID:"7d9ddf57-8d63-4a8d-a20f-86a1884709aa", APIVersion:"operators.coreos.com/v1alpha1", ResourceVersion:"863852806", FieldPath:""}): type: 'Normal' reason: 'AllRequirementsMet' all requirements found, attempting install
2024-01-31T21:07:37.530752857Z time="2024-01-31T21:07:37Z" level=warning msg="needs reinstall: missing deployment with name=devworkspace-controller-manager" csv=devworkspace-operator.v0.25.0 id=bJrRG namespace=openshift-operators phase=Failed strategy=deployment
2024-01-31T21:07:37.530927320Z I0131 21:07:37.530899 1 event.go:298] Event(v1.ObjectReference{Kind:"ClusterServiceVersion", Namespace:"openshift-operators", Name:"devworkspace-operator.v0.25.0", UID:"4d42c03f-837b-4008-ad59-00fbb6f13c87", APIVersion:"operators.coreos.com/v1alpha1", ResourceVersion:"863852843", FieldPath:""}): type: 'Normal' reason: 'NeedsReinstall' installing: missing deployment with name=devworkspace-controller-manager
2024-01-31T21:07:37.944687196Z time="2024-01-31T21:07:37Z" level=warning msg="reusing existing cert che-operator-service-cert"
2024-01-31T21:07:38.057979414Z time="2024-01-31T21:07:38Z" level=warning msg="could not create auth reader role binding che-operator-service-auth-reader"
2024-01-31T21:07:38.058233728Z I0131 21:07:38.058078 1 event.go:298] Event(v1.ObjectReference{Kind:"ClusterServiceVersion", Namespace:"openshift-operators", Name:"eclipse-che.v7.80.0", UID:"efdefaa8-1ba4-4fb5-ae6e-05fc6c9a051a", APIVersion:"operators.coreos.com/v1alpha1", ResourceVersion:"863852863", FieldPath:""}): type: 'Warning' reason: 'InstallComponentFailed' install strategy failed: rolebindings.rbac.authorization.k8s.io "che-operator-service-auth-reader" already exists
2024-01-31T21:07:38.238593871Z E0131 21:07:38.238566 1 queueinformer_operator.go:319] sync {"update" "openshift-operators/eclipse-che.v7.80.0"} failed: rolebindings.rbac.authorization.k8s.io "che-operator-service-auth-reader" already exists
2024-01-31T21:07:38.519712490Z time="2024-01-31T21:07:38Z" level=info msg="scheduling ClusterServiceVersion for install" csv=devworkspace-operator.v0.25.0 id=AMYTI namespace=openshift-operators phase=Pending
2024-01-31T21:07:38.519875885Z I0131 21:07:38.519846 1 event.go:298] Event(v1.ObjectReference{Kind:"ClusterServiceVersion", Namespace:"openshift-operators", Name:"devworkspace-operator.v0.25.0", UID:"4d42c03f-837b-4008-ad59-00fbb6f13c87", APIVersion:"operators.coreos.com/v1alpha1", ResourceVersion:"863852895", FieldPath:""}): type: 'Normal' reason: 'AllRequirementsMet' all requirements found, attempting install
2024-01-31T21:07:38.570067396Z time="2024-01-31T21:07:38Z" level=info msg="No api or webhook descs to add CA to"
2024-01-31T21:07:38.626582597Z time="2024-01-31T21:07:38Z" level=warning msg="reusing existing cert hco-webhook-service-cert"
2024-01-31T21:07:38.739072314Z time="2024-01-31T21:07:38Z" level=warning msg="could not create auth reader role binding hco-webhook-service-auth-reader"
2024-01-31T21:07:38.739547134Z I0131 21:07:38.739508 1 event.go:298] Event(v1.ObjectReference{Kind:"ClusterServiceVersion", Namespace:"kubevirt-hyperconverged", Name:"kubevirt-hyperconverged-operator.v1.10.1", UID:"7d9ddf57-8d63-4a8d-a20f-86a1884709aa", APIVersion:"operators.coreos.com/v1alpha1", ResourceVersion:"863852893", FieldPath:""}): type: 'Warning' reason: 'InstallComponentFailed' install strategy failed: rolebindings.rbac.authorization.k8s.io "hco-webhook-service-auth-reader" already exists
2024-01-31T21:07:39.117949608Z I0131 21:07:39.117908 1 event.go:298] Event(v1.ObjectReference{Kind:"ClusterServiceVersion", Namespace:"openshift-operators", Name:"eclipse-che.v7.80.0", UID:"efdefaa8-1ba4-4fb5-ae6e-05fc6c9a051a", APIVersion:"operators.coreos.com/v1alpha1", ResourceVersion:"863852934", FieldPath:""}): type: 'Normal' reason: 'NeedsReinstall' calculated deployment install is bad
2024-01-31T21:07:39.124653014Z E0131 21:07:39.124612 1 queueinformer_operator.go:319] sync {"update" "kubevirt-hyperconverged/kubevirt-hyperconverged-operator.v1.10.1"} failed: rolebindings.rbac.authorization.k8s.io "hco-webhook-service-auth-reader" already exists
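To make the loop in the log excerpt above easier to see, the following is a rough Python sketch that tallies the `event.go` entries by CSV name and event reason. The function name `tally_event_reasons` and the regular expression are my own assumptions about the line shape seen in this excerpt, not anything provided by OLM; the three sample lines are copied from the excerpt with line wrapping removed.

```python
import re
from collections import Counter

# Matches the CSV name and the event reason in an event.go log line.
# The pattern is inferred from the excerpt and may not cover other shapes.
EVENT_RE = re.compile(r"""Name:"([^"]+)".*?reason: '([^']+)'""")


def tally_event_reasons(log_text):
    """Count (csv_name, reason) pairs across all event lines in log_text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = EVENT_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


SAMPLE = """\
2024-01-31T21:07:35.062363864Z I0131 21:07:35.062286 1 event.go:298] Event(v1.ObjectReference{Kind:"ClusterServiceVersion", Namespace:"kubevirt-hyperconverged", Name:"kubevirt-hyperconverged-operator.v1.10.1", UID:"7d9ddf57-8d63-4a8d-a20f-86a1884709aa", APIVersion:"operators.coreos.com/v1alpha1", ResourceVersion:"863852737", FieldPath:""}): type: 'Normal' reason: 'NeedsReinstall' webhooks not installed
2024-01-31T21:07:35.121364568Z I0131 21:07:35.121328 1 event.go:298] Event(v1.ObjectReference{Kind:"ClusterServiceVersion", Namespace:"openshift-operators", Name:"eclipse-che.v7.80.0", UID:"efdefaa8-1ba4-4fb5-ae6e-05fc6c9a051a", APIVersion:"operators.coreos.com/v1alpha1", ResourceVersion:"863852772", FieldPath:""}): type: 'Normal' reason: 'NeedsReinstall' calculated deployment install is bad
2024-01-31T21:07:39.117949608Z I0131 21:07:39.117908 1 event.go:298] Event(v1.ObjectReference{Kind:"ClusterServiceVersion", Namespace:"openshift-operators", Name:"eclipse-che.v7.80.0", UID:"efdefaa8-1ba4-4fb5-ae6e-05fc6c9a051a", APIVersion:"operators.coreos.com/v1alpha1", ResourceVersion:"863852934", FieldPath:""}): type: 'Normal' reason: 'NeedsReinstall' calculated deployment install is bad
"""

if __name__ == "__main__":
    for (csv_name, reason), count in sorted(tally_event_reasons(SAMPLE).items()):
        print(f"{csv_name}\t{reason}\t{count}")
```

Run against a full OLM pod log, repeated NeedsReinstall, AllRequirementsMet, and InstallComponentFailed counts for the same CSV within a few seconds would be consistent with the flapping described in this report.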
- duplicates
-
OCPBUGS-31479 Installed Operators in "Failed" status after upgrading to 4.15.3
- Closed