-
Bug
-
Resolution: Won't Do
-
Normal
-
None
-
4.12.z
-
No
-
Refinement Backlog
-
1
-
Rejected
-
False
-
Description of problem:
While running a ZTP upgrade of the platform and operators on 3449 SNOs, 275 SNOs failed to upgrade their operators. The ZTP subscription-policy did not remain noncompliant because OLM stalled on updating the registry service and generating installplans. More than 5 minutes passed between the time OLM recognized the image change in the catalogsource and the time the registry service appeared to be updated. During that window, the ACM policies that determine whether an SNO is compliant with an upgrade became compliant even though the operators had not actually been upgraded. This occurred on only ~7.9% of the clusters, apparently because something is stalling OLM's responsiveness in creating the registry service.

Example timeline from sno00012:
2023-02-21T16:27:58Z - platform completes upgrade
2023-02-21T16:32:19Z - olm catalog operator recognizes the image change for the catalogsource (OLM Logs)
2023-02-21T16:32:39Z - common-config policy compliant (catalog source should be updated) (hub cluster policy)
2023-02-21T16:35:05Z - subscription-policy passes as compliant (hub cluster policy)
2023-02-21T16:37:42Z - registry service updated (catalogsource on SNO)
2023-02-21T16:37:58Z - catalogsource state ready under connectionstate (catalogsource on SNO)
2023-02-21T16:37:59Z - LSO 4.12 installplan generated (Installplans on SNO)
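For reference, the gaps in the sno00012 timeline can be computed with a short script. This is only a minimal sketch: the timestamps are copied from the timeline above, while the event keys and the helper names (`parse`, `gap_seconds`) are labels invented for this illustration and are not part of OLM, ACM, or ZTP.

```python
from datetime import datetime

# Timestamps copied from the sno00012 timeline in this report.
# The short event keys are hypothetical labels chosen for this sketch.
TIMELINE = {
    "platform_upgraded": "2023-02-21T16:27:58Z",
    "olm_image_change_seen": "2023-02-21T16:32:19Z",
    "common_config_compliant": "2023-02-21T16:32:39Z",
    "subscription_policy_compliant": "2023-02-21T16:35:05Z",
    "registry_service_updated": "2023-02-21T16:37:42Z",
    "catalogsource_ready": "2023-02-21T16:37:58Z",
    "lso_installplan_generated": "2023-02-21T16:37:59Z",
}


def parse(ts):
    """Parse a UTC timestamp in the format used by the timeline."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def gap_seconds(start, end):
    """Return the number of whole seconds between two timeline events."""
    delta = parse(TIMELINE[end]) - parse(TIMELINE[start])
    return int(delta.total_seconds())


# Time OLM took to update the registry service after seeing the new image.
olm_stall = gap_seconds("olm_image_change_seen", "registry_service_updated")

# How long the subscription-policy was already compliant before the
# LSO 4.12 installplan existed.
early_compliance = gap_seconds(
    "subscription_policy_compliant", "lso_installplan_generated"
)

print(f"OLM stall: {olm_stall}s")
print(f"Policy compliant before installplan: {early_compliance}s")
```

On this cluster the stall works out to 5m23s, and the subscription-policy reported compliant roughly 3 minutes before the installplan was generated.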
Version-Release number of selected component (if applicable):
Hub: 4.12.2
SNO: 4.11.26 upgrading to 4.12.2
Operator catalog: upgrading from v4.11 to v4.12
How reproducible:
275 out of 3449 SNOs (~7.9% of the clusters) affected
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
# oc --kubeconfig=/root/hv-vm/sno/manifests/sno00012/kubeconfig get catalogsources -n openshift-marketplace -o yaml
apiVersion: v1
items:
- apiVersion: operators.coreos.com/v1alpha1
  kind: CatalogSource
  metadata:
    annotations:
      target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'
    creationTimestamp: "2023-02-18T22:15:45Z"
    generation: 2
    name: rh-du-operators
    namespace: openshift-marketplace
    resourceVersion: "1200469"
    uid: 1e184788-ffcc-484c-99c3-7e92b75eb055
  spec:
    displayName: disconnected-redhat-operators
    image: e27-h01-000-r650.rdu2.scalelab.redhat.com:5000/olm-mirror/redhat-operator-index:v4.12
    publisher: Red Hat
    sourceType: grpc
    updateStrategy:
      registryPoll:
        interval: 1h
  status:
    connectionState:
      address: rh-du-operators.openshift-marketplace.svc:50051
      lastConnect: "2023-02-21T16:37:58Z"
      lastObservedState: READY
    latestImageRegistryPoll: "2023-02-22T16:26:45Z"
    registryService:
      createdAt: "2023-02-21T16:37:42Z"
      port: "50051"
      protocol: grpc
      serviceName: rh-du-operators
      serviceNamespace: openshift-marketplace
kind: List
metadata:
  resourceVersion: ""
Interesting logs from catalog-operator pod
2023-02-21T16:27:11.098786980+00:00 stderr F time="2023-02-21T16:27:11Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
2023-02-21T16:31:20.434722204+00:00 stderr F time="2023-02-21T16:31:20Z" level=error msg="UpdateStatus - error while setting CatalogSource status" error="Operation cannot be fulfilled on catalogsources.operators.coreos.com \"rh-du-operators\": the object has been modified; please apply your changes to the latest version and try again" id=dgEjL source=rh-du-operators
2023-02-21T16:31:20.434798610+00:00 stderr F E0221 16:31:20.434731 1 queueinformer_operator.go:298] sync {"update" "openshift-marketplace/rh-du-operators"} failed: Operation cannot be fulfilled on catalogsources.operators.coreos.com "rh-du-operators": the object has been modified; please apply your changes to the latest version and try again
2023-02-21T16:31:21.444700182+00:00 stderr F time="2023-02-21T16:31:21Z" level=error msg="UpdateStatus - error while setting CatalogSource status" error="Operation cannot be fulfilled on catalogsources.operators.coreos.com \"rh-du-operators\": the object has been modified; please apply your changes to the latest version and try again" id=acLiz source=rh-du-operators
2023-02-21T16:31:21.444700182+00:00 stderr F E0221 16:31:21.437301 1 queueinformer_operator.go:298] sync {"update" "openshift-marketplace/rh-du-operators"} failed: Operation cannot be fulfilled on catalogsources.operators.coreos.com "rh-du-operators": the object has been modified; please apply your changes to the latest version and try again
2023-02-21T16:32:19.025704473+00:00 stderr F time="2023-02-21T16:32:19Z" level=info msg="catalog image changed: serving pod update pod e27-h01-000-r650.rdu2.scalelab.redhat.com:5000/olm-mirror/redhat-operator-index@sha256:8e021d4474d91d220b840ddbedec4fc5a09c0688873ff08506319352ae87a948" CatalogSource=rh-du-operators-2sk9s
2023-02-21T16:32:19.083257612+00:00 stderr F time="2023-02-21T16:32:19Z" level=info msg="detected imageID change: catalogsource pod updated at 2023-02-21 16:32:19.081069055 +0000 UTC m=+341.106446887" CatalogSource=rh-du-operators
2023-02-21T16:32:32.674334511+00:00 stderr F time="2023-02-21T16:32:32Z" level=error msg="UpdateStatus - error while setting CatalogSource status" error="Operation cannot be fulfilled on catalogsources.operators.coreos.com \"rh-du-operators\": the object has been modified; please apply your changes to the latest version and try again" id=vOlk7 source=rh-du-operators
2023-02-21T16:32:32.674334511+00:00 stderr F E0221 16:32:32.672712 1 queueinformer_operator.go:298] sync {"update" "openshift-marketplace/rh-du-operators"} failed: Operation cannot be fulfilled on catalogsources.operators.coreos.com "rh-du-operators": the object has been modified; please apply your changes to the latest version and try again
2023-02-21T16:34:46.940926662+00:00 stderr F time="2023-02-21T16:34:46Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
2023-02-21T16:37:45.437557035+00:00 stderr F time="2023-02-21T16:37:45Z" level=error msg="UpdateStatus - error while setting CatalogSource status" error="Operation cannot be fulfilled on catalogsources.operators.coreos.com \"rh-du-operators\": the object has been modified; please apply your changes to the latest version and try again" id=m5ME5 source=rh-du-operators
2023-02-21T16:37:45.437627941+00:00 stderr F E0221 16:37:45.437549 1 queueinformer_operator.go:298] sync {"update" "openshift-marketplace/rh-du-operators"} failed: Operation cannot be fulfilled on catalogsources.operators.coreos.com "rh-du-operators": the object has been modified; please apply your changes to the latest version and try again
2023-02-21T16:37:51.399952281+00:00 stderr F time="2023-02-21T16:37:51Z" level=error msg="UpdateStatus - error while setting CatalogSource status" error="Operation cannot be fulfilled on catalogsources.operators.coreos.com \"rh-du-operators\": the object has been modified; please apply your changes to the latest version and try again" id=5eYMX source=rh-du-operators
2023-02-21T16:37:51.399952281+00:00 stderr F E0221 16:37:51.399905 1 queueinformer_operator.go:298] sync {"update" "openshift-marketplace/rh-du-operators"} failed: Operation cannot be fulfilled on catalogsources.operators.coreos.com "rh-du-operators": the object has been modified; please apply your changes to the latest version and try again
2023-02-21T16:37:58.545397685+00:00 stderr F time="2023-02-21T16:37:58Z" level=warning msg="no installplan found with matching generation, creating new one" id=LFvOJ namespace=openshift-sriov-network-operator
2023-02-21T16:37:58.554631233+00:00 stderr F time="2023-02-21T16:37:58Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
2023-02-21T16:37:58.554965484+00:00 stderr F time="2023-02-21T16:37:58Z" level=info msg=syncing id=DVEDV ip=install-hnwzr namespace=openshift-sriov-network-operator phase=
2023-02-21T16:37:58.554965484+00:00 stderr F time="2023-02-21T16:37:58Z" level=info msg="skip processing installplan without status - subscription sync responsible for initial status" id=DVEDV ip=install-hnwzr namespace=openshift-sriov-network-operator phase=
2023-02-21T16:37:58.703345972+00:00 stderr F time="2023-02-21T16:37:58Z" level=warning msg="no installplan found with matching generation, creating new one" id=Y9795 namespace=openshift-local-storage
2023-02-21T16:37:58.703426885+00:00 stderr F time="2023-02-21T16:37:58Z" level=info msg=syncing id=hFWA/ ip=install-hnwzr namespace=openshift-sriov-network-operator phase=RequiresApproval
2023-02-21T16:37:58.703544298+00:00 stderr F time="2023-02-21T16:37:58Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
2023-02-21T16:37:58.777936348+00:00 stderr F time="2023-02-21T16:37:58Z" level=warning msg="status not equal, updating..." id=hFWA/ ip=install-hnwzr namespace=openshift-sriov-network-operator phase=RequiresApproval
2023-02-21T16:37:59.113555629+00:00 stderr F time="2023-02-21T16:37:59Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
2023-02-21T16:37:59.113555629+00:00 stderr F time="2023-02-21T16:37:59Z" level=info msg=syncing id=TKGXy ip=install-jpf5m namespace=openshift-local-storage phase=
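To check other affected SNOs for the same delay, the gap between the "catalog image changed" message and the first "no installplan found" message could be pulled out of a catalog-operator log along these lines. This is a rough sketch, not a supported tool: the two sample lines are copied from the excerpt above, the helper `first_time` is a hypothetical name, and the assumption that these two messages bracket the stall is based only on this one cluster.

```python
import re
from datetime import datetime

# Two lines copied verbatim from the catalog-operator excerpt in this report.
SAMPLE_LOG = '''\
2023-02-21T16:32:19.025704473+00:00 stderr F time="2023-02-21T16:32:19Z" level=info msg="catalog image changed: serving pod update pod e27-h01-000-r650.rdu2.scalelab.redhat.com:5000/olm-mirror/redhat-operator-index@sha256:8e021d4474d91d220b840ddbedec4fc5a09c0688873ff08506319352ae87a948" CatalogSource=rh-du-operators-2sk9s
2023-02-21T16:37:58.703345972+00:00 stderr F time="2023-02-21T16:37:58Z" level=warning msg="no installplan found with matching generation, creating new one" id=Y9795 namespace=openshift-local-storage
'''

# Matches the logrus-style time="..." field present in each entry.
TIME_RE = re.compile(r'time="([^"]+)"')


def first_time(log_text, needle):
    """Return the time= value of the first line containing needle, or None."""
    for line in log_text.splitlines():
        if needle in line:
            match = TIME_RE.search(line)
            if match:
                return datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%SZ")
    return None


image_changed = first_time(SAMPLE_LOG, "catalog image changed")
installplan = first_time(SAMPLE_LOG, "no installplan found")

# Both markers must be present before a gap can be computed.
if image_changed is not None and installplan is not None:
    gap = int((installplan - image_changed).total_seconds())
    print(f"image change -> first new installplan: {gap}s")
```

Running the same extraction over the full logs from the 275 affected clusters would show whether the delay is consistently in the 5 minute range or varies more widely.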
- relates to
-
ACM-3097 Design an ACM policy type for operator installation management
- Closed