OpenShift Bugs / OCPBUGS-7897

OLM stalled on created registryservice

Details

    • No
    • Refinement Backlog
    • 1
    • Rejected
    • False

    Description

      Description of problem:

      While running a ZTP upgrade of the platform and operators on 3449 SNOs, 275 SNOs failed to upgrade their operators. On those clusters the ZTP subscription-policy did not remain noncompliant, because OLM stalled on updating the registry service and generating installplans.
      
      More than 5 minutes passed between OLM recognizing the image change in the catalogsource and the registry service being updated. During that window, the ACM policies that determine whether an SNO is compliant with an upgrade reported compliant even though the operators had not actually been upgraded. This occurred on only ~7.9% of the clusters, which suggests that something intermittently stalls OLM's creation of the registry service.
      
      Example timeline from sno00012:
      2023-02-21T16:27:58Z - platform completes upgrade
      2023-02-21T16:32:19Z - olm catalog operator recognizes the image change for the catalogsource (OLM Logs)
      2023-02-21T16:32:39Z - common-config policy compliant (catalog source should be updated) (hub cluster policy)
      2023-02-21T16:35:05Z - subscription-policy passes as compliant (hub cluster policy)
      2023-02-21T16:37:42Z - registry service updated (catalogsource on SNO)
      2023-02-21T16:37:58Z - catalogsource state ready under connectionstate (catalogsource on SNO)
      2023-02-21T16:37:59Z - LSO 4.12 installplan generated (Installplans on SNO)
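
The gaps in the timeline above can be worked out with a short script. This is only an illustrative sketch: the timestamps are copied from the sno00012 timeline, while the dictionary keys and the helper names (`parse_ts`, `seconds_between`) are made up for this example.

```python
from datetime import datetime, timezone

# Timestamps copied from the sno00012 timeline; the key names are illustrative.
TIMELINE = {
    "platform_upgrade_complete": "2023-02-21T16:27:58Z",
    "image_change_recognized": "2023-02-21T16:32:19Z",
    "common_config_compliant": "2023-02-21T16:32:39Z",
    "subscription_policy_compliant": "2023-02-21T16:35:05Z",
    "registry_service_updated": "2023-02-21T16:37:42Z",
    "catalogsource_ready": "2023-02-21T16:37:58Z",
    "lso_installplan_generated": "2023-02-21T16:37:59Z",
}


def parse_ts(value: str) -> datetime:
    """Parse an RFC 3339 UTC timestamp such as 2023-02-21T16:27:58Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def seconds_between(start: str, end: str) -> int:
    """Whole seconds elapsed between two named timeline events."""
    delta = parse_ts(TIMELINE[end]) - parse_ts(TIMELINE[start])
    return int(delta.total_seconds())


# How long OLM took to update the registry service after seeing the new image.
stall_seconds = seconds_between("image_change_recognized", "registry_service_updated")

# How far ahead of the first 4.12 installplan the subscription-policy went compliant.
early_seconds = seconds_between("subscription_policy_compliant", "lso_installplan_generated")

print(f"registry service stall: {stall_seconds}s")
print(f"policy compliant before installplan: {early_seconds}s")
```

For sno00012 this gives a stall of 323 seconds (5m23s), and shows the subscription-policy turning compliant 174 seconds before the LSO 4.12 installplan existed.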

      Version-Release number of selected component (if applicable):

      Hub 4.12.2
      SNO 4.11.26 upgrading to 4.12.2
      Operator catalog upgrading from v4.11 to v4.12

      How reproducible:

      275 out of 3449 SNOs (~7.9% of the clusters) were affected

      Steps to Reproduce:

      1.
      2.
      3.
      

      Actual results:

       

      Expected results:

       

      Additional info:

      # oc --kubeconfig=/root/hv-vm/sno/manifests/sno00012/kubeconfig get catalogsources -n openshift-marketplace -o yaml
      apiVersion: v1
      items:
      - apiVersion: operators.coreos.com/v1alpha1
        kind: CatalogSource
        metadata:
          annotations:
            target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'
          creationTimestamp: "2023-02-18T22:15:45Z"
          generation: 2
          name: rh-du-operators
          namespace: openshift-marketplace
          resourceVersion: "1200469"
          uid: 1e184788-ffcc-484c-99c3-7e92b75eb055
        spec:
          displayName: disconnected-redhat-operators
          image: e27-h01-000-r650.rdu2.scalelab.redhat.com:5000/olm-mirror/redhat-operator-index:v4.12
          publisher: Red Hat
          sourceType: grpc
          updateStrategy:
            registryPoll:
              interval: 1h
        status:
          connectionState:
            address: rh-du-operators.openshift-marketplace.svc:50051
            lastConnect: "2023-02-21T16:37:58Z"
            lastObservedState: READY
          latestImageRegistryPoll: "2023-02-22T16:26:45Z"
          registryService:
            createdAt: "2023-02-21T16:37:42Z"
            port: "50051"
            protocol: grpc
            serviceName: rh-du-operators
            serviceNamespace: openshift-marketplace
      kind: List
      metadata:
        resourceVersion: ""
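
One way to tell a refreshed catalog from a stale one, using only the status fields shown in the output above, is sketched below. It assumes that `status.registryService.createdAt` advances when the registry service is updated (which is how the timeline above reads it) and that the time of the image change is known from elsewhere. The function name `catalog_refreshed` is hypothetical and this is not how the ACM policies evaluate compliance.

```python
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    """Parse an RFC 3339 UTC timestamp such as 2023-02-21T16:37:42Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def catalog_refreshed(catalogsource: dict, image_changed_at: str) -> bool:
    """Return True only if the registry service was (re)created at or after
    the image change and the gRPC connection is reported READY.

    Assumption: createdAt moving forward marks the registry service update.
    """
    status = catalogsource.get("status", {})
    created_at = status.get("registryService", {}).get("createdAt")
    state = status.get("connectionState", {}).get("lastObservedState")
    if created_at is None or state != "READY":
        return False
    return parse_ts(created_at) >= parse_ts(image_changed_at)


# Status fields copied from the rh-du-operators CatalogSource on sno00012.
sno00012 = {
    "status": {
        "connectionState": {
            "lastConnect": "2023-02-21T16:37:58Z",
            "lastObservedState": "READY",
        },
        "registryService": {"createdAt": "2023-02-21T16:37:42Z"},
    }
}

# The catalog operator logged the image change at 16:32:19Z.
print(catalog_refreshed(sno00012, "2023-02-21T16:32:19Z"))
```

A check along these lines would have stayed false during the stall window, because `createdAt` would still have predated the image change.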
      

      Interesting logs from catalog-operator pod

      2023-02-21T16:27:11.098786980+00:00 stderr F time="2023-02-21T16:27:11Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
      2023-02-21T16:31:20.434722204+00:00 stderr F time="2023-02-21T16:31:20Z" level=error msg="UpdateStatus - error while setting CatalogSource status" error="Operation cannot be fulfilled on catalogsources.operators.coreos.com \"rh-du-operators\": the object has been modified; please apply your changes to the latest version and try again" id=dgEjL source=rh-du-operators
      2023-02-21T16:31:20.434798610+00:00 stderr F E0221 16:31:20.434731       1 queueinformer_operator.go:298] sync {"update" "openshift-marketplace/rh-du-operators"} failed: Operation cannot be fulfilled on catalogsources.operators.coreos.com "rh-du-operators": the object has been modified; please apply your changes to the latest version and try again
      2023-02-21T16:31:21.444700182+00:00 stderr F time="2023-02-21T16:31:21Z" level=error msg="UpdateStatus - error while setting CatalogSource status" error="Operation cannot be fulfilled on catalogsources.operators.coreos.com \"rh-du-operators\": the object has been modified; please apply your changes to the latest version and try again" id=acLiz source=rh-du-operators
      2023-02-21T16:31:21.444700182+00:00 stderr F E0221 16:31:21.437301       1 queueinformer_operator.go:298] sync {"update" "openshift-marketplace/rh-du-operators"} failed: Operation cannot be fulfilled on catalogsources.operators.coreos.com "rh-du-operators": the object has been modified; please apply your changes to the latest version and try again
      2023-02-21T16:32:19.025704473+00:00 stderr F time="2023-02-21T16:32:19Z" level=info msg="catalog image changed: serving pod  update pod e27-h01-000-r650.rdu2.scalelab.redhat.com:5000/olm-mirror/redhat-operator-index@sha256:8e021d4474d91d220b840ddbedec4fc5a09c0688873ff08506319352ae87a948" CatalogSource=rh-du-operators-2sk9s
      2023-02-21T16:32:19.083257612+00:00 stderr F time="2023-02-21T16:32:19Z" level=info msg="detected imageID change: catalogsource pod updated at 2023-02-21 16:32:19.081069055 +0000 UTC m=+341.106446887" CatalogSource=rh-du-operators
      2023-02-21T16:32:32.674334511+00:00 stderr F time="2023-02-21T16:32:32Z" level=error msg="UpdateStatus - error while setting CatalogSource status" error="Operation cannot be fulfilled on catalogsources.operators.coreos.com \"rh-du-operators\": the object has been modified; please apply your changes to the latest version and try again" id=vOlk7 source=rh-du-operators
      2023-02-21T16:32:32.674334511+00:00 stderr F E0221 16:32:32.672712       1 queueinformer_operator.go:298] sync {"update" "openshift-marketplace/rh-du-operators"} failed: Operation cannot be fulfilled on catalogsources.operators.coreos.com "rh-du-operators": the object has been modified; please apply your changes to the latest version and try again
      2023-02-21T16:34:46.940926662+00:00 stderr F time="2023-02-21T16:34:46Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
      2023-02-21T16:37:45.437557035+00:00 stderr F time="2023-02-21T16:37:45Z" level=error msg="UpdateStatus - error while setting CatalogSource status" error="Operation cannot be fulfilled on catalogsources.operators.coreos.com \"rh-du-operators\": the object has been modified; please apply your changes to the latest version and try again" id=m5ME5 source=rh-du-operators
      2023-02-21T16:37:45.437627941+00:00 stderr F E0221 16:37:45.437549       1 queueinformer_operator.go:298] sync {"update" "openshift-marketplace/rh-du-operators"} failed: Operation cannot be fulfilled on catalogsources.operators.coreos.com "rh-du-operators": the object has been modified; please apply your changes to the latest version and try again
      2023-02-21T16:37:51.399952281+00:00 stderr F time="2023-02-21T16:37:51Z" level=error msg="UpdateStatus - error while setting CatalogSource status" error="Operation cannot be fulfilled on catalogsources.operators.coreos.com \"rh-du-operators\": the object has been modified; please apply your changes to the latest version and try again" id=5eYMX source=rh-du-operators
      2023-02-21T16:37:51.399952281+00:00 stderr F E0221 16:37:51.399905       1 queueinformer_operator.go:298] sync {"update" "openshift-marketplace/rh-du-operators"} failed: Operation cannot be fulfilled on catalogsources.operators.coreos.com "rh-du-operators": the object has been modified; please apply your changes to the latest version and try again
      2023-02-21T16:37:58.545397685+00:00 stderr F time="2023-02-21T16:37:58Z" level=warning msg="no installplan found with matching generation, creating new one" id=LFvOJ namespace=openshift-sriov-network-operator
      2023-02-21T16:37:58.554631233+00:00 stderr F time="2023-02-21T16:37:58Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
      2023-02-21T16:37:58.554965484+00:00 stderr F time="2023-02-21T16:37:58Z" level=info msg=syncing id=DVEDV ip=install-hnwzr namespace=openshift-sriov-network-operator phase=
      2023-02-21T16:37:58.554965484+00:00 stderr F time="2023-02-21T16:37:58Z" level=info msg="skip processing installplan without status - subscription sync responsible for initial status" id=DVEDV ip=install-hnwzr namespace=openshift-sriov-network-operator phase=
      2023-02-21T16:37:58.703345972+00:00 stderr F time="2023-02-21T16:37:58Z" level=warning msg="no installplan found with matching generation, creating new one" id=Y9795 namespace=openshift-local-storage
      2023-02-21T16:37:58.703426885+00:00 stderr F time="2023-02-21T16:37:58Z" level=info msg=syncing id=hFWA/ ip=install-hnwzr namespace=openshift-sriov-network-operator phase=RequiresApproval
      2023-02-21T16:37:58.703544298+00:00 stderr F time="2023-02-21T16:37:58Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
      2023-02-21T16:37:58.777936348+00:00 stderr F time="2023-02-21T16:37:58Z" level=warning msg="status not equal, updating..." id=hFWA/ ip=install-hnwzr namespace=openshift-sriov-network-operator phase=RequiresApproval
      2023-02-21T16:37:59.113555629+00:00 stderr F time="2023-02-21T16:37:59Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
      2023-02-21T16:37:59.113555629+00:00 stderr F time="2023-02-21T16:37:59Z" level=info msg=syncing id=TKGXy ip=install-jpf5m namespace=openshift-local-storage phase=
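
To compare affected and unaffected SNOs, the update-conflict errors in the catalog-operator log can be pulled out mechanically. The sketch below is a hypothetical helper (`conflict_times` is not part of OLM) that scans the logrus-style lines for the `time="..."` field and the "object has been modified" message. The sample lines are copied from the excerpt above with the container-runtime prefix dropped.

```python
import re

CONFLICT_TEXT = "the object has been modified"
TIME_RE = re.compile(r'time="([^"]+)"')


def conflict_times(lines):
    """Return the time= value of every error line reporting an update conflict.

    klog-style lines (E0221 ...) have no time= field and are skipped, so each
    conflict is counted once rather than twice.
    """
    times = []
    for line in lines:
        match = TIME_RE.search(line)
        if match and "level=error" in line and CONFLICT_TEXT in line:
            times.append(match.group(1))
    return times


# Three lines from the catalog-operator excerpt (prefix removed).
SAMPLE = [
    r'time="2023-02-21T16:31:20Z" level=error msg="UpdateStatus - error while setting CatalogSource status" error="Operation cannot be fulfilled on catalogsources.operators.coreos.com \"rh-du-operators\": the object has been modified; please apply your changes to the latest version and try again" id=dgEjL source=rh-du-operators',
    r'time="2023-02-21T16:32:19Z" level=info msg="detected imageID change: catalogsource pod updated at 2023-02-21 16:32:19.081069055 +0000 UTC m=+341.106446887" CatalogSource=rh-du-operators',
    r'time="2023-02-21T16:37:45Z" level=error msg="UpdateStatus - error while setting CatalogSource status" error="Operation cannot be fulfilled on catalogsources.operators.coreos.com \"rh-du-operators\": the object has been modified; please apply your changes to the latest version and try again" id=m5ME5 source=rh-du-operators',
]

for ts in conflict_times(SAMPLE):
    print(ts)
```

Running the same function over the full log of an SNO would list every conflict timestamp, which can then be lined up against the policy timeline.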

       

       

            People

              agreene1991 Alexander Greene
              akrzos@redhat.com Alex Krzos
              Xia Zhao Xia Zhao