OCPBUGS-45845

When installing an operator, OLM locks the Subscription 3-15% of the time [release-4.17]


    • Bug
    • Resolution: Done-Errata
    • Major
    • None
    • 4.14.z, 4.15.z, 4.17.z, 4.16.z, 4.18
    • OLM
    • Important
    • None
    • Diglett OLM Sprint 264, Eevee OLM Sprint 265
    • 2
    • Rejected
    • False
      * Previously, the {olm-first} would sometimes resolve the same namespace concurrently in a cluster. This led to subscriptions reaching a terminal state of `ConstraintsNotSatisfiable`, because two concurrent processes interacted with the same subscription and left a cluster service version (CSV) unassociated with it. With this release, {olm} handles concurrent resolution of a namespace correctly, so no CSV remains unassociated. (link:https://issues.redhat.com/browse/OCPBUGS-45845[*OCPBUGS-45845*])
      ====
      Addresses an issue whereby concurrent reconciliation of the same namespace led to erroneous terminal states on Subscriptions.
    • Bug Fix
    • Done

      Description of problem:

          When installing ROSA/OSD operators, OLM "locks up" the Subscription object with "ConstraintsNotSatisfiable" 3-15% of the time, depending on the environment.
      

      Version-Release number of selected component (if applicable):

      Recently tested on:
      - OSD 4.17.5
      - 4.18 nightly (from cluster bot)
      
      Though prevalence across the ROSA fleet suggests this is not a new issue.

      How reproducible:

      Very. This is very prevalent across the OSD/ROSA Classic cluster fleet. Any new OSD/ROSA Classic cluster has a good chance of at least one of its ~12 OSD-specific operators being affected at install time.

      Steps to Reproduce:

          0. Set up a cluster using cluster bot.
          1. Label at least one worker node with node-role.kubernetes.io=infra
          2. Install must gather operator with "oc apply -f mgo.yaml" (file attached)
          3. Wait for the pods to come up.
          4. Start this loop:
      for i in $(seq -w 999); do
        echo -ne ">>>>>>> $i\t\t"; date
        oc get -n openshift-must-gather-operator subscription/must-gather-operator -o yaml >mgo-sub-$i.yaml
        oc delete -f mgo.yaml
        oc apply -f mgo.yaml
        sleep 100
      done
          5. Let it run for a few hours.
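
      An optional variation on the fixed "sleep 100" in step 4 is to poll the Subscription until it either installs or reports the failure. The sketch below is not part of the original reproducer: wait_for_subscription is a hypothetical helper, and it assumes that the Subscription exposes .status.installedCSV once installed and that the locked state shows up as a condition with reason ConstraintsNotSatisfiable, as in the output under "Actual results". A resolution failure may also be transient, so treat "locked" as a hint to inspect the snapshot rather than as proof.

```shell
# wait_for_subscription NAMESPACE NAME [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Hypothetical helper: prints "installed", "locked", or "timeout".
# Assumes "oc" is logged in to the test cluster.
wait_for_subscription() {
  local ns="$1" name="$2" timeout="${3:-100}" interval="${4:-5}"
  local waited=0 csv reasons
  while [ "$waited" -lt "$timeout" ]; do
    # Name of the installed CSV, empty until the install has completed.
    csv=$(oc get -n "$ns" "subscription/$name" \
      -o jsonpath='{.status.installedCSV}' 2>/dev/null)
    # Space-separated list of the reasons of all status conditions.
    reasons=$(oc get -n "$ns" "subscription/$name" \
      -o jsonpath='{.status.conditions[*].reason}' 2>/dev/null)
    case " $reasons " in
      *" ConstraintsNotSatisfiable "*) echo locked; return 1 ;;
    esac
    if [ -n "$csv" ]; then
      echo installed
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo timeout
  return 2
}
```

      Inside the loop this would replace the sleep, for example: wait_for_subscription openshift-must-gather-operator must-gather-operator 100 5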

      Actual results:

      Run "grep ConstraintsNotSatisfiable *.yaml"
       
      You should find a few of the Subscriptions ended up in a "locked" state from which there is no upgrade without manual intervention:
      
        - message: 'constraints not satisfiable: @existing/openshift-must-gather-operator//must-gather-operator.v4.17.281-gd5416c9
            and must-gather-operator-registry/openshift-must-gather-operator/stable/must-gather-operator.v4.17.281-gd5416c9
            originate from package must-gather-operator, subscription must-gather-operator
            requires must-gather-operator-registry/openshift-must-gather-operator/stable/must-gather-operator.v4.17.281-gd5416c9,
            subscription must-gather-operator exists, clusterserviceversion must-gather-operator.v4.17.281-gd5416c9
            exists and is not referenced by a subscription'
          reason: ConstraintsNotSatisfiable
          status: "True"
          type: ResolutionFailed
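
      To put a number on how often this happens in a given run, the grep above can be turned into a count over the snapshots. This is a minimal sketch that assumes the mgo-sub-*.yaml file names written by the loop in step 4; count_locked and report_locked are hypothetical helper names, not part of the attached files.

```shell
# count_locked [DIR]
# Prints "<locked> <total>" for the mgo-sub-*.yaml snapshots in DIR
# (defaults to the current directory).
count_locked() {
  local dir="${1:-.}" total=0 locked=0 f
  for f in "$dir"/mgo-sub-*.yaml; do
    [ -e "$f" ] || continue   # the glob matched nothing
    total=$((total + 1))
    if grep -q 'reason: ConstraintsNotSatisfiable' "$f"; then
      locked=$((locked + 1))
    fi
  done
  echo "$locked $total"
}

# report_locked [DIR]
# Prints the count and the failure rate with one decimal place.
report_locked() {
  local result locked total permille
  result=$(count_locked "${1:-.}")
  locked=${result% *}
  total=${result#* }
  if [ "$total" -eq 0 ]; then
    echo "no snapshots found"
    return 1
  fi
  # Integer arithmetic in tenths of a percent, so no bc or awk is needed.
  permille=$((locked * 1000 / total))
  echo "locked: $locked / $total ($((permille / 10)).$((permille % 10))%)"
}
```

      Running report_locked in the directory holding the snapshots prints a single summary line. Each snapshot reflects the state left by the previous iteration's apply.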

      Expected results:

          Each installation attempt should've worked fine.

      Additional info:

          

      mgo.yaml:

      apiVersion: v1
      kind: Namespace
      metadata:
        name: openshift-must-gather-operator
        annotations:
          package-operator.run/collision-protection: IfNoController
          package-operator.run/phase: namespaces
          openshift.io/node-selector: ""
        labels:
          openshift.io/cluster-logging: "true"
          openshift.io/cluster-monitoring: 'true'
      ---
      apiVersion: operators.coreos.com/v1alpha1
      kind: CatalogSource
      metadata:
        name: must-gather-operator-registry
        namespace: openshift-must-gather-operator
        annotations:
          package-operator.run/collision-protection: IfNoController
          package-operator.run/phase: must-gather-operator
        labels:
          opsrc-datastore: "true"
          opsrc-provider: redhat
      spec:
        image: quay.io/app-sre/must-gather-operator-registry@sha256:0a0610e37a016fb4eed1b000308d840795838c2306f305a151c64cf3b4fd6bb4
        displayName: must-gather-operator
        icon:
          base64data: ''
          mediatype: ''
        publisher: Red Hat
        sourceType: grpc
        grpcPodConfig:
          securityContextConfig: restricted
          nodeSelector:
            node-role.kubernetes.io: infra
          tolerations:
          - effect: NoSchedule
            key: node-role.kubernetes.io/infra
            operator: Exists
      ---
      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
        name: must-gather-operator
        namespace: openshift-must-gather-operator
        annotations:
          package-operator.run/collision-protection: IfNoController
          package-operator.run/phase: must-gather-operator
      spec:
        channel: stable
        name: must-gather-operator
        source: must-gather-operator-registry
        sourceNamespace: openshift-must-gather-operator
      ---
      apiVersion: operators.coreos.com/v1alpha2
      kind: OperatorGroup
      metadata:
        name: must-gather-operator
        namespace: openshift-must-gather-operator
        annotations:
          package-operator.run/collision-protection: IfNoController
          package-operator.run/phase: must-gather-operator
          olm.operatorframework.io/exclude-global-namespace-resolution: 'true'
      spec:
        targetNamespaces:
        - openshift-must-gather-operator
          

              pegoncal@redhat.com Per Goncalves da Silva
              mmazur@redhat.com Mariusz Mazur
              Kui Wang