OpenShift Bugs / OCPBUGS-570

Operator objects are re-created even after deleting them


    • Type: Bug
    • Resolution: Done
    • Priority: Normal
    • Version: 4.9.z
    • Component: OLM
    • Severity: Moderate

      This is a clone of https://bugzilla.redhat.com/show_bug.cgi?id=2106838 for backporting purposes

       
      +++ This bug was initially created as a clone of Bug #2015023 +++

      Description of problem:
      Tried to uninstall the operator and it worked. However, the operator custom resource does not get deleted.
      Strangely, it is recreated even after issuing "oc delete operator <name>".

      Version-Release number of selected component (if applicable):
      4.7.31

      How reproducible:
      100%

      Steps to Reproduce:
      1. Install any operator
      2. Check operator resource
      3. Uninstall the operator
      4. Try to delete the "operator" resource.
      5. List the operator resource. The operator resource will get recreated.
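
      For reference, a minimal command-line sketch of the steps above, using placeholder names for the Subscription, CSV, install namespace, and Operator resource (the actual names depend on the operator being tested):

      ```
      # 1-2. Install an operator (e.g. via a Subscription), then list the cluster-scoped Operator resources
      oc get operators

      # 3. Uninstall the operator by deleting its Subscription and CSV
      oc delete subscription <sub-name> -n <install-namespace>
      oc delete csv <csv-name> -n <install-namespace>

      # 4. Try to delete the Operator resource
      oc delete operator <operator-name>.<install-namespace>

      # 5. List the Operator resources again; the deleted one reappears
      oc get operators
      ```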

      Actual results:
      The operator resource is not getting removed even after deleting it.

      Expected results:
      After issuing "oc delete operator <name>", the operator resource should get removed.

      Additional info:
      A similar bug [1] was fixed in 4.7.0, but it looks like the issue is still there.

      [1]
      https://bugzilla.redhat.com/show_bug.cgi?id=1899588
      — Additional comment from Dhruv Gautam on 2021-10-18 09:14:25 UTC —

      Hi Team

      Let me know if you need any logs.

      Regards
      Dhruv Gautam

      — Additional comment from Nick Hale on 2021-12-02 20:09:45 UTC —

      Sorry for the slow response!

      @dgautam@redhat.com I'm going to need the status of the Operator resource after the deletion attempt is made. That status should show any remaining components. Without that info, I can only suspect that some cluster-scoped resources still exist that reference the Operator – e.g. a CRD – since I have been unable to reproduce the issue myself.

      A must-gather will help too.
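
      For example, something along these lines (a sketch; it assumes the remaining components are listed under `.status.components.refs` of the Operator resource, with the operator name as a placeholder):

      ```
      # Full Operator resource, including the component references in its status
      oc get operator <operator-name> -o yaml

      # Or just the kind/name of each remaining component
      oc get operator <operator-name> -o jsonpath='{range .status.components.refs[*]}{.kind}{" "}{.name}{"\n"}{end}'
      ```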

      — Additional comment from Dhruv Gautam on 2021-12-03 12:45:05 UTC —

      Hi Nick

      The must-gather is available at the Google Drive link below:
      https://drive.google.com/file/d/1DtGkIzZpWYjihu_kG0OR4BkLFjIsPlFB/view?usp=sharing
      If required, we can try to reproduce the issue together.

      Regards
      Dhruv Gautam

      — Additional comment from Nick Hale on 2021-12-03 15:53:47 UTC —

      Looking at `cluster-scoped-resources/operators.coreos.com/operators/cloud-native-postgresql.sandbox-kokj.yaml` in the must-gather shows that there are still resources related to the Operator on the cluster:

      ```
      - apiVersion: rbac.authorization.k8s.io/v1
        kind: RoleBinding
        name: postgresql-operator-controller-manager-1-9-1-service-auth-reader
        namespace: kube-system
      - apiVersion: apiextensions.k8s.io/v1
        conditions:
        - lastTransitionTime: "2021-10-06T10:29:10Z"
          message: no conflicts found
          reason: NoConflicts
          status: "True"
          type: NamesAccepted
        - lastTransitionTime: "2021-10-06T10:29:11Z"
          message: the initial names have been accepted
          reason: InitialNamesAccepted
          status: "True"
          type: Established
        kind: CustomResourceDefinition
        name: clusters.postgresql.k8s.enterprisedb.io
      - apiVersion: apiextensions.k8s.io/v1
        conditions:
        - lastTransitionTime: "2021-10-06T10:29:10Z"
          message: no conflicts found
          reason: NoConflicts
          status: "True"
          type: NamesAccepted
        - lastTransitionTime: "2021-10-06T10:29:10Z"
          message: the initial names have been accepted
          reason: InitialNamesAccepted
          status: "True"
          type: Established
        kind: CustomResourceDefinition
        name: backups.postgresql.k8s.enterprisedb.io
      - apiVersion: apiextensions.k8s.io/v1
        conditions:
        - lastTransitionTime: "2021-10-06T10:29:11Z"
          message: no conflicts found
          reason: NoConflicts
          status: "True"
          type: NamesAccepted
        - lastTransitionTime: "2021-10-06T10:29:11Z"
          message: the initial names have been accepted
          reason: InitialNamesAccepted
          status: "True"
          type: Established
        kind: CustomResourceDefinition
        name: scheduledbackups.postgresql.k8s.enterprisedb.io
      ```

      These must be deleted before the Operator resource can be.
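
      For example, a cleanup sketch based on the components listed above (resource names taken from the must-gather; cluster-admin access is assumed, and deleting the CRDs also removes any remaining custom resources of those types):

      ```
      oc delete rolebinding postgresql-operator-controller-manager-1-9-1-service-auth-reader -n kube-system
      oc delete crd clusters.postgresql.k8s.enterprisedb.io \
                    backups.postgresql.k8s.enterprisedb.io \
                    scheduledbackups.postgresql.k8s.enterprisedb.io

      # After that, deleting the Operator resource should stick
      oc delete operator cloud-native-postgresql.sandbox-kokj
      oc get operator cloud-native-postgresql.sandbox-kokj
      ```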

      I'm now fairly confident that this is not a bug. If you delete these resources along with the Operator resource and it is still recreated, then please reopen this BZ.

      Thanks!

      — Additional comment from Dhruv Gautam on 2021-12-03 17:07:34 UTC —

      Hi Nick

      You got that absolutely right. The CRDs had not been cleared, which is why the operator resource was being recreated.
      This is not a bug.

      Regards
      Dhruv Gautam

      — Additional comment from Nick Hale on 2021-12-14 15:57:11 UTC —

      Looks like we're seeing cases with no components as well. I'm reopening this.

      Just got out of a call with the related customer – they'll be posting a new must-gather soon.

      — Additional comment from Dhruv Gautam on 2021-12-14 17:09:57 UTC —

      Hi Nick

      Thanks for assisting over the remote.
      Please find the latest must-gather below:
      https://drive.google.com/file/d/1mNvJFmoabUTCXiZ0TMmucR8YAIEsWAk5/view?usp=sharing
      Regards
      Dhruv Gautam
      Red Hat

      — Additional comment from Dhruv Gautam on 2022-01-25 08:39:24 UTC —

      Hello Nick

      Any update on this Bugzilla?

      Regards
      Dhruv Gautam
      Red Hat

      — Additional comment from Dhruv Gautam on 2022-02-08 17:41:59 UTC —

      Hello Team

      Is there any update on this Bugzilla?

      Regards
      Dhruv Gautam
      Red Hat

      — Additional comment from Nick Hale on 2022-02-10 15:25:38 UTC —

      Hi Dhruv,

      Very sorry for the delayed response.

      We have an upstream PR from an external contributor addressing this. I'm moving to either get that in or create a patch myself.

      Hopefully, there should be something merged upstream within the week – after that, I'll focus on getting it merged downstream, although I'm not sure if we can backport all the way to 4.7.z at this point. I'll follow up with the team and see what's possible.

      I'll post my findings here later today.

      — Additional comment from Nick Hale on 2022-02-10 15:26:05 UTC —

      Current upstream PR:
      https://github.com/operator-framework/operator-lifecycle-manager/pull/2582
      — Additional comment from Nick Hale on 2022-02-10 16:16:50 UTC —

      Okay, so it looks like we don't backport fixes for medium issues to 4.7.z. Have the customers upgraded to a newer version of OpenShift?

      We can look into changing the severity if this issue is causing significant interruptions for users.

      Here are the support phase dates for current OpenShift releases:
      https://access.redhat.com/support/policy/updates/openshift#dates
      (that doc also lists the phase SLAs)

      — Additional comment from Dhruv Gautam on 2022-03-09 17:14:19 UTC —

      Hi Nick

      I understand your point about the support phases and SLAs.

      I would like to know:

      • In which RHOCP version will the fix be made available?
      • Are there any tentative dates for when the fix will be released?

      Regards
      Dhruv Gautam
      Red Hat

      — Additional comment from Nick Hale on 2022-04-05 19:05:52 UTC —

      Dhruv,

      An upstream fix for this – different from the one mentioned in my previous comment – has merged (see https://github.com/operator-framework/operator-lifecycle-manager/pull/2697).

      > - In which RHOCP version will the fix be made available?

      If we can get the change synced to our downstream repository in time, it will be released in version 4.11.0.

      > - Are there any tentative dates for when the fix will be released?

      As of today, 4.11.0 is planned to go GA on July 13th, 2022 (see https://docs.google.com/spreadsheets/d/19bRYespPb-AvclkwkoizmJ6NZ54p9iFRn6DGD8Ugv2c/edit#gid=0).

      — Additional comment from Per da Silva on 2022-04-07 09:04:37 UTC —

      Hi Dhruv,

      We've brought this change downstream on this PR:
      https://github.com/openshift/operator-framework-olm/pull/278
      I'll update this ticket to ON_QA

      Cheers,

      Per

      — Additional comment from Jian Zhang on 2022-04-07 09:59:45 UTC —

      Hi Nick,

      > Current upstream PR: https://github.com/operator-framework/operator-lifecycle-manager/pull/2582

      This fix PR has been rejected; could you help link the right one? Thanks!

      Hi Bruno,

      > We've brought this change downstream on this PR: https://github.com/openshift/operator-framework-olm/pull/278

      I checked the latest payload and it contains the fix PR. Could you help verify it? Thanks!
      mac:~ jianzhang$ oc adm release info registry.ci.openshift.org/ocp/release:4.11.0-0.nightly-2022-04-07-053433 -a .dockerconfigjson --commits|grep olm
      operator-lifecycle-manager   https://github.com/openshift/operator-framework-olm   491ea010345b42d0ffd19208124e16bc8a9d1355

      — Additional comment from Bruno Andrade on 2022-04-08 00:09:44 UTC —

      oc get clusterversion
      NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
      version 4.11.0-0.nightly-2022-04-07-053433 True False 4h25m Cluster version is 4.11.0-0.nightly-2022-04-07-053433

      oc exec olm-operator-67fc464567-8wl9l -n openshift-operator-lifecycle-manager -- olm --version
      OLM version: 0.19.0
      git commit: 491ea010345b42d0ffd19208124e16bc8a9d1355

      cat og-single.yaml
      kind: OperatorGroup
      apiVersion: operators.coreos.com/v1
      metadata:
        name: og-single1
        namespace: default
      spec:
        targetNamespaces:
        - default

      oc apply -f og-single.yaml
      operatorgroup.operators.coreos.com/og-single1 created

      cat teiidcatsrc.yaml
      apiVersion: operators.coreos.com/v1alpha1
      kind: CatalogSource
      metadata:
        name: teiid
        namespace: default
      spec:
        displayName: "teiid Operators"
        image: quay.io/bandrade/teiid-index:1898500
        publisher: QE
        sourceType: grpc

      oc create -f teiidcatsrc.yaml
      catalogsource.operators.coreos.com/teiid created

      cat teiidsub.yaml
      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
        name: teiid
        namespace: default
      spec:
        source: teiid
        sourceNamespace: default
        channel: alpha
        installPlanApproval: Automatic
        name: teiid

      oc apply -f teiidsub.yaml
      subscription.operators.coreos.com/teiid created

      oc get sub -n default
      NAME PACKAGE SOURCE CHANNEL
      teiid teiid teiid alpha

      oc get ip -n default
      NAME CSV APPROVAL APPROVED
      install-psjsf teiid.v0.3.0 Automatic true

      oc get csv
      NAME DISPLAY VERSION REPLACES PHASE
      teiid.v0.3.0 Teiid 0.3.0 Succeeded

      oc get operators -n default
      NAME AGE
      cluster-logging.openshift-logging 5h14m
      elasticsearch-operator.openshift-operators-redhat 5h14m
      teiid.default 13m

      oc delete sub teiid
      subscription.operators.coreos.com "teiid" deleted

      oc delete csv teiid.v0.3.0
      clusterserviceversion.operators.coreos.com "teiid.v0.3.0" deleted

      oc get operator teiid.default -o yaml
      apiVersion: operators.coreos.com/v1
      kind: Operator
      metadata:
        creationTimestamp: "2022-04-07T23:22:22Z"
        generation: 1
        name: teiid.default
        resourceVersion: "146694"
        uid: d74c796d-7482-4caa-96ed-fbd401a35f19
      spec: {}
      status:
        components:
          labelSelector:
            matchExpressions:
            - key: operators.coreos.com/teiid.default
              operator: Exists

      oc delete operator teiid.default -n default
      warning: deleting cluster-scoped resources, not scoped to the provided namespace
      operator.operators.coreos.com "teiid.default" deleted

      oc get operator teiid.default -o yaml
      Error from server (NotFound): operators.operators.coreos.com "teiid.default" not found

      LGTM, marking as VERIFIED

      — Additional comment from Raúl Fernández on 2022-05-27 10:24:05 UTC —

      Hi,

      My customer Telefonica is experiencing this problem (case linked) and requesting this backport to 4.8, as they are having this issue with multiple operators.

      Could we have a backport schedule for this?

      Thanks.

      Best regards,
      Raúl Fernández

      — Additional comment from errata-xmlrpc on 2022-06-15 18:25:09 UTC —

      This bug has been added to advisory RHEA-2022:5069 by OpenShift Release Team Bot (ocp-build/buildvm.openshift.eng.bos.redhat.com@REDHAT.COM)

      — Additional comment from Silvia Parpatekar on 2022-06-24 08:07:53 UTC —

      Hello Team,

      We have a customer on OCP 4.9.36 (bare metal) who is facing this issue with the Performance Addon Operator.
      It is easily reproducible: we installed the operator and then deleted its resources:
      ~~~
      $ oc delete csv performance-addon-operator.v4.8.8
      $ oc delete ip install-flsns
      $ oc delete sub performance-addon-operator
      $ oc delete job.batch/77c159445752f625e1337c92dabd6cfdfc1eebcb7accbbfd5d1c7227656cec6 -n openshift-marketplace
      $ oc delete configmap/77c159445752f625e1337c92dabd6cfdfc1eebcb7accbbfd5d1c7227656cec6 -n openshift-marketplace
      $ oc get crd | grep performance
      $ oc delete crd performanceprofiles.performance.openshift.io
      $ oc scale deployment -n openshift-operator-lifecycle-manager olm-operator --replicas=0
      $ oc delete operator performance-addon-operator.openshift-operators
      $ oc scale deployment -n openshift-operator-lifecycle-manager olm-operator --replicas=1
      $ oc get operator
      NAME AGE
      performance-addon-operator.openshift-operators 7m31s
      ~~~

      The Operator resource still exists in the CLI, but when I checked the web console I couldn't find the operator there. We tried various other ways to delete it, but no luck.
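
      One additional check that might help narrow this down (a sketch; it assumes OLM's component labelling, i.e. an `operators.coreos.com/<name>.<namespace>` label key on every resource the Operator still references, as seen in the label selector earlier in this bug):
      ~~~
      # Anything that still carries the operator's component label would explain the recreation
      oc get crd,clusterrole,clusterrolebinding -l 'operators.coreos.com/performance-addon-operator.openshift-operators'
      oc get all -A -l 'operators.coreos.com/performance-addon-operator.openshift-operators'
      ~~~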

      Can we get an update on which version this is going to be fixed in?

      — Additional comment from Immanuvel on 2022-07-01 12:44:27 UTC —

      Hello Team,

      Is any workaround available for this?

      Thanks
      Immanuvel

      — Additional comment from himadri on 2022-07-01 13:16:41 UTC —

      Hello Team,

      The accounts team has reached out to EMT on this, as the customer is frustrated with how long this has gone on without a fix. The customer temperature is high, and we request your intervention in getting a permanent fix or workaround at the earliest, to avoid the situation escalating further.

      Please find the Business impact as shared by the Accounts Team for your reference.

      Business impact: this issue has caused the customer's development work to miss two sprint cycles of delivery. The customer cannot afford to miss yet another delivery cycle.

      Thanks,

      Himadri.


      — Additional comment from Raúl Fernández on 2022-07-11 07:41:36 UTC —

      Hi,

      Now that this BZ is in the VERIFIED state, I think a backport to 4.10 is needed, as this issue is affecting many customers. Is there any plan for it?
