OpenShift Bugs / OCPBUGS-570

Operator objects are re-created even after deleting them


    • Type: Bug
    • Resolution: Done
    • Priority: Normal
    • Version: 4.9.z
    • Component: OLM
    • Severity: Moderate

      This is a clone of https://bugzilla.redhat.com/show_bug.cgi?id=2106838 for backporting purposes

       
      +++ This bug was initially created as a clone of Bug #2015023 +++

      Description of problem:
      Tried to uninstall the operator and it worked. However, the operator custom resource does not get deleted.
      Strangely, it is recreated even after issuing "oc delete operator <name>".

      Version-Release number of selected component (if applicable):
      4.7.31

      How reproducible:
      100%

      Steps to Reproduce:
      1. Install any operator
      2. Check operator resource
      3. Uninstall the operator
      4. Try to delete the "operator" resource.
      5. List the operator resource. The operator resource will get recreated.
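
      For reference, a minimal command-line sketch of the steps above, using placeholder names for the Subscription, CSV, install namespace, and Operator resource (the actual names depend on the operator being tested):

      ```
      # 1-2. Install an operator (e.g. via a Subscription), then list the cluster-scoped Operator resources
      oc get operators

      # 3. Uninstall the operator by deleting its Subscription and CSV
      oc delete subscription <sub-name> -n <install-namespace>
      oc delete csv <csv-name> -n <install-namespace>

      # 4. Try to delete the Operator resource
      oc delete operator <operator-name>.<install-namespace>

      # 5. List the Operator resources again; the deleted one reappears
      oc get operators
      ```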

      Actual results:
      The operator resource is not getting removed even after deleting it.

      Expected results:
      After issuing "oc delete operator <name>", the operator resource should get removed.

      Additional info:
      A similar bug [1] was fixed in 4.7.0, but it looks like the issue is still there.

      [1]
      https://bugzilla.redhat.com/show_bug.cgi?id=1899588
      — Additional comment from Dhruv Gautam on 2021-10-18 09:14:25 UTC —

      Hi Team

      Let me know if you need any logs.

      Regards
      Dhruv Gautam

      — Additional comment from Nick Hale on 2021-12-02 20:09:45 UTC —

      Sorry for the slow response!

      @dgautam@redhat.com I'm going to need the status of the Operator resource after the deletion attempt is made. That status should show any remaining components. Without that info, I can only suspect that some cluster-scoped resources still exist that reference the Operator – e.g. a CRD – since I have been unable to reproduce the issue myself.

      A must-gather will help too.
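
      For example, something along these lines (a sketch; it assumes the remaining components are listed under `.status.components.refs` of the Operator resource, with the operator name as a placeholder):

      ```
      # Full Operator resource, including the component references in its status
      oc get operator <operator-name> -o yaml

      # Or just the kind/name of each remaining component
      oc get operator <operator-name> -o jsonpath='{range .status.components.refs[*]}{.kind}{" "}{.name}{"\n"}{end}'
      ```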

      — Additional comment from Dhruv Gautam on 2021-12-03 12:45:05 UTC —

      Hi Nick

      The must-gather is available at the Google Drive link below:
      https://drive.google.com/file/d/1DtGkIzZpWYjihu_kG0OR4BkLFjIsPlFB/view?usp=sharing
      If required, we can try to reproduce the issue together.

      Regards
      Dhruv Gautam

      — Additional comment from Nick Hale on 2021-12-03 15:53:47 UTC —

      Looking at `cluster-scoped-resources/operators.coreos.com/operators/cloud-native-postgresql.sandbox-kokj.yaml` in the must-gather shows that there are still resources related to the Operator on the cluster:

      ```
      - apiVersion: rbac.authorization.k8s.io/v1
        kind: RoleBinding
        name: postgresql-operator-controller-manager-1-9-1-service-auth-reader
        namespace: kube-system
      - apiVersion: apiextensions.k8s.io/v1
        conditions:
        - lastTransitionTime: "2021-10-06T10:29:10Z"
          message: no conflicts found
          reason: NoConflicts
          status: "True"
          type: NamesAccepted
        - lastTransitionTime: "2021-10-06T10:29:11Z"
          message: the initial names have been accepted
          reason: InitialNamesAccepted
          status: "True"
          type: Established
        kind: CustomResourceDefinition
        name: clusters.postgresql.k8s.enterprisedb.io
      - apiVersion: apiextensions.k8s.io/v1
        conditions:
        - lastTransitionTime: "2021-10-06T10:29:10Z"
          message: no conflicts found
          reason: NoConflicts
          status: "True"
          type: NamesAccepted
        - lastTransitionTime: "2021-10-06T10:29:10Z"
          message: the initial names have been accepted
          reason: InitialNamesAccepted
          status: "True"
          type: Established
        kind: CustomResourceDefinition
        name: backups.postgresql.k8s.enterprisedb.io
      - apiVersion: apiextensions.k8s.io/v1
        conditions:
        - lastTransitionTime: "2021-10-06T10:29:11Z"
          message: no conflicts found
          reason: NoConflicts
          status: "True"
          type: NamesAccepted
        - lastTransitionTime: "2021-10-06T10:29:11Z"
          message: the initial names have been accepted
          reason: InitialNamesAccepted
          status: "True"
          type: Established
        kind: CustomResourceDefinition
        name: scheduledbackups.postgresql.k8s.enterprisedb.io
      ```

      These must be deleted before the Operator resource can be.
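
      For example, a cleanup sketch based on the components listed above (resource names taken from the must-gather; cluster-admin access is assumed, and deleting the CRDs also removes any remaining custom resources of those types):

      ```
      oc delete rolebinding postgresql-operator-controller-manager-1-9-1-service-auth-reader -n kube-system
      oc delete crd clusters.postgresql.k8s.enterprisedb.io \
                    backups.postgresql.k8s.enterprisedb.io \
                    scheduledbackups.postgresql.k8s.enterprisedb.io

      # After that, deleting the Operator resource should stick
      oc delete operator cloud-native-postgresql.sandbox-kokj
      oc get operator cloud-native-postgresql.sandbox-kokj
      ```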

      I'm now fairly confident that this is not a bug. If you delete these resources along with the Operator resource and it is still recreated, then please reopen this BZ.

      Thanks!

      — Additional comment from Dhruv Gautam on 2021-12-03 17:07:34 UTC —

      Hi Nick

      You got that absolutely right. The CRDs had not been cleared, which is why the operator resource was being recreated.
      This is not a bug.

      Regards
      Dhruv Gautam

      — Additional comment from Nick Hale on 2021-12-14 15:57:11 UTC —

      Looks like we're seeing cases with no components as well. I'm reopening this.

      Just got out of a call with the related customer – they'll be posting a new must-gather soon.

      — Additional comment from Dhruv Gautam on 2021-12-14 17:09:57 UTC —

      Hi Nick

      Thanks for assisting over the remote.
      Please find the latest must-gather below:
      https://drive.google.com/file/d/1mNvJFmoabUTCXiZ0TMmucR8YAIEsWAk5/view?usp=sharing
      Regards
      Dhruv Gautam
      Red Hat

      — Additional comment from Dhruv Gautam on 2022-01-25 08:39:24 UTC —

      Hello Nick

      Any update on this Bugzilla?

      Regards
      Dhruv Gautam
      Red Hat

      — Additional comment from Dhruv Gautam on 2022-02-08 17:41:59 UTC —

      Hello Team

      Is there any update on this Bugzilla?

      Regards
      Dhruv Gautam
      Red Hat

      — Additional comment from Nick Hale on 2022-02-10 15:25:38 UTC —

      Hi Dhruv,

      Very sorry for the delayed response.

      We have an upstream PR from an external contributor addressing this. I'm moving to either get that in or create a patch myself.

      Hopefully, there should be something merged upstream within the week – after that, I'll focus on getting it merged downstream, although I'm not sure if we can backport all the way to 4.7.z at this point. I'll follow up with the team and see what's possible.

      I'll post my findings here later today.

      — Additional comment from Nick Hale on 2022-02-10 15:26:05 UTC —

      Current upstream PR:
      https://github.com/operator-framework/operator-lifecycle-manager/pull/2582
      — Additional comment from Nick Hale on 2022-02-10 16:16:50 UTC —

      Okay, so it looks like we don't backport fixes for medium issues to 4.7.z. Have the customers upgraded to a newer version of OpenShift?

      We can look into changing the severity if this issue is causing significant interruptions for users.

      Here are the support phase dates for current OpenShift releases:
      https://access.redhat.com/support/policy/updates/openshift#dates
      (that doc also lists the phase SLAs)

      — Additional comment from Dhruv Gautam on 2022-03-09 17:14:19 UTC —

      Hi Nick

      I understand your point about the support phases and SLAs.

      I would like to know:

      • In which RHOCP version will the fix be made available?
      • Are there any tentative dates for when the fix will be released?

      Regards
      Dhruv Gautam
      Red Hat

      — Additional comment from Nick Hale on 2022-04-05 19:05:52 UTC —

      Dhruv,

      An upstream fix for this – different from the one mentioned in my previous comment – has merged (see https://github.com/operator-framework/operator-lifecycle-manager/pull/2697).

      > - In which RHOCP version will the fix be made available?

      If we can get the change synced to our downstream repository in time, it will be released in version 4.11.0.

      > - Are there any tentative dates for when the fix will be released?

      As of today, 4.11.0 is planned to go GA on July 13th, 2022 (see https://docs.google.com/spreadsheets/d/19bRYespPb-AvclkwkoizmJ6NZ54p9iFRn6DGD8Ugv2c/edit#gid=0).

      — Additional comment from Per da Silva on 2022-04-07 09:04:37 UTC —

      Hi Dhruv,

      We've brought this change downstream on this PR:
      https://github.com/openshift/operator-framework-olm/pull/278
      I'll update this ticket to ON_QA

      Cheers,

      Per

      — Additional comment from Jian Zhang on 2022-04-07 09:59:45 UTC —

      Hi Nick,

      > Current upstream PR: https://github.com/operator-framework/operator-lifecycle-manager/pull/2582

      This fix PR has been rejected; could you help link the right one? Thanks!

      Hi Bruno,

      > We've brought this change downstream on this PR: https://github.com/openshift/operator-framework-olm/pull/278

      I checked the latest payload and it contains the fix PR. Could you help verify it? Thanks!
      mac:~ jianzhang$ oc adm release info registry.ci.openshift.org/ocp/release:4.11.0-0.nightly-2022-04-07-053433 -a .dockerconfigjson --commits|grep olm
      operator-lifecycle-manager   https://github.com/openshift/operator-framework-olm   491ea010345b42d0ffd19208124e16bc8a9d1355

      — Additional comment from Bruno Andrade on 2022-04-08 00:09:44 UTC —

      oc get clusterversion
      NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
      version 4.11.0-0.nightly-2022-04-07-053433 True False 4h25m Cluster version is 4.11.0-0.nightly-2022-04-07-053433

      oc exec olm-operator-67fc464567-8wl9l -n openshift-operator-lifecycle-manager -- olm --version
      OLM version: 0.19.0
      git commit: 491ea010345b42d0ffd19208124e16bc8a9d1355

      cat og-single.yaml
      kind: OperatorGroup
      apiVersion: operators.coreos.com/v1
      metadata:
        name: og-single1
        namespace: default
      spec:
        targetNamespaces:
        - default

      oc apply -f og-single.yaml
      operatorgroup.operators.coreos.com/og-single1 created

      cat teiidcatsrc.yaml
      apiVersion: operators.coreos.com/v1alpha1
      kind: CatalogSource
      metadata:
        name: teiid
        namespace: default
      spec:
        displayName: "teiid Operators"
        image: quay.io/bandrade/teiid-index:1898500
        publisher: QE
        sourceType: grpc

      oc create -f teiidcatsrc.yaml
      catalogsource.operators.coreos.com/teiid created

      cat teiidsub.yaml
      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
        name: teiid
        namespace: default
      spec:
        source: teiid
        sourceNamespace: default
        channel: alpha
        installPlanApproval: Automatic
        name: teiid

      oc apply -f teiidsub.yaml
      subscription.operators.coreos.com/teiid created

      oc get sub -n default
      NAME PACKAGE SOURCE CHANNEL
      teiid teiid teiid alpha

      oc get ip -n default
      NAME CSV APPROVAL APPROVED
      install-psjsf teiid.v0.3.0 Automatic true

      oc get csv
      NAME DISPLAY VERSION REPLACES PHASE
      teiid.v0.3.0 Teiid 0.3.0 Succeeded

      oc get operators -n default
      NAME AGE
      cluster-logging.openshift-logging 5h14m
      elasticsearch-operator.openshift-operators-redhat 5h14m
      teiid.default 13m

      oc delete sub teiid
      subscription.operators.coreos.com "teiid" deleted

      oc delete csv teiid.v0.3.0
      clusterserviceversion.operators.coreos.com "teiid.v0.3.0" deleted

      oc get operator teiid.default -o yaml
      apiVersion: operators.coreos.com/v1
      kind: Operator
      metadata:
        creationTimestamp: "2022-04-07T23:22:22Z"
        generation: 1
        name: teiid.default
        resourceVersion: "146694"
        uid: d74c796d-7482-4caa-96ed-fbd401a35f19
      spec: {}
      status:
        components:
          labelSelector:
            matchExpressions:
            - key: operators.coreos.com/teiid.default
              operator: Exists

      oc delete operator teiid.default -n default
      warning: deleting cluster-scoped resources, not scoped to the provided namespace
      operator.operators.coreos.com "teiid.default" deleted

      oc get operator teiid.default -o yaml
      Error from server (NotFound): operators.operators.coreos.com "teiid.default" not found

      LGTM, marking as VERIFIED

      — Additional comment from Raúl Fernández on 2022-05-27 10:24:05 UTC —

      Hi,

      My customer Telefonica is experiencing this problem (case linked) and requesting this backport to 4.8, as they are having this issue with multiple operators.

      Could we have a backport schedule for this?

      Thanks.

      Best regards,
      Raúl Fernández

      — Additional comment from errata-xmlrpc on 2022-06-15 18:25:09 UTC —

      This bug has been added to advisory RHEA-2022:5069 by OpenShift Release Team Bot (ocp-build/buildvm.openshift.eng.bos.redhat.com@REDHAT.COM)

      — Additional comment from Silvia Parpatekar on 2022-06-24 08:07:53 UTC —

      Hello Team,

      We have a customer on OCP 4.9.36 (bare metal) who is facing this issue with the Performance Addon Operator.
      It is easily reproducible: we installed the operator and then deleted its resources:
      ~~~
      $ oc delete csv performance-addon-operator.v4.8.8
      $ oc delete ip install-flsns
      $ oc delete sub performance-addon-operator
      $ oc delete job.batch/77c159445752f625e1337c92dabd6cfdfc1eebcb7accbbfd5d1c7227656cec6 -n openshift-marketplace
      $ oc delete configmap/77c159445752f625e1337c92dabd6cfdfc1eebcb7accbbfd5d1c7227656cec6 -n openshift-marketplace
      $ oc get crd | grep performance
      $ oc delete crd performanceprofiles.performance.openshift.io
      $ oc scale deployment -n openshift-operator-lifecycle-manager olm-operator --replicas=0
      $ oc delete operator performance-addon-operator.openshift-operators
      $ oc scale deployment -n openshift-operator-lifecycle-manager olm-operator --replicas=1
      $ oc get operator
      NAME AGE
      performance-addon-operator.openshift-operators 7m31s
      ~~~

      The Operator resource still exists in the CLI, but when I checked the web console I couldn't find the operator there. We tried various other ways to delete it, but no luck.
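
      One additional check that might help narrow this down (a sketch; it assumes OLM's component labelling, i.e. an `operators.coreos.com/<name>.<namespace>` label key on every resource the Operator still references, as seen in the label selector earlier in this bug):
      ~~~
      # Anything that still carries the operator's component label would explain the recreation
      oc get crd,clusterrole,clusterrolebinding -l 'operators.coreos.com/performance-addon-operator.openshift-operators'
      oc get all -A -l 'operators.coreos.com/performance-addon-operator.openshift-operators'
      ~~~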

      Can we get an update on which version this is going to be fixed in?

      — Additional comment from Immanuvel on 2022-07-01 12:44:27 UTC —

      Hello Team,

      Is any workaround available for this?

      Thanks
      Immanuvel

      — Additional comment from himadri on 2022-07-01 13:16:41 UTC —

      Hello Team,

      The accounts team has reached out to EMT on this, as the customer is frustrated with how long this has gone on without a fix. The customer temperature is high, and we request your intervention in getting a permanent fix or workaround at the earliest, to avoid the situation escalating further.

      Please find the Business impact as shared by the Accounts Team for your reference.

      Business impact: this issue has caused the customer's development work to miss two sprint cycles of delivery. The customer cannot afford to miss yet another delivery cycle.

      Thanks,

      Himadri.


      — Additional comment from Raúl Fernández on 2022-07-11 07:41:36 UTC —

      Hi,

      Now that this BZ is in the VERIFIED state, I think a backport to 4.10 is needed, as this issue is affecting many customers. Is there any plan for it?
