-
Bug
-
Resolution: Done
-
Normal
-
4.10
-
Moderate
-
None
-
CNF RAN Sprint 225, CNF RAN Sprint 226
-
2
-
False
-
-
Description of problem:
Using an incorrect clusterLabelSelector operator in a CGU results in the CGU hanging until it is manually edited. It does not progress and cannot be deleted.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Configure a hub cluster with TALM 4.12, ArgoCD, and a managed cluster.
2. Create a policy targeting the managed cluster which triggers a minor change, e.g. changing a catalog source.
3. Create a CGU which enables the policy on a target cluster using a matchExpressions selector, for example:
     - matchExpressions:
       - key: label3
         operator: In
         values:
         - value3
   Change "operator: In" to the invalid syntax "operator: in".
4. Verify that the CGU status is blank.
5. Verify that there is an error in the container log similar to:
   "error": "cannot obtain all the details about the clusters in the CR: cannot obtain the CGU cluster list: \"in\" is not a valid pod selector operator"}
6. Attempt to delete the CGU. The "oc delete cgu" command will hang.
7. Manually edit the CGU to fix the syntax error ("in" -> "In"). The CGU will then be deleted.
Actual results:
CGU is created but status is empty. CGU cannot be deleted. oc delete command hangs.
Expected results:
CGU should not hang. CGU status should probably be updated with an error. A Reconciler error is reported in the container log, but it is not passed through to the CGU.
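Until the error is surfaced in the CGU status, one possible stopgap is to check the selector operators before applying the CGU. The following is only a minimal sketch: it assumes the CGU manifest has already been loaded into a Python dict, and `find_invalid_operators` is a hypothetical helper, not part of TALM. It relies on the fact that Kubernetes label selector operators are case-sensitive and limited to In, NotIn, Exists and DoesNotExist.

```python
# Operators accepted in Kubernetes label selector matchExpressions (case-sensitive).
VALID_OPERATORS = {"In", "NotIn", "Exists", "DoesNotExist"}


def find_invalid_operators(cgu):
    """Return one message per matchExpressions entry whose operator is invalid.

    Hypothetical helper for illustration; `cgu` is the CGU manifest as a dict.
    """
    errors = []
    selectors = cgu.get("spec", {}).get("clusterLabelSelectors") or []
    for i, selector in enumerate(selectors):
        for j, expr in enumerate(selector.get("matchExpressions") or []):
            op = expr.get("operator")
            if op not in VALID_OPERATORS:
                errors.append(
                    f"clusterLabelSelectors[{i}].matchExpressions[{j}]: "
                    f"{op!r} is not a valid operator"
                )
    return errors


# The CGU from this report, reduced to the fields that matter here.
cgu = {
    "apiVersion": "ran.openshift.io/v1alpha1",
    "kind": "ClusterGroupUpgrade",
    "metadata": {"name": "talm412test", "namespace": "default"},
    "spec": {
        "clusterLabelSelectors": [
            {
                "matchExpressions": [
                    {"key": "sites", "operator": "in", "values": ["test-sno-invalid"]}
                ]
            }
        ],
        "enable": True,
        "managedPolicies": ["common-config-policy"],
    },
}

for message in find_invalid_operators(cgu):
    print(message)
```

This only guards against the typo on the client side; it does not change the controller behaviour described above.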
Additional info:
[kni@registry.kni-qe-17 ~]$ cat cgu_talmtest_hang.yaml
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: talm412test
  namespace: default
spec:
  clusterLabelSelectors:
  - matchExpressions:
    - key: sites
      operator: in #note incorrect capitalization of operator#
      values:
      - test-sno-invalid
  enable: true
  managedPolicies:
  - common-config-policy
  remediationStrategy:
    maxConcurrency: 10
    timeout: 240

[kni@registry.kni-qe-17 ~]$ oc apply -f ./cgu_talmtest_hang.yaml
clustergroupupgrade.ran.openshift.io/talm412test created

[kni@registry.kni-qe-17 ~]$ oc get cgu -A
NAMESPACE   NAME          UPGRADE STATE   AGE
default     talm412test                   4s

[kni@registry.kni-qe-17 ~]$ oc get cgu talm412test -o yaml
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"ran.openshift.io/v1alpha1","kind":"ClusterGroupUpgrade","metadata":{"annotations":{},"name":"talm412test","namespace":"default"},"spec":{"clusterLabelSelectors":[{"matchExpressions":[{"key":"sites","operator":"in","values":["test-sno-invalid"]}]}],"enable":true,"managedPolicies":["common-config-policy"],"remediationStrategy":{"maxConcurrency":10,"timeout":240}}}
  creationTimestamp: "2022-09-27T13:38:38Z"
  finalizers:
  - ran.openshift.io/cleanup-finalizer
  generation: 2
  name: talm412test
  namespace: default
  resourceVersion: "75363493"
  uid: 633eefbc-6197-468b-a806-e2a34a315960
spec:
  actions:
    afterCompletion:
      deleteObjects: true
    beforeEnable: {}
  backup: false
  clusterLabelSelectors:
  - matchExpressions:
    - key: sites
      operator: in
      values:
      - test-sno-invalid
  enable: true
  managedPolicies:
  - common-config-policy
  preCaching: false
  remediationStrategy:
    maxConcurrency: 10
    timeout: 240

[kni@registry.kni-qe-17 ~]$ oc logs cluster-group-upgrades-controller-manager-5bdbd9856b-vlg29 -n openshift-operators -c manager|tail -7
2022-09-27T13:41:23.264Z INFO controllers.ClusterGroupUpgrade Loaded CGU {"name": "default/talm412test", "version": "75366807"}
2022-09-27T13:41:23.265Z INFO controllers.ClusterGroupUpgrade Finish reconciling CGU {"name": "default/talm412test", "requeueRightAway": false}
2022-09-27T13:41:23.265Z ERROR controller-runtime.manager.controller.clustergroupupgrade Reconciler error {"reconciler group": "ran.openshift.io", "reconciler kind": "ClusterGroupUpgrade", "name": "talm412test", "namespace": "default", "error": "cannot obtain all the details about the clusters in the CR: cannot obtain the CGU cluster list: \"in\" is not a valid pod selector operator"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214
- clones
-
OCPBUGS-1839 Deleting a CGU can cause a hung state
- Closed
- depends on
-
OCPBUGS-1839 Deleting a CGU can cause a hung state
- Closed
- links to