Bug
Resolution: Done
Normal
4.10
Quality / Stability / Reliability
False
Moderate
CNF RAN Sprint 225, CNF RAN Sprint 226
2
Description of problem:
Using an incorrect clusterLabelSelector operator in a CGU causes the CGU to hang until it is manually edited. It neither progresses nor can it be deleted.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Configure a hub cluster with TALM 4.12, ArgoCD, and a managed cluster
2. Create a policy targeting the managed cluster that triggers a minor change, e.g. changing a catalog source.
3. Create a CGU that enables the policy on a target cluster using a matchExpressions selector, for example:
   - matchExpressions:
     - key: label3
       operator: In
       values:
       - value3
   Then change "operator: In" to the invalid "operator: in".
4. Verify that the CGU status is blank.
5. Verify that there is an error in the container log similar to:
"error": "cannot obtain all the details about the clusters in the CR: cannot obtain the CGU cluster list: \"in\" is not a valid pod selector operator"}
6. Attempt to delete the CGU. The "oc delete cgu" command will hang.
7. Manually edit the CGU to fix the syntax error ("in" -> "In"). The CGU is then deleted.
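The manual fix in step 7 could also be applied non-interactively. The following is a minimal Python sketch, not part of TALM, that scans a CGU spec for operators that differ from a valid Kubernetes selector operator only by case and builds a JSON Patch to correct them. The function name build_operator_fix_patch is hypothetical, and the sketch assumes the API server accepts the update while the deletion is pending, as the manual edit in step 7 suggests.

```python
import json

# Operators accepted by Kubernetes label selectors (case-sensitive).
VALID_OPERATORS = ("In", "NotIn", "Exists", "DoesNotExist")


def build_operator_fix_patch(cgu):
    """Return JSON Patch operations that correct miscased selector operators.

    Hypothetical helper for illustration; `cgu` is the CGU as a plain dict.
    """
    by_lowercase = {op.lower(): op for op in VALID_OPERATORS}
    patch = []
    selectors = cgu.get("spec", {}).get("clusterLabelSelectors", [])
    for i, selector in enumerate(selectors):
        for j, expr in enumerate(selector.get("matchExpressions", [])):
            operator = expr.get("operator", "")
            corrected = by_lowercase.get(operator.lower())
            # Only patch values that are wrong solely because of casing.
            if corrected is not None and corrected != operator:
                patch.append({
                    "op": "replace",
                    "path": (
                        f"/spec/clusterLabelSelectors/{i}"
                        f"/matchExpressions/{j}/operator"
                    ),
                    "value": corrected,
                })
    return patch


# The selector from the reproducer in this report.
cgu = {
    "spec": {
        "clusterLabelSelectors": [
            {"matchExpressions": [
                {"key": "sites", "operator": "in",
                 "values": ["test-sno-invalid"]},
            ]},
        ],
    },
}
patch = build_operator_fix_patch(cgu)
print(json.dumps(patch))
```

The printed patch could then be passed to oc patch cgu talm412test -n default --type=json -p '<patch>'.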
Actual results:
CGU is created but status is empty. CGU cannot be deleted. oc delete command hangs.
Expected results:
CGU should not hang. CGU status should probably be updated with an error. A Reconciler error is reported in the container log, but it is not passed through to the CGU status.
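One way to meet this expectation would be to validate the selector operators up front and turn a failure into a status condition rather than only returning a reconcile error. The sketch below is illustrative Python, not TALM's actual Go code. The function names validate_selectors and selector_condition, and the condition type and reason values, are hypothetical.

```python
# Operators accepted by Kubernetes label selectors (case-sensitive).
VALID_OPERATORS = {"In", "NotIn", "Exists", "DoesNotExist"}


def validate_selectors(cgu):
    """Return one error message per invalid matchExpressions operator."""
    errors = []
    selectors = cgu.get("spec", {}).get("clusterLabelSelectors", [])
    for selector in selectors:
        for expr in selector.get("matchExpressions", []):
            operator = expr.get("operator")
            key = expr.get("key")
            if operator not in VALID_OPERATORS:
                errors.append(
                    f'"{operator}" is not a valid selector operator '
                    f'(key "{key}")'
                )
    return errors


def selector_condition(cgu):
    """Build a status condition describing selector errors, or None if valid.

    The condition type and reason below are hypothetical placeholders.
    """
    errors = validate_selectors(cgu)
    if not errors:
        return None
    return {
        "type": "SelectorValid",
        "status": "False",
        "reason": "InvalidSelector",
        "message": "; ".join(errors),
    }


# The selector from the reproducer in this report.
bad_cgu = {
    "spec": {
        "clusterLabelSelectors": [
            {"matchExpressions": [
                {"key": "sites", "operator": "in",
                 "values": ["test-sno-invalid"]},
            ]},
        ],
    },
}
condition = selector_condition(bad_cgu)
print(condition)
```

With a condition like this written to the CGU status, the error would be visible in oc get cgu output, and a controller could still process the finalizer on deletion because selector evaluation is no longer a hard failure.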
Additional info:
[kni@registry.kni-qe-17 ~]$ cat cgu_talmtest_hang.yaml
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: talm412test
  namespace: default
spec:
  clusterLabelSelectors:
  - matchExpressions:
    - key: sites
      operator: in  # note: incorrect capitalization of operator
      values:
      - test-sno-invalid
  enable: true
  managedPolicies:
  - common-config-policy
  remediationStrategy:
    maxConcurrency: 10
    timeout: 240
[kni@registry.kni-qe-17 ~]$ oc apply -f ./cgu_talmtest_hang.yaml
clustergroupupgrade.ran.openshift.io/talm412test created
[kni@registry.kni-qe-17 ~]$ oc get cgu -A
NAMESPACE   NAME          UPGRADE STATE   AGE
default     talm412test                   4s
[kni@registry.kni-qe-17 ~]$ oc get cgu talm412test -o yaml
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"ran.openshift.io/v1alpha1","kind":"ClusterGroupUpgrade","metadata":{"annotations":{},"name":"talm412test","namespace":"default"},"spec":{"clusterLabelSelectors":[{"matchExpressions":[{"key":"sites","operator":"in","values":["test-sno-invalid"]}]}],"enable":true,"managedPolicies":["common-config-policy"],"remediationStrategy":{"maxConcurrency":10,"timeout":240}}}
  creationTimestamp: "2022-09-27T13:38:38Z"
  finalizers:
  - ran.openshift.io/cleanup-finalizer
  generation: 2
  name: talm412test
  namespace: default
  resourceVersion: "75363493"
  uid: 633eefbc-6197-468b-a806-e2a34a315960
spec:
  actions:
    afterCompletion:
      deleteObjects: true
    beforeEnable: {}
  backup: false
  clusterLabelSelectors:
  - matchExpressions:
    - key: sites
      operator: in
      values:
      - test-sno-invalid
  enable: true
  managedPolicies:
  - common-config-policy
  preCaching: false
  remediationStrategy:
    maxConcurrency: 10
    timeout: 240
[kni@registry.kni-qe-17 ~]$ oc logs cluster-group-upgrades-controller-manager-5bdbd9856b-vlg29 -n openshift-operators -c manager|tail -7
2022-09-27T13:41:23.264Z INFO controllers.ClusterGroupUpgrade Loaded CGU {"name": "default/talm412test", "version": "75366807"}
2022-09-27T13:41:23.265Z INFO controllers.ClusterGroupUpgrade Finish reconciling CGU {"name": "default/talm412test", "requeueRightAway": false}
2022-09-27T13:41:23.265Z ERROR controller-runtime.manager.controller.clustergroupupgrade Reconciler error {"reconciler group": "ran.openshift.io", "reconciler kind": "ClusterGroupUpgrade", "name": "talm412test", "namespace": "default", "error": "cannot obtain all the details about the clusters in the CR: cannot obtain the CGU cluster list: \"in\" is not a valid pod selector operator"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214
clones: OCPBUGS-1839 Deleting a CGU can cause a hung state (Closed)
depends on: OCPBUGS-1839 Deleting a CGU can cause a hung state (Closed)