OpenShift Bugs: OCPBUGS-1848

Deleting a CGU can cause a hung state


    • Moderate
    • None
    • CNF RAN Sprint 225, CNF RAN Sprint 226
    • 2
    • False

      Description of problem:

      Using an invalid operator value in a CGU's clusterLabelSelectors matchExpressions leaves the CGU hung until it is manually edited. The CGU does not progress and cannot be deleted.
      
      

      Version-Release number of selected component (if applicable):

       

      How reproducible:

      Always

      Steps to Reproduce:

      1. Configure a hub cluster with TALM 4.12, ArgoCD, and a managed cluster 
      2. Create a policy targeting the managed cluster that triggers a minor change, e.g. changing a catalog source.
      3. Create a CGU that enables the policy on a target cluster using matchExpressions in clusterLabelSelectors, for example:
          - matchExpressions: 
              - key: label3
                operator: In
                values: 
                  - value3
      Then change "operator: In" to the invalid value "operator: in" (lowercase).
      4. Verify that the CGU status is blank.
      5. Verify that there is an error in the container log similar to:
      "error": "cannot obtain all the details about the clusters in the CR: cannot obtain the CGU cluster list: \"in\" is not a valid pod selector operator"
      6. Attempt to delete the CGU. The "oc delete cgu" command will hang. 
      7. Manually edit the CGU to fix the syntax error ("in" -> "In"). The CGU is then deleted.
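
The failure in steps 3 to 5 comes down to a case-sensitive operator check. As a rough illustration, the Python sketch below checks a CGU manifest for invalid matchExpressions operators before it is applied. It assumes the manifest has already been parsed into a dict. The function name find_invalid_operators is hypothetical and is not part of TALM or oc. The set of valid operators is the one defined by the Kubernetes label selector API.

```python
# Valid operators per the Kubernetes LabelSelectorRequirement API.
# Matching is case-sensitive, so "in" is rejected while "In" is accepted.
VALID_OPERATORS = {"In", "NotIn", "Exists", "DoesNotExist"}


def find_invalid_operators(cgu):
    """Return (path, operator) pairs for every invalid matchExpressions operator.

    `cgu` is a ClusterGroupUpgrade manifest already parsed into a dict
    (hypothetical helper, for illustration only).
    """
    problems = []
    selectors = cgu.get("spec", {}).get("clusterLabelSelectors") or []
    for i, selector in enumerate(selectors):
        for j, expr in enumerate(selector.get("matchExpressions") or []):
            op = expr.get("operator")
            if op not in VALID_OPERATORS:
                path = f"spec.clusterLabelSelectors[{i}].matchExpressions[{j}].operator"
                problems.append((path, op))
    return problems


# The manifest from this report, reduced to the relevant fields.
bad_cgu = {
    "spec": {
        "clusterLabelSelectors": [
            {"matchExpressions": [
                {"key": "sites", "operator": "in", "values": ["test-sno-invalid"]},
            ]},
        ],
    },
}

for path, op in find_invalid_operators(bad_cgu):
    print(f"{path}: {op!r} is not one of {sorted(VALID_OPERATORS)}")
```

Running a check like this before "oc apply" would catch the typo early, but it does not replace a fix in the controller itself.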

      Actual results:

      CGU is created but status is empty. CGU cannot be deleted. oc delete command hangs.

      Expected results:

      The CGU should not hang. The CGU status should probably be updated with an error. A Reconciler error is reported in the container log, but it is not passed through to the CGU.
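
To make the expected result concrete, the Python sketch below shows one way a selector error could be recorded as a Kubernetes-style status condition on the CGU. This is an illustration only, not TALM's implementation. The condition type "SelectorValid", the reason "InvalidLabelSelector", and both function names are hypothetical.

```python
from datetime import datetime, timezone


def selector_error_condition(error_message):
    """Build a Kubernetes-style condition describing a selector error.

    Field names follow the common condition layout. The type and reason
    values are hypothetical and chosen only for this sketch.
    """
    return {
        "type": "SelectorValid",           # hypothetical condition type
        "status": "False",
        "reason": "InvalidLabelSelector",  # hypothetical reason
        "message": error_message,
        "lastTransitionTime": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


def record_selector_error(cgu, error_message):
    """Add or replace the selector-error condition in cgu['status']['conditions']."""
    condition = selector_error_condition(error_message)
    conditions = cgu.setdefault("status", {}).setdefault("conditions", [])
    # Keep at most one condition per type, as conditions conventionally do.
    conditions[:] = [c for c in conditions if c["type"] != condition["type"]]
    conditions.append(condition)
    return cgu


cgu = {"metadata": {"name": "talm412test", "namespace": "default"}}
record_selector_error(cgu, '"in" is not a valid pod selector operator')
print(cgu["status"]["conditions"][0]["reason"])
```

With a condition like this in place, "oc get cgu -o yaml" would show why the CGU is not progressing instead of an empty status.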

      Additional info:

      [kni@registry.kni-qe-17 ~]$ cat cgu_talmtest_hang.yaml 
      apiVersion: ran.openshift.io/v1alpha1
      kind: ClusterGroupUpgrade
      metadata:
        name: talm412test
        namespace: default
      spec:
        clusterLabelSelectors: 
          - matchExpressions:
              - key: sites
                operator: in       # note: incorrect capitalization of operator
                values: 
                  - test-sno-invalid
        enable: true
        managedPolicies:
        - common-config-policy
        remediationStrategy:
          maxConcurrency: 10
          timeout: 240
      
      
      [kni@registry.kni-qe-17 ~]$ oc apply -f ./cgu_talmtest_hang.yaml 
      clustergroupupgrade.ran.openshift.io/talm412test created
      
      
      [kni@registry.kni-qe-17 ~]$ oc get cgu -A
      NAMESPACE   NAME          UPGRADE STATE   AGE
      default     talm412test                   4s
      
      [kni@registry.kni-qe-17 ~]$ oc get cgu talm412test -o yaml
      apiVersion: ran.openshift.io/v1alpha1
      kind: ClusterGroupUpgrade
      metadata:
        annotations:
          kubectl.kubernetes.io/last-applied-configuration: |
            {"apiVersion":"ran.openshift.io/v1alpha1","kind":"ClusterGroupUpgrade","metadata":{"annotations":{},"name":"talm412test","namespace":"default"},"spec":{"clusterLabelSelectors":[{"matchExpressions":[{"key":"sites","operator":"in","values":["test-sno-invalid"]}]}],"enable":true,"managedPolicies":["common-config-policy"],"remediationStrategy":{"maxConcurrency":10,"timeout":240}}}
        creationTimestamp: "2022-09-27T13:38:38Z"
        finalizers:
        - ran.openshift.io/cleanup-finalizer
        generation: 2
        name: talm412test
        namespace: default
        resourceVersion: "75363493"
        uid: 633eefbc-6197-468b-a806-e2a34a315960
      spec:
        actions:
          afterCompletion:
            deleteObjects: true
          beforeEnable: {}
        backup: false
        clusterLabelSelectors:
        - matchExpressions:
          - key: sites
            operator: in
            values:
            - test-sno-invalid
        enable: true
        managedPolicies:
        - common-config-policy
        preCaching: false
        remediationStrategy:
          maxConcurrency: 10
          timeout: 240
      
      
      [kni@registry.kni-qe-17 ~]$ oc logs cluster-group-upgrades-controller-manager-5bdbd9856b-vlg29 -n openshift-operators -c manager|tail -7
      2022-09-27T13:41:23.264Z        INFO    controllers.ClusterGroupUpgrade Loaded CGU      {"name": "default/talm412test", "version": "75366807"}
      2022-09-27T13:41:23.265Z        INFO    controllers.ClusterGroupUpgrade Finish reconciling CGU  {"name": "default/talm412test", "requeueRightAway": false}
      2022-09-27T13:41:23.265Z        ERROR   controller-runtime.manager.controller.clustergroupupgrade       Reconciler error        {"reconciler group": "ran.openshift.io", "reconciler kind": "ClusterGroupUpgrade", "name": "talm412test", "namespace": "default", "error": "cannot obtain all the details about the clusters in the CR: cannot obtain the CGU cluster list: \"in\" is not a valid pod selector operator"}
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
              /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
              /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214

       

              sskeard@redhat.com Steven Skeard
              josclark@redhat.com Joshua Clark
              Yang Liu Yang Liu