OpenShift Bugs / OCPBUGS-5797

TALM backup CGU only indicates status of one cluster when two clusters are being backed up


Details

    • Type: Bug
    • Resolution: Done
    • Priority: Undefined
    • Affects Version: 4.12
    • Component: TALM Operator
    • Severity: Moderate

    Description

      Description of problem:

      TALM backup only reports the status of one cluster when two are being backed up, and it never reports a completed backup for the missing cluster.  The backup-agent pod log for the unreported cluster confirms that its backup did in fact complete.  Repeated tests show that no particular cluster is preferred: one run shows status for one cluster, a subsequent run shows status for the other.
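
      The symptom is consistent with the controller rebuilding the per-cluster backup status map on each pass and keeping only the most recently processed cluster, instead of merging new entries into the existing map.  A minimal Go sketch of the two behaviors (hypothetical function names, not TALM's actual code):

      ```go
      package main

      import "fmt"

      // updateStatusOverwrite models the suspected defect: each update
      // replaces the whole map, so only the last cluster processed survives.
      func updateStatusOverwrite(_ map[string]string, cluster, state string) map[string]string {
      	return map[string]string{cluster: state}
      }

      // updateStatusMerge models the expected behavior: entries accumulate,
      // so every cluster in the batch keeps its own status.
      func updateStatusMerge(status map[string]string, cluster, state string) map[string]string {
      	if status == nil {
      		status = map[string]string{}
      	}
      	status[cluster] = state
      	return status
      }

      func main() {
      	clusters := []string{"ocp-edge87", "ocp-edge88"}

      	buggy := map[string]string{}
      	fixed := map[string]string{}
      	for _, c := range clusters {
      		buggy = updateStatusOverwrite(buggy, c, "Starting")
      		fixed = updateStatusMerge(fixed, c, "Starting")
      	}

      	// The overwrite variant ends up with one entry; the merge variant keeps both.
      	fmt.Println(len(buggy), len(fixed)) // prints: 1 2
      }
      ```

      Either way, the backup-agent logs show both backups ran; only the reported status map loses an entry.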

      Version-Release number of selected component (if applicable):

      4.12

      How reproducible:

      Test was run several times in succession.

      Steps to Reproduce:

      Environment consists of a hub cluster with two spokes.
      
      Tests were executed via automation:
      https://gitlab.cee.redhat.com/cnf/cnf-gotests/-/blob/master/test/ran/talm/tests/talm_backup.go
      Test case `It("should not affect backup on second spoke in same batch")` fails, both within the full test suite and when run as an isolated test.
      
      The test case deliberately causes one spoke backup to fail, while the other should (and does) back up successfully.  The test does not pass because the cgu only reports results for one cluster.
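
      The failing assertion boils down to checking that every cluster in spec.clusters has an entry in .status.backup.status.  A small illustrative Go helper (hypothetical; the actual check lives in the linked cnf-gotests suite):

      ```go
      package main

      import "fmt"

      // missingBackupStatus returns the spec clusters that have no entry in
      // the CGU's .status.backup.status map.
      func missingBackupStatus(specClusters []string, status map[string]string) []string {
      	var missing []string
      	for _, c := range specClusters {
      		if _, ok := status[c]; !ok {
      			missing = append(missing, c)
      		}
      	}
      	return missing
      }

      func main() {
      	// Values taken from the CGU output in this report: two clusters in
      	// spec, but only ocp-edge88 present in the backup status map.
      	spec := []string{"ocp-edge87", "ocp-edge88"}
      	status := map[string]string{"ocp-edge88": "Starting"}
      	fmt.Println(missingBackupStatus(spec, status)) // prints: [ocp-edge87]
      }
      ```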
      

      Actual results:

      Here is the cgu output.  Note that only one cluster's status is shown:
      
      [kni@registry.kni-qe-18 ~]$ oc get cgu -n talm-test generated-cgu-disk-full-multiple-spokes -o yaml 
      apiVersion: ran.openshift.io/v1alpha1
      kind: ClusterGroupUpgrade
      metadata:
        creationTimestamp: "2023-01-12T17:30:41Z"
        finalizers:
        - ran.openshift.io/cleanup-finalizer
        generation: 1
        name: generated-cgu-disk-full-multiple-spokes
        namespace: talm-test
        resourceVersion: "107942509"
        uid: a9c3ef94-2edc-45bf-92c3-f87a0ac913ad
      spec:
        actions:
          afterCompletion:
            deleteObjects: true
          beforeEnable: {}
        backup: true
        clusters:
        - ocp-edge87
        - ocp-edge88
        enable: true
        managedPolicies:
        - generated-policy-disk-full-multiple-spokes
        preCaching: false
        remediationStrategy:
          maxConcurrency: 100
          timeout: 250
      status:
        backup:
          clusters:
          - ocp-edge87
          - ocp-edge88
          startedAt: "2023-01-12T17:30:41Z"
          status:
            ocp-edge88: Starting
        computedMaxConcurrency: 2
        conditions:
        - lastTransitionTime: "2023-01-12T17:30:41Z"
          message: All selected clusters are valid
          reason: ClusterSelectionCompleted
          status: "True"
          type: ClustersSelected
        - lastTransitionTime: "2023-01-12T17:30:41Z"
          message: Completed validation
          reason: ValidationCompleted
          status: "True"
          type: Validated
        - lastTransitionTime: "2023-01-12T17:30:41Z"
          message: Backup in progress for 1 clusters
          reason: InProgress
          status: "False"
          type: BackupSuceeded
        - lastTransitionTime: "2023-01-12T17:30:41Z"
          message: Cluster backup is in progress
          reason: NotStarted
          status: "False"
          type: Progressing
        copiedPolicies:
        - generated-cgu-disk-full-multiple-spokes-generat-zz6lg
        managedPoliciesForUpgrade:
        - name: generated-policy-disk-full-multiple-spokes
          namespace: talm-test
        managedPoliciesNs:
          generated-policy-disk-full-multiple-spokes: talm-test
        placementBindings:
        - generated-cgu-disk-full-multiple-spokes-generated-policy-disk-full-multiple-spokes-placement-kstpc
        placementRules:
        - generated-cgu-disk-full-multiple-spokes-generated-policy-disk-full-multiple-spokes-placement-kstpc
        remediationPlan:
        - - ocp-edge87
          - ocp-edge88
        safeResourceNames:
          generated-cgu-disk-full-multiple-spokes-generated-policy-disk-full-multiple-spokes: generated-cgu-disk-full-multiple-spokes-generat-zz6lg
          generated-cgu-disk-full-multiple-spokes-generated-policy-disk-full-multiple-spokes-config: generated-cgu-disk-full-multiple-spokes-generated-policy--tmfd8
          generated-cgu-disk-full-multiple-spokes-generated-policy-disk-full-multiple-spokes-placement: generated-cgu-disk-full-multiple-spokes-generated-policy-disk-full-multiple-spokes-placement-kstpc
        status:
          startedAt: "2023-01-12T17:30:41Z"
      
      

      Expected results:

      The cgu should show status for both clusters:
      
      status:
        backup:
          clusters:
          - ocp-edge87
          - ocp-edge88
          startedAt: "2023-01-12T17:30:41Z"
          status:
            ocp-edge87: Starting
            ocp-edge88: Starting
      
      

      Additional info:

      Attaching TALM pod logs from the hub cluster, taken before and after test execution.

      People

        Jun Chen (jche@redhat.com)
        Bonnie Block (bblock@redhat.com)