OpenShift Bugs: OCPBUGS-37473

Network and dns operators not getting upgraded

    • Important
    • None
    • False
      Description of problem:

      The CVO is stuck in the upgrading state at the network ClusterOperator.
      The update history shows multiple partial upgrades:
      =================================
          history:
          - completionTime: null
            image: fr2.icr.io/armada-master/ocp-release:4.13.43-x86_64
            startedTime: "2024-07-04T06:21:36Z"
            state: Partial
            verified: false
            version: 4.13.43
          - completionTime: "2024-07-04T06:21:36Z"
            image: fr2.icr.io/armada-master/ocp-release:4.12.58-x86_64
            startedTime: "2024-06-26T16:25:53Z"
            state: Partial
            verified: false
            version: 4.12.58
          - completionTime: "2024-06-26T16:25:53Z"
            image: fr2.icr.io/armada-master/ocp-release:4.12.56-x86_64
            startedTime: "2024-06-05T17:06:12Z"
            state: Partial
            verified: false
            version: 4.12.56
          - completionTime: "2024-06-05T17:06:12Z"
            image: fr2.icr.io/armada-master/ocp-release:4.12.55-x86_64
            startedTime: "2024-05-01T19:41:10Z"
            state: Partial
            verified: false
            version: 4.12.55
          - completionTime: "2024-05-01T19:41:10Z"
            image: fr2.icr.io/armada-master/ocp-release:4.12.51-x86_64
            startedTime: "2024-04-03T17:32:01Z"
            state: Partial
            verified: false
            version: 4.12.51
          - completionTime: "2024-04-03T17:32:01Z"
            image: fr2.icr.io/armada-master/ocp-release:4.12.49-x86_64
            startedTime: "2024-03-06T19:11:42Z"
            state: Partial
            verified: false
            version: 4.12.49
          - completionTime: "2024-03-06T19:11:42Z"
            image: fr2.icr.io/armada-master/ocp-release:4.12.46-x86_64
            startedTime: "2024-02-07T19:11:28Z"
            state: Partial
            verified: false
            version: 4.12.46
          - completionTime: "2024-02-07T19:11:28Z"
            image: fr2.icr.io/armada-master/ocp-release:4.12.44-x86_64
            startedTime: "2023-12-13T19:06:13Z"
            state: Partial
            verified: false
            version: 4.12.44
          - completionTime: "2023-11-16T14:33:09Z"
            image: fr2.icr.io/armada-master/ocp-release:4.12.40-x86_64
            startedTime: "2023-11-16T13:57:39Z"
            state: Completed
            verified: false
            version: 4.12.40
      =================================
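
A quick way to tally the history entries by state is sketched below. The helper name count_history_states is hypothetical (not part of oc); it only counts "state:" lines in whatever YAML is piped into it, on the assumption that only history entries carry that key.

```shell
# Hypothetical helper (not an oc subcommand): tally the "state:" values
# found in ClusterVersion YAML read from stdin.
# Assumption: only status.history entries contain a "state:" key.
count_history_states() {
  awk '$1 == "state:" { count[$2]++ }
       END { for (s in count) print s, count[s] }' | sort
}

# Possible usage against a live cluster:
#   oc get clusterversion version -o yaml | count_history_states
```

Applied to the history shown above, this would report 1 Completed entry and 8 Partial entries.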
      
      All operators report Available, including network and dns.

      Version-Release number of selected component (if applicable):

      ROKS 4.13.43

      How reproducible:

      Could not reproduce the issue on RHOCP.
      Triggered an upgrade from RHOCP 4.12.40 to 4.12.44 and, before the CVO started upgrading the network operator, triggered an upgrade to a newer version.
      Repeated this step until there were 7 partial upgrades. After the final upgrade to a 4.13.z version was triggered, the upgrade completed successfully, and the network, dns and machine-config operators were upgraded as well.

      Steps to Reproduce:

          1. NA
          

      Actual results:

      The upgrade does not proceed past the network operator. The image used by network-operator is already from version 4.13.43, which means the operator is running the new image, but the CVO is not showing the updated version in "oc get co".
      
      ====================
      NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      console                                    4.13.43   True        False         False      55d     
      csi-snapshot-controller                    4.13.43   True        False         False      238d    
      dns                                        4.12.40   True        False         False      238d    
      image-registry                             4.13.43   True        False         False      133d    
      ingress                                    4.13.43   True        False         False      7d3h    
      insights                                   4.13.43   True        False         False      98d     
      kube-apiserver                             4.13.43   True        False         False      238d    
      kube-controller-manager                    4.13.43   True        False         False      238d    
      kube-scheduler                             4.13.43   True        False         False      238d    
      kube-storage-version-migrator              4.13.43   True        False         False      7d5h    
      marketplace                                4.13.43   True        False         False      238d    
      monitoring                                 4.13.43   True        False         False      158d    
      network                                    4.12.40   True        False         False      238d    
      node-tuning                                4.13.43   True        False         False      7d4h    
      openshift-apiserver                        4.13.43   True        False         False      238d    
      openshift-controller-manager               4.13.43   True        False         False      238d    
      openshift-samples                          4.13.43   True        False         False      7d7h    
      operator-lifecycle-manager                 4.13.43   True        False         False      238d    
      operator-lifecycle-manager-catalog         4.13.43   True        False         False      238d    
      operator-lifecycle-manager-packageserver   4.13.43   True        False         False      7d4h    
      service-ca                                 4.13.43   True        False         False      238d    
      storage                                    4.13.43   True        False         False      238d    
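
The operators that lag behind the target version can be picked out of this table with a small filter. This is a minimal sketch; lagging_operators is a hypothetical helper name, and it assumes the default column layout of "oc get co" with VERSION as the second column.

```shell
# Hypothetical helper: print the ClusterOperators whose VERSION column
# differs from the target version given as $1.
# Assumption: input is the default "oc get co" table read from stdin.
lagging_operators() {
  target="$1"
  awk -v target="$target" \
    '$1 != "NAME" && NF >= 5 && $2 != target { print $1, $2 }'
}

# Possible usage against a live cluster:
#   oc get co | lagging_operators 4.13.43
```

For the table above, with a target of 4.13.43, this would list dns and network, both at 4.12.40.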
      
      

      Expected results:

      The upgrade should complete successfully.

      Additional info:

      Captured goroutine stacks using the commands below:
      
      In terminal 1, run the command below and leave it running (do not exit):
      $ oc logs <network-operator-pod> -f
      
      In terminal 2, run the following commands:
      $ oc exec -it <network-operator-pod> -- bash
      $ kill -s QUIT 1     # This kills the container, and you exit automatically.

              pdiak@redhat.com Patryk Diak
              rhn-support-dgautam Dhruv Gautam
              Anurag Saxena Anurag Saxena