OCPBUGS-37473

Network and dns operators not getting upgraded


    • Important



      Description of problem:

      The CVO is stuck in the upgrading state on the network ClusterOperator.
      The ClusterVersion history shows multiple partial upgrades:
          - completionTime: null
            image: fr2.icr.io/armada-master/ocp-release:4.13.43-x86_64
            startedTime: "2024-07-04T06:21:36Z"
            state: Partial
            verified: false
            version: 4.13.43
          - completionTime: "2024-07-04T06:21:36Z"
            image: fr2.icr.io/armada-master/ocp-release:4.12.58-x86_64
            startedTime: "2024-06-26T16:25:53Z"
            state: Partial
            verified: false
            version: 4.12.58
          - completionTime: "2024-06-26T16:25:53Z"
            image: fr2.icr.io/armada-master/ocp-release:4.12.56-x86_64
            startedTime: "2024-06-05T17:06:12Z"
            state: Partial
            verified: false
            version: 4.12.56
          - completionTime: "2024-06-05T17:06:12Z"
            image: fr2.icr.io/armada-master/ocp-release:4.12.55-x86_64
            startedTime: "2024-05-01T19:41:10Z"
            state: Partial
            verified: false
            version: 4.12.55
          - completionTime: "2024-05-01T19:41:10Z"
            image: fr2.icr.io/armada-master/ocp-release:4.12.51-x86_64
            startedTime: "2024-04-03T17:32:01Z"
            state: Partial
            verified: false
            version: 4.12.51
          - completionTime: "2024-04-03T17:32:01Z"
            image: fr2.icr.io/armada-master/ocp-release:4.12.49-x86_64
            startedTime: "2024-03-06T19:11:42Z"
            state: Partial
            verified: false
            version: 4.12.49
          - completionTime: "2024-03-06T19:11:42Z"
            image: fr2.icr.io/armada-master/ocp-release:4.12.46-x86_64
            startedTime: "2024-02-07T19:11:28Z"
            state: Partial
            verified: false
            version: 4.12.46
          - completionTime: "2024-02-07T19:11:28Z"
            image: fr2.icr.io/armada-master/ocp-release:4.12.44-x86_64
            startedTime: "2023-12-13T19:06:13Z"
            state: Partial
            verified: false
            version: 4.12.44
          - completionTime: "2023-11-16T14:33:09Z"
            image: fr2.icr.io/armada-master/ocp-release:4.12.40-x86_64
            startedTime: "2023-11-16T13:57:39Z"
            state: Completed
            verified: false
            version: 4.12.40
      All operators report Available, including network and dns.
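
To make the pattern in the history above easier to spot, here is a minimal sketch that lists the entries still marked Partial. It assumes the ClusterVersion object has been exported as JSON (for example with `oc get clusterversion version -o json`) and parsed into a dict. The helper name `partial_updates` is hypothetical, and the sample is a trimmed copy of three entries from this report.

```python
import json


def partial_updates(cluster_version: dict) -> list[dict]:
    """Return the history entries whose state is 'Partial', in listed order."""
    history = cluster_version.get("status", {}).get("history", [])
    return [entry for entry in history if entry.get("state") == "Partial"]


# Trimmed sample in the shape of the ClusterVersion status.history field;
# the values are copied from three of the entries in this report.
sample = json.loads("""
{"status": {"history": [
  {"state": "Partial", "version": "4.13.43",
   "startedTime": "2024-07-04T06:21:36Z", "completionTime": null},
  {"state": "Partial", "version": "4.12.58",
   "startedTime": "2024-06-26T16:25:53Z", "completionTime": "2024-07-04T06:21:36Z"},
  {"state": "Completed", "version": "4.12.40",
   "startedTime": "2023-11-16T13:57:39Z", "completionTime": "2023-11-16T14:33:09Z"}
]}}
""")

partial_versions = [entry["version"] for entry in partial_updates(sample)]
print(partial_versions)  # ['4.13.43', '4.12.58']
```

An entry with a null completionTime is the update still in progress; the older Partial entries were superseded by a later update before they finished.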

      Version-Release number of selected component (if applicable):

      ROKS 4.13.43

      How reproducible:

      Could not reproduce the issue on RHOCP.
      Triggered an upgrade from RHOCP 4.12.40 to 4.12.44 and, before the CVO started upgrading the network operator, triggered an upgrade to a newer version.
      Repeated this step until there were 7 partial upgrades. After the final upgrade to a 4.13.z version was triggered, the upgrade completed successfully, and the network, dns and machine-config operators were upgraded as well.

      Steps to Reproduce:

          1. NA

      Actual results:

      The upgrade does not proceed past the network operator. The image used by network-operator is already from 4.13.43, which means the operator is running the new image, but the CVO is not showing the updated version in "oc get co":
      NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      console                                    4.13.43   True        False         False      55d     
      csi-snapshot-controller                    4.13.43   True        False         False      238d    
      dns                                        4.12.40   True        False         False      238d    
      image-registry                             4.13.43   True        False         False      133d    
      ingress                                    4.13.43   True        False         False      7d3h    
      insights                                   4.13.43   True        False         False      98d     
      kube-apiserver                             4.13.43   True        False         False      238d    
      kube-controller-manager                    4.13.43   True        False         False      238d    
      kube-scheduler                             4.13.43   True        False         False      238d    
      kube-storage-version-migrator              4.13.43   True        False         False      7d5h    
      marketplace                                4.13.43   True        False         False      238d    
      monitoring                                 4.13.43   True        False         False      158d    
      network                                    4.12.40   True        False         False      238d    
      node-tuning                                4.13.43   True        False         False      7d4h    
      openshift-apiserver                        4.13.43   True        False         False      238d    
      openshift-controller-manager               4.13.43   True        False         False      238d    
      openshift-samples                          4.13.43   True        False         False      7d7h    
      operator-lifecycle-manager                 4.13.43   True        False         False      238d    
      operator-lifecycle-manager-catalog         4.13.43   True        False         False      238d    
      operator-lifecycle-manager-packageserver   4.13.43   True        False         False      7d4h    
      service-ca                                 4.13.43   True        False         False      238d    
      storage                                    4.13.43   True        False         False      238d    
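
As a quick way to pick the lagging operators out of a table like the one above, the following sketch parses the plain-text output of "oc get co" and reports every operator whose VERSION column differs from the target release. The function name `lagging_operators` is hypothetical, and the sample is a subset of the rows above.

```python
def lagging_operators(table: str, target: str) -> dict[str, str]:
    """Map operator name -> reported version for rows not at the target version."""
    lagging = {}
    for line in table.strip().splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if len(fields) < 2 or fields[0] == "NAME":
            continue
        name, version = fields[0], fields[1]
        if version != target:
            lagging[name] = version
    return lagging


# Subset of the `oc get co` output from this report.
sample_table = """
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
console          4.13.43   True        False         False      55d
dns              4.12.40   True        False         False      238d
ingress          4.13.43   True        False         False      7d3h
network          4.12.40   True        False         False      238d
storage          4.13.43   True        False         False      238d
"""

result = lagging_operators(sample_table, "4.13.43")
print(result)  # {'dns': '4.12.40', 'network': '4.12.40'}
```

Splitting on whitespace is enough here because only the first two columns are read; the MESSAGE column, which can contain spaces, is ignored.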

      Expected results:

      The upgrade should complete successfully.

      Additional info:

      Captured goroutine stacks using the commands below.
      In Terminal 1, run:
      $ oc logs <network-operator-pod> -f     # Leave this running as is; do not exit
      In Terminal 2, run:
      $ oc exec -it <network-operator-pod> -- bash
      $ kill -s QUIT 1     # Sends SIGQUIT to PID 1; this kills the container and you exit automatically
      The goroutine stack dump is written to the log stream in Terminal 1.
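
Once the dump has been saved from the log stream, a rough summary of what the goroutines are doing can help locate where the operator is blocked. The sketch below counts goroutines by wait state. It assumes the standard Go runtime dump format, in which each stack starts with a header line such as `goroutine 18 [select]:`. The function name `goroutine_states` is hypothetical, and the sample dump is invented for illustration; it is not taken from this cluster.

```python
import re
from collections import Counter

# Matches header lines such as "goroutine 1 [chan receive, 12 minutes]:"
# and captures the state up to the first comma or closing bracket.
HEADER = re.compile(r"^goroutine (\d+) \[([^\],]+)")


def goroutine_states(log_text: str) -> Counter:
    """Count goroutines per state in a Go SIGQUIT stack dump."""
    counts = Counter()
    for line in log_text.splitlines():
        match = HEADER.match(line)
        if match:
            counts[match.group(2)] += 1
    return counts


# Invented sample, shaped like a Go runtime dump (not real output from this bug).
sample_dump = """SIGQUIT: quit

goroutine 1 [chan receive, 12 minutes]:
main.main()

goroutine 18 [select]:
example.worker()

goroutine 19 [select]:
example.worker()
"""

states = goroutine_states(sample_dump)
print(states.most_common())  # [('select', 2), ('chan receive', 1)]
```

A large number of goroutines parked in the same state for a long time is usually the first place to look in the full stacks.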

              pdiak@redhat.com Patryk Diak
              rhn-support-dgautam Dhruv Gautam
              Anurag Saxena Anurag Saxena
