OpenShift Bugs / OCPBUGS-15282

Network Operator not setting its version and blocking upgrade completion


    • Critical
    • No
    • SDN Sprint 238
    • 1
    • Rejected
    • False
    • N/A
    • Release Note Not Required
    • Customer Escalated

      Description of problem:

      When upgrading a 4.11.43 cluster to 4.12.21, the Cluster Version Operator (CVO) is stuck waiting for the Network Operator (NO) to update:
      
      $ omc get clusterversion
      NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.11.43   True        True          14m     Working towards 4.12.21: 672 of 831 done (80% complete), waiting on network
      
      The CVO pod log shows:
      
      2023-06-16T12:07:22.596127142Z I0616 12:07:22.596023       1 metrics.go:490] ClusterOperator network is not setting the 'operator' version
      
      Indeed, the network ClusterOperator reports no versions at all:
      
      $ omc get co network -o json|jq '.status.versions'
      null
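      As background to the CVO message above: the CVO expects each ClusterOperator to report, in `status.versions`, an entry named `operator` at the target release version, and a `null` list means nothing has been reported. The Go sketch below decodes the output of `jq '.status.versions'` and looks for that entry. `OperandVersion` and `operatorVersion` are hypothetical local names for illustration (the real type lives in github.com/openshift/api), not the CVO's code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// OperandVersion mirrors one entry of a ClusterOperator's status.versions list.
// Hypothetical local type for this sketch only.
type OperandVersion struct {
	Name    string `json:"name"`
	Version string `json:"version"`
}

// operatorVersion decodes the JSON printed by `jq '.status.versions'` and returns
// the version of the entry named "operator". JSON null decodes to a nil slice,
// so the second result is false when no versions are reported.
func operatorVersion(raw []byte) (string, bool, error) {
	var versions []OperandVersion
	if err := json.Unmarshal(raw, &versions); err != nil {
		return "", false, err
	}
	for _, v := range versions {
		if v.Name == "operator" {
			return v.Version, true, nil
		}
	}
	return "", false, nil
}

func main() {
	for _, raw := range []string{
		`null`, // what the network ClusterOperator reports in this bug
		`[{"name":"operator","version":"4.12.21"}]`, // what the CVO is waiting for
	} {
		v, ok, err := operatorVersion([]byte(raw))
		fmt.Printf("%s -> version=%q set=%v err=%v\n", raw, v, ok, err)
	}
}
```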
      
      However, its conditions read Available=True, Progressing=False and Degraded=False:
      
      $ omc get co network
      NAME      VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
      network             True        False         False      19m
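      Why healthy conditions alone do not let the upgrade finish: a minimal Go sketch, assuming a simplified completion rule in which the conditions must look settled and the reported `operator` version must also equal the target release. `Condition`, `looksSettled` and `upgradeComplete` are hypothetical names; the CVO's actual checks differ in detail.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Condition mirrors the two fields of a ClusterOperator condition used here.
type Condition struct {
	Type   string `json:"type"`
	Status string `json:"status"`
}

// looksSettled reports whether the conditions match the `omc get co network`
// output above: Available=True, Progressing=False, Degraded=False.
func looksSettled(conds []Condition) bool {
	state := map[string]string{}
	for _, c := range conds {
		state[c.Type] = c.Status
	}
	return state["Available"] == "True" &&
		state["Progressing"] == "False" &&
		state["Degraded"] == "False"
}

// upgradeComplete is a hypothetical, simplified rule: settled conditions are
// not enough, the reported "operator" version must also match the target.
func upgradeComplete(conds []Condition, reportedVersion, target string) bool {
	return looksSettled(conds) && reportedVersion == target
}

func main() {
	var conds []Condition
	raw := `[{"type":"Available","status":"True"},
	         {"type":"Progressing","status":"False"},
	         {"type":"Degraded","status":"False"}]`
	if err := json.Unmarshal([]byte(raw), &conds); err != nil {
		panic(err)
	}
	// status.versions is null on the network ClusterOperator, so no version is reported.
	fmt.Println("settled:", looksSettled(conds))
	fmt.Println("complete:", upgradeComplete(conds, "", "4.12.21"))
}
```

      Under this rule the network operator is "settled" yet the upgrade never completes, which matches the CVO waiting on network.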
         
      The Network Operator pod log shows:
      
      2023-06-16T12:08:56.542287546Z I0616 12:08:56.542271       1 connectivity_check_controller.go:138] ConnectivityCheckController is waiting for transition to desired version (4.12.21) to be completed.
      2023-06-16T12:04:40.584407589Z I0616 12:04:40.584349       1 ovn_kubernetes.go:1437] OVN-Kubernetes master and node already at release version 4.12.21; no changes required
      
      The Network Operator pod itself, however, has the correct release version set:
      
      $ omc get pods -n openshift-network-operator -o jsonpath='{.items[].spec.containers[0].env[?(@.name=="RELEASE_VERSION")]}'|jq
      {
        "name": "RELEASE_VERSION",
        "value": "4.12.21"
      }
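      To make the mismatch concrete, this Go sketch compares the `RELEASE_VERSION` entry printed above with the `operator` version the ClusterOperator reports (an empty string when `status.versions` is null). `EnvVar` and `diagnose` are hypothetical helpers for illustration only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// EnvVar mirrors the {"name": ..., "value": ...} object printed by the jsonpath query.
type EnvVar struct {
	Name  string `json:"name"`
	Value string `json:"value"`
}

// diagnose compares the release version the operator pod runs with against the
// "operator" version its ClusterOperator reports ("" when nothing is reported).
func diagnose(envJSON []byte, reported string) (string, error) {
	var env EnvVar
	if err := json.Unmarshal(envJSON, &env); err != nil {
		return "", err
	}
	if env.Name != "RELEASE_VERSION" {
		return "", fmt.Errorf("unexpected env var %q", env.Name)
	}
	switch {
	case reported == "":
		return fmt.Sprintf("pod runs %s but the ClusterOperator reports no operator version", env.Value), nil
	case reported != env.Value:
		return fmt.Sprintf("pod runs %s but the ClusterOperator reports %s", env.Value, reported), nil
	default:
		return fmt.Sprintf("pod and ClusterOperator agree on %s", env.Value), nil
	}
}

func main() {
	env := []byte(`{"name": "RELEASE_VERSION", "value": "4.12.21"}`)
	msg, err := diagnose(env, "") // status.versions is null in this bug
	if err != nil {
		panic(err)
	}
	fmt.Println(msg)
}
```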
      
      Restarting the related pods had no effect. I have trace logs of the Network Operator available. The issue initially looked related to https://github.com/openshift/cluster-network-operator/pull/1818, but that code appears to have been introduced in 4.14.

       

      Version-Release number of selected component (if applicable):

       

      How reproducible:

      I have not reproduced this issue.

      Steps to Reproduce:

      1.  The cluster started on stable 4.10.56.
      2.  It was upgraded to 4.11.43 successfully.
      3.  It was then upgraded to 4.12.21, and that upgrade is stuck.

      Actual results:

      The CVO is stuck waiting on the NO to complete; the NO reports Available but never sets its version.

      Expected results:

      The NO sets its version so the CVO can continue the upgrade.

      Additional info:

      Bare Metal IPI cluster with OVN Networking.

            Jaime Caamaño Ruiz (jcaamano@redhat.com)
            Christine Shepherd (rhn-support-cshepher)
            Weibin Liang
            Votes: 1
            Watchers: 21
