OpenShift Bugs / OCPBUGS-37238

[Upgrades][4.16->4.17] Upgrades are broken because of UDN error: "Failed looking for primary network annotation:"


      Description of problem: 4.16 to 4.17 upgrades are broken; ovnkube fails with "Failed looking for primary network annotation:" on pods created before the upgrade.

      Version-Release number of selected component (if applicable): 4.17

      How reproducible: ALWAYS

      Steps to Reproduce:

      1. https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_ovn-kubernetes/2229/pull-ci-openshift-ovn-kubernetes-master-4.17-upgrade-from-stable-4.16-e2e-aws-ovn-upgrade/1813682175963303936 

      2. https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_ovn-kubernetes/2228/pull-ci-openshift-ovn-kubernetes-master-4.17-upgrade-from-stable-4.16-e2e-aws-ovn-upgrade/1813615743032365056


      Actual results:

      Expected results:

      Additional info:

      Please fill in the following template while reporting a bug and provide as much relevant information as possible. Doing so will give us the best chance to find a prompt resolution.

      Affected Platforms:

      Is it an

      1. internal CI failure - YES
      2. customer issue / SD
      3. internal RedHat testing failure

      If it is a CI failure:

      • Did it happen in different CI lanes? If so please provide links to multiple failures with the same error instance
        • Same lanes; links attached above
      • Did it happen in both sdn and ovn jobs? If so please provide links to multiple failures with the same error instance
        • N/A; SDN no longer exists in 4.17
      • Did it happen in other platforms (e.g. aws, azure, gcp, baremetal etc) ? If so please provide links to multiple failures with the same error instance
        • All platforms are susceptible
      • When did the failure start happening? Please provide the UTC timestamp of the networking outage window from a sample failure run
        • Probably starting with the last two downstream merges we did

      The CNI side expects a role:primary network annotation on every pod, but pods created before the upgrade won't have it; we did not account for backwards compatibility.

      FIX:

      1. All UDN CNI bits must be behind a feature gate
      2. We must handle role:primary being absent on pods: ignore such pods in the checks, except when the network is the default network
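The backwards-compatible check in fix item 2 can be sketched as below. This is a minimal illustration, not the actual ovn-kubernetes code: the `podAnnotation` struct, the `isPrimary` helper, and the annotation layout are assumptions for the sketch; the real annotation is the JSON value of `k8s.ovn.org/pod-networks` and its parsing lives in the pod annotation package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// podAnnotation is a cut-down, hypothetical view of one per-network entry
// in the pod-networks annotation; only the field relevant here is kept.
type podAnnotation struct {
	Role string `json:"role,omitempty"`
}

// isPrimary reports whether the annotation marks a primary network.
// Pods created before the upgrade carry no "role" field at all, so an
// absent role must not be an error: such pods are ignored in the checks,
// except that the default network is implicitly primary for them.
func isPrimary(raw string, isDefaultNetwork bool) (bool, error) {
	var nets map[string]podAnnotation
	if err := json.Unmarshal([]byte(raw), &nets); err != nil {
		return false, fmt.Errorf("failed looking for primary network annotation: %v", err)
	}
	for _, net := range nets {
		switch net.Role {
		case "primary":
			return true, nil
		case "":
			// Pre-upgrade pod: no role set. Only the default
			// network is treated as primary; everything else
			// is skipped instead of failing the upgrade.
			if isDefaultNetwork {
				return true, nil
			}
		}
	}
	return false, nil
}

func main() {
	// A pre-4.17 pod annotation with no "role" field.
	old := `{"default":{"ip_addresses":["10.128.0.5/23"]}}`
	ok, err := isPrimary(old, true)
	fmt.Println(ok, err)
}
```

The key design point is that the missing field falls through to a tolerant branch rather than returning the "Failed looking for primary network annotation" error that breaks the upgrade.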


            ellorent Felix Enrique Llorente Pastora
            sseethar Surya Seetharaman
            Anurag Saxena Anurag Saxena