OpenShift Bugs / OCPBUGS-57211

The upgrade from 4.17.12 to 4.18.4 could not complete and the ovn-kube daemonset is crashing on some nodes with a logical router policy creation error

    • Quality / Stability / Reliability

      Description of problem:

      The upgrade from 4.17.12 to 4.18.4 could not complete, and the ovn-kube daemonset was crashing on all nodes with the following error:

      "failed to initialize networks cluster logical router egress policies for the default network: failed to create no reroute policies for pods on network default: unable to create IPv4 no-reroute pod policies, err: error creating logical router policy {UUID:743d5c86-8b55-4a36-924e-7c02d42a4df9 Action:allow BFDSessions:[] ExternalIDs:map[ip-family:ip4 k8s.ovn.org/id:default-network-controller:EgressIP:102:EIP-No-Reroute-Pod-To-Pod:ip4:default k8s.ovn.org/name:EIP-No-Reroute-Pod-To-Pod k8s.ovn.org/owner-controller:default-network-controller k8s.ovn.org/owner-type:EgressIP network:default priority:102] Match:ip4.src == 192.168.32.0/19 && ip4.dst == 192.168.32.0/19 Nexthop:<nil> Nexthops:[]Options:map[] Priority:102} on router ovn_cluster_router: unexpectedly found multiple results for provided predicate"
      
      
      //network config
      spec:
        clusterNetwork:
        - cidr: 192.168.32.0/19
          hostPrefix: 23
        - cidr: 192.168.64.0/18
          hostPrefix: 23
        - cidr: 192.168.128.0/17
          hostPrefix: 23
        externalIP:
          policy: {}
        networkDiagnostics:
          mode: ""
          sourcePlacement: {}
          targetPlacement: {}
        networkType: OVNKubernetes
        serviceNetwork:
        - 192.168.0.0/19
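
As a quick sanity check on the configuration above, the following minimal Python sketch (standard library only) confirms that the three clusterNetwork CIDRs and the serviceNetwork CIDR do not overlap, counts how many /23 node subnets each clusterNetwork entry provides, and builds a per-CIDR match string in the same form as the one in the error message. The function names are illustrative and are not part of OVN-Kubernetes. That one such match exists per clusterNetwork CIDR is an assumption based on the analysis in this report, not on the OVN-Kubernetes source.

```python
import ipaddress
from itertools import combinations

# Values copied from the network config above.
CLUSTER_NETWORKS = [
    ("192.168.32.0/19", 23),
    ("192.168.64.0/18", 23),
    ("192.168.128.0/17", 23),
]
SERVICE_NETWORK = "192.168.0.0/19"


def overlapping_pairs(cidrs):
    """Return every pair of CIDRs that share at least one address."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return [(str(a), str(b)) for a, b in combinations(nets, 2) if a.overlaps(b)]


def node_subnet_count(cidr, host_prefix):
    """Number of per-node subnets of size /host_prefix that fit inside cidr."""
    net = ipaddress.ip_network(cidr)
    return 2 ** (host_prefix - net.prefixlen)


def pod_to_pod_match(cidr):
    """Match string in the form seen in the error message (IPv4 only)."""
    return f"ip4.src == {cidr} && ip4.dst == {cidr}"


if __name__ == "__main__":
    all_cidrs = [c for c, _ in CLUSTER_NETWORKS] + [SERVICE_NETWORK]
    print("overlapping pairs:", overlapping_pairs(all_cidrs))
    for cidr, prefix in CLUSTER_NETWORKS:
        print(cidr, node_subnet_count(cidr, prefix), pod_to_pod_match(cidr))
```

The configuration itself is internally consistent by this check, which fits the observation that clean installs succeed with it.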

      Note: The cluster installs successfully on 4.17.12, i.e. the install config works on 4.17.12, and it also works with a clean install of 4.18.4. An upgrade from 4.18.4 to 4.18.5 works as well. Only the upgrade from 4.17.12 to 4.18.4 could not complete, failing with the issue described above.

      After analysing the output of the command "ovn-nbctl list Logical_Router_Policy", we told the customer that the cluster has multiple logical router policies with the same purpose (EIP-No-Reroute-Pod-To-Pod) but for different subnets, i.e. three policies all named EIP-No-Reroute-Pod-To-Pod with similar external IDs but different CIDRs in their match clauses. We suggested trying a cluster with a single subnet in the cluster network, i.e. a single clusterNetwork CIDR, and that worked.
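
The situation described above can be illustrated with a small Python sketch. It models three policy rows that share the k8s.ovn.org/id value shown in the error message but differ in their match clause, and a lookup that expects exactly one row per ID. This is an assumption about how the lookup behaves, made only to illustrate the error text: find_single_policy, the predicate helpers, and the row dictionaries are hypothetical and are not the OVN-Kubernetes or libovsdb API. The second and third match strings are inferred from the clusterNetwork CIDRs, not copied from ovn-nbctl output.

```python
# Hypothetical, simplified rows; real Logical_Router_Policy rows have more columns.
POLICY_ID = "default-network-controller:EgressIP:102:EIP-No-Reroute-Pod-To-Pod:ip4:default"


def make_policy(cidr):
    """Build one illustrative no-reroute pod-to-pod policy row for a CIDR."""
    return {
        "priority": 102,
        "action": "allow",
        "match": f"ip4.src == {cidr} && ip4.dst == {cidr}",
        "external_ids": {
            "k8s.ovn.org/id": POLICY_ID,
            "k8s.ovn.org/name": "EIP-No-Reroute-Pod-To-Pod",
        },
    }


POLICIES = [
    make_policy(cidr)
    for cidr in ("192.168.32.0/19", "192.168.64.0/18", "192.168.128.0/17")
]


def find_single_policy(policies, predicate):
    """Return the one matching row, None if there is none, or raise if ambiguous."""
    found = [p for p in policies if predicate(p)]
    if len(found) > 1:
        raise LookupError(f"found {len(found)} results for provided predicate")
    return found[0] if found else None


def by_id(policy_id):
    """Predicate keyed on the external ID only (assumed lookup behaviour)."""
    return lambda p: p["external_ids"].get("k8s.ovn.org/id") == policy_id


def by_id_and_match(policy_id, match):
    """Predicate that also compares the match clause, so it stays unique."""
    return lambda p: by_id(policy_id)(p) and p["match"] == match


if __name__ == "__main__":
    try:
        find_single_policy(POLICIES, by_id(POLICY_ID))
    except LookupError as err:
        print("ambiguous lookup:", err)
    unique = find_single_policy(
        POLICIES, by_id_and_match(POLICY_ID, POLICIES[0]["match"])
    )
    print("unique lookup:", unique["match"])
```

Under this model, a single clusterNetwork CIDR leaves only one row per ID, which is consistent with the workaround reported above. Whether the real lookup behaves this way would need to be confirmed against the OVN-Kubernetes code.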

      The customer now wants to know why the upgrade cannot proceed with the above network configuration, given that this configuration is also documented.

       

      Version-Release number of selected component (if applicable):

      How reproducible:

      Steps to Reproduce:

      1.

      2.

      3.

      Actual results:

      Expected results:

              Martin Kennelly (mkennell@redhat.com)
              Shivam Upadhyay (rhn-support-shupadhy)
              Jean Chen