OpenShift Bugs / OCPBUGS-54690

OCP4.16.17: [OVN] - EgressIP allocation rules are not consistently applied


    • Quality / Stability / Reliability
    • Important
    • CORENET Sprint 275
    • 1

      Description of problem:

      • Over 600 Egress IPs on a cluster
      • 6 Egress-assignable node hosts
      • 130 nodes
      • In one project, a single EgressIP object is correctly assigned to the namespace through two matchLabels rules that select the namespace.
      • In that project, 6 pods are correctly bound to this egress IP (they have the expected SNAT entries for this egress IP address).
      • 2 pods are erroneously bound to different EgressIP objects (they have SNAT entries for those two separate egress IPs), but those EgressIP objects do NOT match the target project and have no podSelector rules that would otherwise select these pods.
      • At least one of the pods that has NO egress SNAT rule has duplicate `logical_ip` entries across 3 nodes: there is a SNAT-to-host-IP NAT table entry for this pod on 3 separate nodes, not tied to any egress IP. See data highlights below.
      • Multiple pods have no NAT entries at all, even on their local node, which may indicate they were deleted before the NAT dump was taken.
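
      The duplicate `logical_ip` observation above can be cross-checked mechanically once the NAT tables have been dumped per node. The following is only a minimal sketch: it assumes the dump has already been parsed into records, and the `NatEntry` type, its field names, the node names, and all addresses are hypothetical placeholders, not data from this case.

```python
from collections import defaultdict
from typing import NamedTuple


class NatEntry(NamedTuple):
    node: str         # node whose router holds the entry (hypothetical name)
    nat_type: str     # e.g. "snat"
    logical_ip: str   # pod IP
    external_ip: str  # node IP or egress IP the pod is translated to


def duplicated_logical_ips(entries, egress_ips=frozenset()):
    """Return {pod_ip: [nodes]} for pod IPs with SNAT-to-node-IP entries on more than one node.

    Entries whose external_ip is a known egress IP are skipped, because an
    entry on the egress node is expected for pods bound to an EgressIP.
    """
    nodes_by_ip = defaultdict(set)
    for entry in entries:
        if entry.nat_type != "snat" or entry.external_ip in egress_ips:
            continue
        nodes_by_ip[entry.logical_ip].add(entry.node)
    return {ip: sorted(nodes) for ip, nodes in nodes_by_ip.items() if len(nodes) > 1}


# Made-up sample data: one pod IP appears on three nodes, one is clean.
sample = [
    NatEntry("node-a", "snat", "10.128.2.15", "192.0.2.11"),
    NatEntry("node-b", "snat", "10.128.2.15", "192.0.2.12"),
    NatEntry("node-c", "snat", "10.128.2.15", "192.0.2.13"),
    NatEntry("node-a", "snat", "10.128.2.16", "192.0.2.11"),
    NatEntry("node-d", "snat", "10.128.2.16", "198.51.100.7"),  # egress SNAT, ignored
]
print(duplicated_logical_ips(sample, egress_ips={"198.51.100.7"}))
```

      Any pod IP returned by this check has SNAT-to-node-IP entries on more than one node and is a candidate for the stale-entry condition described above.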

      Version-Release number of selected component (if applicable):

      • Observed on OpenShift 4.16.17

      How reproducible:

      • Observed once; an internal reproducer has not been built yet

      Steps to Reproduce:

      • Unclear

      Actual results:

      • Duplicated `logical_ip` entries could be causing the egressIP NAT handling checks to fail, since more than one NAT entry exists for the same pod.
      • EgressIP allocation rules could be failing to apply correctly, possibly leading to misallocated flow state.
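
      To quantify the results above for a namespace, the pods can be sorted into correctly bound, misbound, and missing groups by comparing their egress SNAT entries with the egress IP the namespace is expected to use. This is a rough sketch only: the function name, the input shapes, the pod names, and the addresses are all assumptions made for illustration.

```python
def classify_pods(pod_ips, egress_snat_by_pod_ip, expected_egress_ip):
    """Group pods by how their egress SNAT entries compare with the expected egress IP.

    pod_ips: {pod_name: pod_ip}
    egress_snat_by_pod_ip: {pod_ip: set of external IPs found in egress SNAT entries}
    """
    groups = {"correct": [], "misbound": [], "missing": []}
    for name, ip in sorted(pod_ips.items()):
        externals = egress_snat_by_pod_ip.get(ip, set())
        if not externals:
            groups["missing"].append(name)    # no egress SNAT entry at all
        elif externals == {expected_egress_ip}:
            groups["correct"].append(name)    # only the expected egress IP
        else:
            groups["misbound"].append(name)   # some other egress IP is present
    return groups


# Made-up example: one pod per group.
pods = {"app-1": "10.128.2.15", "app-2": "10.128.2.16", "app-3": "10.128.2.17"}
snat = {"10.128.2.15": {"198.51.100.7"}, "10.128.2.16": {"198.51.100.99"}}
print(classify_pods(pods, snat, expected_egress_ip="198.51.100.7"))
```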

      Expected results:

      • EgressIP allocation logic should be consistent: pods in namespace A should never be allocated to egressIP B when no matchLabels rules apply.
      • An EgressIP binding for a given namespace should apply to ALL pods in that namespace unless pods are selectively excluded via selection rules.
      • NAT entries for a given pod should appear once on its host node (SNAT to node IP) and once on the egressIP host (SNAT to egress IP), not 3 times across 3 nodes for SNAT to host IP.
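
      The expected selection behaviour can be expressed as a small predicate. The sketch below is a simplified model, not the ovn-kubernetes implementation: it handles only `matchLabels` (not `matchExpressions`), treats an empty or absent selector as matching everything, and uses made-up label keys and values.

```python
def match_labels(selector, labels):
    """True if every key/value pair in the selector is present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def egress_ip_selects(egress_ip_spec, namespace_labels, pod_labels):
    """Simplified model: a pod is selected only if both selectors match."""
    ns_selector = egress_ip_spec.get("namespaceSelector", {}).get("matchLabels", {})
    pod_selector = egress_ip_spec.get("podSelector", {}).get("matchLabels", {})
    return match_labels(ns_selector, namespace_labels) and match_labels(pod_selector, pod_labels)


# Hypothetical EgressIP specs and labels.
egress_a = {"namespaceSelector": {"matchLabels": {"team": "a", "env": "prod"}}}
egress_b = {"namespaceSelector": {"matchLabels": {"team": "b"}}}
namespace_a_labels = {"team": "a", "env": "prod"}
pod_labels = {"app": "web"}

print(egress_ip_selects(egress_a, namespace_a_labels, pod_labels))
print(egress_ip_selects(egress_b, namespace_a_labels, pod_labels))
```

      Under this model, every pod in namespace A is selected by `egress_a` and none by `egress_b`, which is the consistency the expected results describe.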

      Additional info:


      Affected Platforms:

      • Red Hat OpenShift Container Platform: Customer issue.
      • Case number: 04072915

        //Attachments in supportshell:

      • 0020-inspect.local.3368678593394018855.zip --> project inspect with impacted pods
      • 0010-egress_problem.zip --> must-gather
      • 0060-Archive.zip --> NAT table dump and egressIP cross-check output from KCS: https://access.redhat.com/solutions/7110252

      //DATA REQUESTED/PENDING:

      • sosreports from the egressIP assignable nodes
      • network-log must-gather
      • namespace inspect from openshift-ovn-kubernetes and openshift-multus

      //workaround suggested:

      • ovnkube DB rebuild on all egress-assignable hosts to rebuild the egress SNAT entry lists and refresh bind rules for all applicable pods (after data-gather)

              mkennell@redhat.com Martin Kennelly
              rhn-support-wrussell Will Russell
              Ketan Lakhwara
              Jean Chen