OCPBUGS-63007

NAT entry missing for a HCP cluster VM


    • Quality / Stability / Reliability
    • False
    • Important
    • CORENET Sprint 278, CORENET Sprint 279
    • 2
    • In Progress
    • Release Note Not Required
      What: A VM virt-launcher pod could spontaneously lose network connectivity.
      Why: This occurred only when a VM had live-migrated at least twice and had moved to a node other than the one that originally hosted it. In that case, its assigned IPs were accidentally released and could then be assigned to another pod. From that point on, connectivity of both the VM and that other pod was undefined. Most likely, the VM would lose connectivity once the other pod completed and the IPs and all their associated resources were released.
      Fix: Prevent the accidental release of the IPs of a VM as it live-migrates.
      Now: IPs and associated resources for a VM stay allocated and associated with the VM as it live-migrates, which prevents their premature release and reuse by other pods and maintains the connectivity of the VM throughout its lifecycle.

      This is a clone of issue OCPBUGS-56783. The following is the description of the original issue:

      Description of problem:

      An OVN HCP cluster worker node is unable to reach the external network because the NAT entry for that specific HCP worker node is missing from the gateway router of the local (ACM) cluster.

      Non-working VM: 

      # ovn-nbctl lr-nat-list GR_host1.acp.example.net | grep 10.197.x.x

      <No Output>

       

      Working VM:

      For the working VM, there is a correct NAT entry. 

      [root@e35815ba6b1d ~]# ovn-nbctl lr-nat-list GR_host1.acp.example.net|grep 10.196.x.x
      TYPE             GATEWAY_PORT          MATCH                 EXTERNAL_IP        EXTERNAL_PORT    LOGICAL_IP          EXTERNAL_MAC         LOGICAL_PORT
      snat                                                         131.97.x.x                        10.196.x.x
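      The manual check above (listing the gateway router's NAT table and grepping for the VM's IP) could be scripted roughly as follows. This is only a sketch under stated assumptions: the helper names `has_snat_for` and `fetch_nat_list` are hypothetical, the sample addresses are placeholders rather than values from this case, and the parser simply looks for the IP as a whitespace-separated token on an `snat` row, so it would also match an external IP with the same value.

```python
import subprocess

# Placeholder sample shaped like the `ovn-nbctl lr-nat-list` output quoted in
# this report; the addresses are illustrative, not taken from the case.
SAMPLE_NAT_LIST = """\
TYPE             GATEWAY_PORT          MATCH                 EXTERNAL_IP        EXTERNAL_PORT    LOGICAL_IP          EXTERNAL_MAC         LOGICAL_PORT
snat                                                         192.0.2.10                          10.196.0.5
"""


def has_snat_for(nat_list_output: str, ip: str) -> bool:
    """Return True if any snat row in the given output contains `ip` as a token."""
    for line in nat_list_output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and non-snat entries.
        if not fields or fields[0] != "snat":
            continue
        # Empty columns collapse when splitting on whitespace, so the IP is
        # matched anywhere on the row rather than at a fixed column index.
        if ip in fields[1:]:
            return True
    return False


def fetch_nat_list(router: str) -> str:
    """Run `ovn-nbctl lr-nat-list <router>` (hypothetical wrapper; needs NB DB access)."""
    result = subprocess.run(
        ["ovn-nbctl", "lr-nat-list", router],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Offline demonstration against the sample text above.
    for vm_ip in ("10.196.0.5", "10.197.0.7"):
        state = "present" if has_snat_for(SAMPLE_NAT_LIST, vm_ip) else "MISSING"
        print(f"{vm_ip}: snat entry {state}")
```

      Against a live northbound database, `has_snat_for(fetch_nat_list("GR_host1.acp.example.net"), vm_ip)` would give the same answer as the `grep` commands shown here, assuming `ovn-nbctl` is run from a place with access to that database.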

       

      Version-Release number of selected component (if applicable):

      $ oc get clusterversion
      NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.17.15   True        False         13d     Cluster version is 4.17.15

      How reproducible:

      N/A - only the OVN DB and a must-gather (MG) are available to troubleshoot this issue.

      MG: https://attachments.access.redhat.com/hydra/rest/cases/04144466/attachments/b1174b1b-99bd-4c06-ab96-29ecf066fb89?usePresignedUrl=true

       

              jcaamano@redhat.com Jaime Caamaño Ruiz
              rhn-support-rsahoo Ramesh Sahoo
              Anurag Saxena Anurag Saxena