  1. OpenShift Bugs
  2. OCPBUGS-43604

Allow from host network networkpolicies do not work during live migration

    • Moderate
    • None
    • False
    • Bug Fix
    • In Progress

      This is a clone of issue OCPBUGS-42605. The following is the description of the original issue:

      Description of problem:

      We are in a live migration scenario.

      If a project has a networkpolicy that allows traffic from the host network (more concretely, from the ingress controllers, which run in the host network), traffic fails during the live migration between any ingress controller node (either migrated or not migrated) and an already migrated application node.

      I'll expand later in the description and internal comments, but the TL;DR is that the tun0 IPs of non-migrated source nodes and the ovn-k8s-mp0 IPs of migrated source nodes are not added to the address sets referenced by the networkpolicy ACL on the target OVN-Kubernetes node, so that traffic is not allowed.
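      To make the TL;DR concrete, here is a minimal Python sketch that lists the source IPs one would expect to find in the address set and reports which ones are missing. It rests on assumptions not stated in this report: that tun0 takes the first address of a node's OpenShift SDN host subnet, and that ovn-k8s-mp0 takes the second address of the node's OVN-Kubernetes subnet. The node records and the address-set contents are hypothetical inputs, not output from a real cluster.

```python
import ipaddress


def sdn_tun0_ip(host_subnet: str) -> str:
    # Assumption: OpenShift SDN gives tun0 the first address of the host subnet.
    net = ipaddress.ip_network(host_subnet)
    return str(net.network_address + 1)


def ovn_mp0_ip(node_subnet: str) -> str:
    # Assumption: OVN-Kubernetes gives ovn-k8s-mp0 the second address of the node subnet.
    net = ipaddress.ip_network(node_subnet)
    return str(net.network_address + 2)


def expected_host_sources(nodes):
    """Map each node name to the IP its host-network traffic is expected to come from.

    `nodes` is a hypothetical list of dicts with keys "name", "subnet", "migrated".
    """
    result = {}
    for node in nodes:
        pick = ovn_mp0_ip if node["migrated"] else sdn_tun0_ip
        result[node["name"]] = pick(node["subnet"])
    return result


def missing_from_address_set(expected, address_set):
    """Return the node names whose expected source IP is absent from the address set."""
    present = set(address_set)
    return sorted(name for name, ip in expected.items() if ip not in present)


if __name__ == "__main__":
    nodes = [
        {"name": "ingress-a", "subnet": "10.128.2.0/23", "migrated": False},
        {"name": "ingress-b", "subnet": "10.129.0.0/23", "migrated": True},
    ]
    expected = expected_host_sources(nodes)
    # An empty address set models the behaviour described in this bug.
    print(missing_from_address_set(expected, address_set=[]))
```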

      Version-Release number of selected component (if applicable):

      4.16.13

      How reproducible:

      Always

      Steps to Reproduce:

      1. Before the migration: have a project with a networkpolicy that allows traffic from the ingress controller, with the ingress controller running in the host network. Everything must work properly at this point.

      2. Start the migration

      3. During the migration, check connectivity to a pod on an already migrated application node from the host network of either a migrated node or a non-migrated node. Both will fail (checking from the same node as the pod doesn't fail).
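      The connectivity check in step 3 could be done with any TCP client run from the node's host network. A minimal Python sketch follows, assuming the application pod listens on a TCP port; the pod IP and port in the usage line are placeholders, not values from this report.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


if __name__ == "__main__":
    # Placeholder pod IP and port; replace with the real application pod values.
    print(can_connect("10.128.2.15", 8080))
```

      A timeout rather than an immediate refusal would be consistent with packets being dropped by the networkpolicy ACL, although this sketch does not distinguish the two cases.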

      Actual results:

      The pod on the worker node is not reachable from the host network of the ingress controller node (unless the pod is on the same node as the ingress controller), which causes the ingress controller routes to return 503 errors.

      Expected results:

      The pod on the worker node is reachable from the ingress controller node, even when the ingress controller node has not migrated yet and the application node has.

      Additional info:

      This is not a duplicate of OCPBUGS-42578. This bug refers to the host-to-pod communication path while the other one doesn't.

      This is a customer issue. More details to be included in private comments for privacy.

      Workaround: create a networkpolicy that explicitly allows traffic from the tun0 and ovn-k8s-mp0 interfaces. However, note that this workaround can be problematic for clusters with hundreds or thousands of projects, because it has to be applied in each affected project. Another possible workaround is to temporarily delete all the networkpolicies of the projects, but this may also be problematic (and is a security risk).
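      As an illustration of the first workaround, the Python sketch below builds a NetworkPolicy manifest that allows ingress from a list of tun0 and ovn-k8s-mp0 addresses. It assumes the interfaces are allowed through `ipBlock` peers with single-address CIDRs, which is one possible way to express this and not necessarily what was applied in the affected cluster. The policy name, project name, and IPs are hypothetical.

```python
import ipaddress
import json


def build_allow_policy(namespace, source_ips, name="allow-from-node-mgmt-ips"):
    """Build a NetworkPolicy dict allowing ingress from the given node IPs.

    Each IP becomes an ipBlock peer with a single-address CIDR (/32 or /128).
    """
    peers = []
    for ip in source_ips:
        addr = ipaddress.ip_address(ip)  # raises ValueError on malformed input
        peers.append({"ipBlock": {"cidr": f"{addr}/{addr.max_prefixlen}"}})
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {},  # empty selector: applies to every pod in the project
            "policyTypes": ["Ingress"],
            "ingress": [{"from": peers}],
        },
    }


if __name__ == "__main__":
    # Hypothetical project and addresses: one tun0 IP and one ovn-k8s-mp0 IP.
    policy = build_allow_policy("my-project", ["10.128.2.1", "10.129.0.2"])
    print(json.dumps(policy, indent=2))
```

      The JSON output could be saved to a file and applied per project. The scaling concern above applies directly: one such policy is needed in every project that restricts ingress.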

            pliurh Peng Liu
            openshift-crt-jira-prow OpenShift Prow Bot
            Zhanqi Zhao Zhanqi Zhao