OCPBUGS-44493

[OVN] Several problems with OVN handling EgressIPs with dual stack and on additional networks


    • Important
    • Customer Escalated
    • 11/18 workaround to restart ovnkube-node pods

      Description of problem:

      There appear to be several issues with the dual-stack implementation of egressIPs, in this case when the egressIPs are configured on additional networks.

      The customer has 2 nodes configured as egressIP nodes, and these nodes have a set of VLANs to handle the egress traffic for the configured egressIPs. In general there aren't many EgressIP CRs (5 or 6 at most), and each EgressIP has 2 IPs, one per node.

      So far it looks like we have several different issues that may require separate Jiras, but I can't rule out that their combination is the main problem affecting the customer.

       - To start, .status.items shows only one IP, of a single IP version, per node. This seems related to the PR we mentioned here:

      https://github.com/openshift/ovn-kubernetes/blob/86430bef1064d4949b9b0d54accf258532e9e3c4/go-controller/pkg/crd/egressip/v1/types.go#L7

       - In the CRD this is mostly informational, but the problem goes beyond what is displayed in the status fields. Even though the CRD allows a combination of both IP versions, it looks like OVN can't process them as such:

        1. The IP version chosen seems random, and once one version is chosen the other is not assigned. For example, yesterday I recreated my EIP with dual stack and only IPv6 was assigned and configured in the LRPs and on the nodes. Today, after booting the cluster, IPv4 was chosen and only this is created in OVN:

      apiVersion: k8s.ovn.org/v1
      kind: EgressIP
      metadata:
        creationTimestamp: "2024-11-12T15:06:20Z"
        generation: 5
        name: egress-agnhost-websrv
        resourceVersion: "5689602"
        uid: b6597209-8fb9-472e-96e1-26cf61c3f387
      spec:
        egressIPs:
        - 172.23.183.20
        - 172.23.183.21
        - fdca:5d7b:fdda:f266::28:7b90
        - fdca:5d7b:fdda:f266::28:7b6f
        namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: agnhost-websrv-testing
        podSelector: {}
      status:
        items:
        - egressIP: 172.23.183.21
          node: infra-0.prod-openshift4.redhatrules.local
        - egressIP: 172.23.183.20
          node: infra-1.prod-openshift4.redhatrules.local
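The mismatch between spec and status in the CR above can be checked mechanically. The following is a minimal sketch, not part of the original report: it assumes the `spec.egressIPs` and `status.items` values have already been extracted from the CR (for example from `oc get egressip -o json`), and the function names `missing_families` and `unassigned_ips` are hypothetical helpers introduced for illustration.

```python
import ipaddress

def missing_families(spec_ips, status_items):
    """Return IP versions (4 or 6) requested in spec but absent from status."""
    spec_versions = {ipaddress.ip_address(ip).version for ip in spec_ips}
    status_versions = {
        ipaddress.ip_address(item["egressIP"]).version for item in status_items
    }
    return sorted(spec_versions - status_versions)

def unassigned_ips(spec_ips, status_items):
    """Return spec IPs that have no entry in status, in spec order."""
    assigned = {ipaddress.ip_address(item["egressIP"]) for item in status_items}
    return [ip for ip in spec_ips if ipaddress.ip_address(ip) not in assigned]

# Values copied from the EgressIP CR shown in this report.
spec_ips = [
    "172.23.183.20",
    "172.23.183.21",
    "fdca:5d7b:fdda:f266::28:7b90",
    "fdca:5d7b:fdda:f266::28:7b6f",
]
status_items = [
    {"egressIP": "172.23.183.21", "node": "infra-0.prod-openshift4.redhatrules.local"},
    {"egressIP": "172.23.183.20", "node": "infra-1.prod-openshift4.redhatrules.local"},
]

print(missing_families(spec_ips, status_items))  # [6]
print(unassigned_ips(spec_ips, status_items))
```

For this CR the check reports IPv6 as the missing family and lists both IPv6 addresses as unassigned, which matches the status shown above.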

      -------------------------------------------
      LRP on ovnkube-node-cmhkx
      -------------------------------------------
      _uuid               : 075e3964-b9bc-4e90-bc2c-0000d30e55b3
      action              : reroute
      external_ids        : {name=egress-agnhost-websrv}
      match               : "ip4.src == 10.192.14.18"
      nexthop             : []
      nexthops            : ["10.192.6.2"]
      options             : {}
      priority            : 100

      _uuid               : 96b2506c-7f2e-41c9-8ebd-50f84a183150
      action              : reroute
      external_ids        : {name=egress-agnhost-websrv}
      match               : "ip4.src == 10.192.14.28"
      nexthop             : []
      nexthops            : ["10.192.6.2"]
      options             : {}
      priority            : 100

      _uuid               : a7f04cdd-ccf2-4670-8d20-6fd4cf3a813b
      action              : reroute
      external_ids        : {name=egress-agnhost-websrv}
      match               : "ip4.src == 10.192.12.31"
      nexthop             : []
      nexthops            : ["10.192.6.2"]
      options             : {}
      priority            : 100

      _uuid               : 2b9885bd-fdef-4a7f-85fd-7f7a34a416d2
      action              : reroute
      external_ids        : {name=egress-agnhost-websrv}
      match               : "ip4.src == 10.192.12.24"
      nexthop             : []
      nexthops            : ["10.192.6.2"]
      options             : {}
      priority            : 100
      -------------------------------------------
      LRP on ovnkube-node-bsqb7
      -------------------------------------------
      _uuid               : 196166b8-a256-40a5-8295-f9f6429c0183
      action              : reroute
      external_ids        : {name=egress-agnhost-websrv}
      match               : "ip4.src == 10.192.14.28"
      nexthop             : []
      nexthops            : ["100.88.0.3", "100.88.0.6"]
      options             : {}
      priority            : 100

      _uuid               : c38a2a9b-87a8-496e-9702-cfc7caf05414
      action              : reroute
      external_ids        : {name=egress-agnhost-websrv}
      match               : "ip4.src == 10.192.14.18"
      nexthop             : []
      nexthops            : ["100.88.0.3", "100.88.0.6"]
      options             : {}
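To confirm that only one IP family is programmed, the LRP dumps above can be tallied by the family used in each `match` expression. This is a rough sketch that assumes the `key : value` layout shown in the dumps, with each record starting at a `_uuid` line; `parse_records` and `match_families` are hypothetical helper names, and the sample text is an excerpt of two records copied from this report.

```python
from collections import Counter

def parse_records(text):
    """Split a 'key : value' dump into one dict per record (a record starts at _uuid)."""
    records, current = [], {}
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # skip blank lines, separators and headers without a colon
        key = key.strip()
        if key == "_uuid" and current:
            records.append(current)
            current = {}
        current[key] = value.strip()
    if current:
        records.append(current)
    return records

def match_families(records):
    """Count records whose match expression refers to ip4 or ip6 fields."""
    counts = Counter()
    for record in records:
        match = record.get("match", "")
        if "ip4." in match:
            counts["ipv4"] += 1
        if "ip6." in match:
            counts["ipv6"] += 1
    return counts

# Excerpt of the dump from ovnkube-node-cmhkx in this report.
sample = """
_uuid               : 075e3964-b9bc-4e90-bc2c-0000d30e55b3
action              : reroute
external_ids        : {name=egress-agnhost-websrv}
match               : "ip4.src == 10.192.14.18"
nexthops            : ["10.192.6.2"]
priority            : 100

_uuid               : 96b2506c-7f2e-41c9-8ebd-50f84a183150
action              : reroute
external_ids        : {name=egress-agnhost-websrv}
match               : "ip4.src == 10.192.14.28"
nexthops            : ["10.192.6.2"]
priority            : 100
"""

records = parse_records(sample)
counts = match_families(records)
print(len(records), counts["ipv4"], counts["ipv6"])  # 2 2 0
```

A zero count for `ipv6` on every node, while the CR lists IPv6 egress IPs, is the symptom described in this report.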

      Looking at the node, there is also no reference to the IPv6 addresses:

      https://privatebin.corp.redhat.com/?72d65eb8050416cb#6E5Jf7vNY1EFxbQPe7Mf3KdckjXvsVwD8wnQhd61NMJG

        2. If the chosen IP version changes, it can disrupt ongoing connections, and it seems to show that dual stack is not really supported with egressIPs in the way the customer expects.

       

       Version-Release number of selected component (if applicable):

      OCP 4.14 on bare metal

      How reproducible:

      Unknown. So far it is hard to say what actually triggers the issue with inconsistent DB entries for the LRPs.

      Steps to Reproduce:

      1. Enable dual-stack network

      2. Create egressIPs with dual stack on additional networks
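For step 2, a dual-stack EgressIP manifest can be generated rather than hand-written. The sketch below is illustrative only: it reuses the name, namespace label and addresses from the CR in this report, `build_egressip` is a hypothetical helper, and it assumes the addresses belong to subnets present on the additional networks of the egress nodes.

```python
import ipaddress
import json

def build_egressip(name, namespace, egress_ips):
    """Build an EgressIP manifest and insist that both IP families are present."""
    versions = {ipaddress.ip_address(ip).version for ip in egress_ips}
    if versions != {4, 6}:
        raise ValueError("a dual-stack reproduction needs IPv4 and IPv6 addresses")
    return {
        "apiVersion": "k8s.ovn.org/v1",
        "kind": "EgressIP",
        "metadata": {"name": name},
        "spec": {
            "egressIPs": list(egress_ips),
            "namespaceSelector": {
                "matchLabels": {"kubernetes.io/metadata.name": namespace}
            },
            "podSelector": {},
        },
    }

manifest = build_egressip(
    "egress-agnhost-websrv",
    "agnhost-websrv-testing",
    [
        "172.23.183.20",
        "172.23.183.21",
        "fdca:5d7b:fdda:f266::28:7b90",
        "fdca:5d7b:fdda:f266::28:7b6f",
    ],
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be piped to `oc apply -f -`, after which `.status.items` can be compared against the four addresses requested in the spec.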

              mkennell@redhat.com Martin Kennelly
              rhn-support-andcosta Andre Costa
              Anurag Saxena Anurag Saxena