  1. OpenShift Bugs
  2. OCPBUGS-61833

EgressIP dual stack network - connectivity issue


    • Quality / Stability / Reliability
    • Important
    • x86_64

      Description of problem:
      In a dual-stack (IPv4/IPv6) OpenShift cluster, deploying one EgressIP object (with both IP families) per pod does not work reliably. Connectivity intermittently fails for one of the IP families: ping or curl from the pod to an external system fails over either IPv4 or IPv6.

      Version-Release number of selected component (if applicable):
      OCP 4.16.z BareMetal, OVN
      OCP 4.18.z BareMetal, OVN

      How reproducible:
      Intermittent, but we have reproduced the issue many times and have a lab ready to reproduce it.

      Steps to Reproduce:

      1. Deploy OCP in dual-stack mode with two worker node roles: appworker and gateway.

      Workloads (pods) are deployed on the appworker nodes (regular worker nodes, no taints).

      The gateway nodes are tainted and are not expected to run workloads. Their purpose is to handle non-Multus ingress (MetalLB) and non-Multus egress using EgressIP.

      The two gateway nodes are labeled with k8s.ovn.org/egress-assignable.
      A VLAN interface (vlan94) is configured on a secondary interface using nmstate, with IPv4 and IPv6 addresses for EgressIP and the following default routes:

         routes:
           config:
           - destination: 0.0.0.0/0
             metric: 999
             next-hop-address: 192.168.118.1
             next-hop-interface: vlan94
             table-id: 254
           - destination: ::/0
             metric: 999
             next-hop-address: 2600:52:7:94::1
             next-hop-interface: vlan94
             table-id: 254
      

       

      2. Create a dual-stack EgressIP (IPv4/IPv6) with a namespaceSelector and a podSelector.
      Deploy a pod on an appworker node and, from inside the pod, use curl to reach an external system outside OCP, for example an HTTP server that listens on both IPv4 and IPv6. The pod is deployed in the namespace `test`, which has the label env: qa.

       

      apiVersion: k8s.ovn.org/v1
      kind: EgressIP
      metadata:
        name: egressip-dual-vlan94
      spec:
        egressIPs:
          - 192.168.118.30
          - 2600:52:7:94::30
        namespaceSelector:
          matchLabels:
            env: qa
        podSelector:
          matchLabels:
            egressip: ds
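As a quick sanity check on the manifest above, the following minimal Python sketch confirms that the egressIPs list carries one address per IP family. The helper names are hypothetical and not part of any OpenShift tooling; the addresses are copied from the manifest.

```python
import ipaddress


def families(egress_ips):
    """Return the set of IP versions (4 and/or 6) found in a list of address strings."""
    return {ipaddress.ip_address(ip).version for ip in egress_ips}


def is_dual_stack(egress_ips):
    # Dual stack here means at least one IPv4 and at least one IPv6 address.
    return families(egress_ips) == {4, 6}


# Values copied from spec.egressIPs in the EgressIP manifest.
egress_ips = ["192.168.118.30", "2600:52:7:94::30"]
print(is_dual_stack(egress_ips))
```

This only validates the address list itself; it says nothing about whether OVN-Kubernetes has assigned both addresses to a gateway node.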

      Pod manifest 

      apiVersion: v1
      kind: Pod
      metadata:
        name: fedora-egressip-pod-ds
        namespace: test
        labels: 
          egressip: ds
          egressipv4v6: ipv4v6
      spec:
        containers:
        - name: fedora-curl
          image: quay.io/yogananth_subramanian/fedora-tools:latest
          command: ["/bin/bash", "-c", "sleep infinity"]
          securityContext:
            capabilities:
              add: ["NET_ADMIN"]
            privileged: true
        nodeSelector:
          node-role.kubernetes.io/appworker: ""  

      Open a shell in the pod and run a curl request to the HTTP server outside OCP over IPv4:

      curl http://192.168.120.11:8080 

      The IPv4 curl request intermittently fails.

      Open another shell in the pod and run curl over IPv6:

      curl http://[2600:52:7:120::9]:8080 
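Because the failure is intermittent, repeating both requests in a loop can make it easier to catch. The sketch below is one possible way to do that, assuming Python 3 and curl are both available in the pod image (not confirmed by this report). The function names, attempt count, and timeout are illustrative choices; the URLs are the ones used above.

```python
import subprocess

# Target URLs taken from the curl commands in this report.
TARGETS = {
    "ipv4": "http://192.168.120.11:8080",
    "ipv6": "http://[2600:52:7:120::9]:8080",
}


def curl_cmd(url, timeout=5):
    # -s: silent, -o /dev/null: discard the body,
    # -w: print the HTTP status code, --max-time: bound the total wait.
    return ["curl", "-s", "-o", "/dev/null", "-w", "%{http_code}",
            "--max-time", str(timeout), url]


def probe(url, timeout=5):
    """Return True when curl exits 0, i.e. some HTTP response was received."""
    result = subprocess.run(curl_cmd(url, timeout), capture_output=True, text=True)
    return result.returncode == 0


def run(attempts=20):
    """Probe every target `attempts` times and count failures per IP family."""
    failures = {family: 0 for family in TARGETS}
    for _ in range(attempts):
        for family, url in TARGETS.items():
            if not probe(url):
                failures[family] += 1
    return failures
```

Calling `print(run())` inside the pod reports how many attempts failed for each family. A 403 response, as seen in the server log, still counts as success here, since curl exits 0 whenever it receives an HTTP response.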

      HTTP server log

      # tail -f /var/log/httpd/access_log
      2600:52:7:94::30 - - [28/Jul/2025:17:33:35 -0400] "GET / HTTP/1.1" 403 5909 "-" "curl/7.51.0"

      As the log shows, only IPv6 curl and ping work through the EgressIP.
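To check this on the server side over a longer run, the access log can be tallied by source address. The following is a minimal sketch, assuming the log uses the common/combined format shown above, where the client address is the first field. The function name and the "egress"/"other" labels are hypothetical; the EgressIP addresses come from the manifest in step 2.

```python
import ipaddress
from collections import Counter

# EgressIP addresses from the EgressIP manifest in this report.
EGRESS_IPS = {
    ipaddress.ip_address("192.168.118.30"),
    ipaddress.ip_address("2600:52:7:94::30"),
}


def tally(lines):
    """Count requests per (IP family, 'egress' or 'other') source address."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        try:
            ip = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # skip lines that do not start with an IP address
        label = "egress" if ip in EGRESS_IPS else "other"
        counts[(f"ipv{ip.version}", label)] += 1
    return counts


sample = [
    '2600:52:7:94::30 - - [28/Jul/2025:17:33:35 -0400] '
    '"GET / HTTP/1.1" 403 5909 "-" "curl/7.51.0"',
]
print(tally(sample))
```

For the single log line above, only the ("ipv6", "egress") bucket is populated. IPv4 requests arriving with a non-EgressIP source address would show up as ("ipv4", "other"), while requests that never reach the server leave no entry at all.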

       

      Actual results:
      From the pod, only the IPv6 connectivity test succeeds through the EgressIP.

      Expected results:
      The pod must reach external systems through the EgressIP for both IP families, IPv4 and IPv6.

       

      Affected Platforms:
      OCP deployed on bare metal with OVN-Kubernetes, using a ZTP/GitOps approach
      Partner lab
      Red Hat internal lab

      Additional Info:
      We have an internal lab to reproduce the issue.
      Jean Chen from QA was also able to reproduce the issue.

      Is it an

      1. customer issue / SD
      2. internal Red Hat testing failure

      We have a dual-stack environment available for any additional tests the OCP EgressIP engineering team needs to run.
      A partner is also facing the same issue in their environment.

              sdn-team-bot sdn-team bot
              ecisse@redhat.com El Hadji Sidi Ahmed Cisse
              Jose Nuñez
              Huiran Wang Huiran Wang