  1. OpenShift Bugs
  2. OCPBUGS-77458

In rhel 10 podman inside nodes cannot reach outside the cluster

    • Bug
    • Resolution: Unresolved
    • Normal
    • 4.22.0
    • RHCOS
    • False
    • Low

      Description of problem:

          When we connect to a node in a RHEL 10 cluster and try to use podman on that node, podman cannot reach the network outside the cluster.
      
          On a RHEL 9 node the same steps work without any problem.
      
      

      Version-Release number of selected component (if applicable):

      4.22    

      How reproducible:

          Always

      Steps to Reproduce:

          1. Enable the rhel10 stream in a 4.22 techpreview cluster
      
       oc patch mcp worker --type=merge -p "{\"spec\":{\"osImageStream\":{\"name\":\"rhel-10\"}}}"
      
      
      
          2. Wait until the worker pool is using rhel10
      
          3. Debug into a worker node with "oc debug...."
      
          4. Create a Containerfile with this content
      
      sh-5.2# cat Containerfile 
      FROM quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4a1798a3b92a794a69d56eaf78c1521a1c4d2e52fd05057072780ec19ccabd45
      RUN curl -LO https://pkgs.tailscale.com/stable/fedora/tailscale.repo
      
      
          5. Build an image using the Containerfile
      
      
           sh-5.2# podman build  --authfile /var/lib/kubelet/config.json  . -t with9 
      
          The build fails with this error:
      
      STEP 2/2: RUN curl -LO https://pkgs.tailscale.com/stable/fedora/tailscale.repo
        % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                       Dload  Upload   Total   Spent    Left  Speed
        0     0    0     0    0     0      0      0 --:--:--  0:00:19 --:--:--     0curl: (6) Could not resolve host: pkgs.tailscale.com
      Error: building at STEP "RUN curl -LO https://pkgs.tailscale.com/stable/fedora/tailscale.repo": while running runtime: exit status 6
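
Step 2 above can be scripted instead of polled by hand. The following is only a sketch under assumptions: it relies on `oc wait` and on the MachineConfigPool's Updating and Updated conditions, and the function name `wait_for_pool` and both timeout values are hypothetical. It does not prove that the nodes actually booted RHEL 10, so checking the OS image column of `oc get nodes -o wide` afterwards is still advisable.

```shell
# Hypothetical helper: block until the given MachineConfigPool has started
# and then finished rolling out. Timeouts are illustrative assumptions.
wait_for_pool() {
    local pool="${1:-worker}"
    # The pool reports Updating=True while its nodes are being updated...
    oc wait "mcp/${pool}" --for=condition=Updating --timeout=10m || return 1
    # ...and Updated=True once every node runs the new configuration.
    oc wait "mcp/${pool}" --for=condition=Updated --timeout=60m
}
```

Usage, after applying the patch from step 1: `wait_for_pool worker`.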

      Actual results:

      The podman build command cannot reach the network outside the cluster: curl fails with "Could not resolve host: pkgs.tailscale.com".

      Expected results:

      The podman build command should be able to download the tailscale.repo file and build the image

      Additional info:

          
      
      This is the AI analysis:
      
      Root Cause
      
        The problem is a conflict between two FORWARD chains combined with a firewall backend change between netavark versions on RHEL 9 vs RHEL 10.
      
        RHEL 9 (works) — netavark 1.17.2 with iptables-nft backend
      
        There is one FORWARD chain in table ip filter with policy DROP, and netavark injects its rules directly into it:
      
        chain FORWARD {
            policy drop;
            jump NETAVARK_FORWARD   <-- netavark adds this to allow container traffic
            <OVN rules>
        }
      
        Container traffic from 10.88.0.0/16 hits NETAVARK_FORWARD first and gets accepted. Packets are forwarded, masqueraded, and reach the internet.
      
        RHEL 10 (fails) — netavark 1.16.0 with native nftables backend
      
        There are two FORWARD chains at the same priority:
      
        1. table ip filter → chain FORWARD (from OVN/kube — policy DROP):
        chain FORWARD {
            policy drop;
            <only OVN/kube subnets accepted>
            # NO netavark rules here!
        }
        2. table inet netavark → chain FORWARD (from netavark — policy ACCEPT):
        chain FORWARD {
            policy accept;
            ct state invalid drop
            jump NETAVARK-ISOLATION-1
        }
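
To confirm on a node that both chains shown above really are attached to the forward hook, the output of `nft list ruleset` can be summarised. The helper below is a rough sketch, not a robust parser: the name `forward_base_chains` is hypothetical, and it assumes the usual `nft` text layout in which each base chain has a `type ... hook forward ...; policy ...;` line.

```shell
# Hypothetical helper: read `nft list ruleset` text on stdin and print
# "<family> <table> <chain> <policy>" for every base chain on the forward hook.
forward_base_chains() {
    awk '
        $1 == "table" { table = $2 " " $3 }   # e.g. "ip filter"
        $1 == "chain" { chain = $2 }          # e.g. "FORWARD"
        /hook forward/ {
            policy = "accept"                 # nftables default if none is stated
            for (i = 1; i <= NF; i++)
                if ($i == "policy") { policy = $(i + 1); sub(/;/, "", policy) }
            print table, chain, policy
        }
    '
}
```

Usage on the node: `nft list ruleset | forward_base_chains`. If the analysis is right, a RHEL 10 node would list both `ip filter FORWARD drop` and `inet netavark FORWARD accept`.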
      
        The FORWARD chain in table ip filter (OVN/kube) has policy DROP and only allows traffic to/from 169.254.0.1, 172.30.0.0/16, and 10.128.0.0/14. The podman bridge subnet 10.88.0.0/16 is not in that list. Even though
        netavark's own chain would accept the traffic, both base chains are attached to the forward hook, and in nftables an accept verdict in one base chain does not stop another base chain on the same hook from dropping the packet. The DROP policy of the OVN/kube chain therefore applies no matter which of the two chains runs first.
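
The subnet argument in the paragraph above can be checked with a little arithmetic. This is a minimal sketch that only models the three subnets named in the analysis. It does not read the real ruleset, the function names are hypothetical, and 169.254.0.1 is written as a /32 by assumption.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    local IFS=.
    set -- $1            # split on dots into $1..$4
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed if address $1 falls inside CIDR block $2 (for example 10.128.0.0/14).
in_cidr() {
    local ip net bits mask
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# Succeed if address $1 is covered by any subnet listed in the analysis.
allowed_by_ovn_chain() {
    local cidr
    for cidr in 169.254.0.1/32 172.30.0.0/16 10.128.0.0/14; do
        in_cidr "$1" "$cidr" && return 0
    done
    return 1
}
```

For example, `allowed_by_ovn_chain 10.88.0.5 || echo "no matching accept rule"` reports that an address on the podman bridge matches none of the accepted subnets, while an address such as 10.129.3.4 falls inside 10.128.0.0/14.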
      
        So container traffic from 10.88.0.0/16 → 10.0.0.2 (DNS) gets dropped by the OVN FORWARD chain before it can reach the internet.
      
        Why --network host works
      
        With --network host, containers use the host's network stack directly — no bridge, no forwarding, no FORWARD chain involved.
      
      Workaround
      
      Executing the "podman build" command with "--network host" will make the command work.
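
Applied to the reproducer from step 5, the workaround might look like the sketch below. The wrapper name `build_with_host_network` is hypothetical. The authfile path and image tag are the ones used in the report, and `--network host` is the only change to the original command.

```shell
# Hypothetical wrapper: same build as in step 5, but sharing the host's
# network stack so the build does not depend on the podman bridge.
build_with_host_network() {
    local tag="$1"
    local context="${2:-.}"
    podman build --network host \
        --authfile /var/lib/kubelet/config.json \
        -t "$tag" "$context"
}
```

Usage from the directory that holds the Containerfile: `build_with_host_network with9`.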

              Unassigned
              Sergio Regidor de la Rosa (sregidor@redhat.com)
              Tiago Bueno