OpenShift Bugs / OCPBUGS-57465

iptables: Switch PREROUTING REDIRECT rule to DNAT instead

    • Quality / Stability / Reliability
    • Important

      This is a clone of issue OCPBUGS-57065. The following is the description of the original issue:

      Description of problem:

      In an environment with Cilium that has the following VIP rules installed and the proxy listening on port 9445:
      
        sh-5.1# iptables-save -c | grep OCP
        [690459:41427540] -A PREROUTING -d 192.168.0.201/32 -p tcp -m tcp --dport 6443 -m comment --comment OCP_API_LB_REDIRECT -j REDIRECT --to-ports 9445
        [1483:88980] -A OUTPUT -d 192.168.0.201/32 -o lo -p tcp -m tcp --dport 6443 -m comment --comment OCP_API_LB_REDIRECT -j REDIRECT --to-ports 9445
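
      The bracketed prefix in `iptables-save -c` output is [packets:bytes]. A minimal sketch for pulling those counters out per rule, so that they can be compared before and after a test connection; the function name rule_counters is hypothetical, and the input is assumed to be unindented `iptables-save -c` output:

```shell
#!/bin/sh
# Print "<packets> <bytes> <target>" for each rule carrying the given comment.
# Reads `iptables-save -c` output on stdin; the leading [N:M] is [packets:bytes].
# rule_counters is a hypothetical helper name, not part of iptables.
rule_counters() {
    awk -v tag="$1" '
        index($0, "--comment " tag) && match($0, /^\[[0-9]+:[0-9]+\]/) {
            # Strip the surrounding brackets and split packets from bytes.
            split(substr($0, 2, RLENGTH - 2), c, ":")
            # The jump target is the word following "-j".
            target = ""
            for (i = 1; i < NF; i++) if ($i == "-j") target = $(i + 1)
            print c[1], c[2], target
        }'
}
```

      For example: iptables-save -c | rule_counters OCP_API_LB_REDIRECT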
      
      Connectivity works for all nodes that do not own the VIP, and also when curling the VIP at 192.168.0.201:6443 from the host/init network namespace. However, when curling from a Pod, the skb carrying the SYN packet is lost in the upper stack.
      
      pwru shows that the skb traverses the network namespace and is pushed up the local stack, where it is later dropped in netfilter because no socket is found. The lookup happens on port 6443 instead of 9445, where haproxy is listening:
      
        sh-5.1# ./pwru 'host 192.168.0.201 and host 10.244.1.93'
        2025/06/03 08:49:07 Attaching kprobes (via kprobe-multi)...
        1542 / 1542 [--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------] 100.00% ? p/s
        2025/06/03 08:49:07 Attached (ignored 0)
        2025/06/03 08:49:07 Listening for events..
        SKB                CPU PROCESS          NETNS      MARK/x        IFACE       PROTO  MTU   LEN   TUPLE FUNC
        0xffff99902145f0f0 2   ~in/curl:2724864 4026532832 0               0         0x0000 1450  60    10.244.1.93:40454->192.168.0.201:6443(tcp) ip_local_out
        0xffff99902145f0f0 2   ~in/curl:2724864 4026532832 0               0         0x0000 1450  60    10.244.1.93:40454->192.168.0.201:6443(tcp) __ip_local_out
        0xffff99902145f0f0 2   ~in/curl:2724864 4026532832 0               0         0x0800 1450  60    10.244.1.93:40454->192.168.0.201:6443(tcp) nf_hook_slow
        0xffff99902145f0f0 2   ~in/curl:2724864 4026532832 0               0         0x0800 1450  60    10.244.1.93:40454->192.168.0.201:6443(tcp) selinux_ip_output
        0xffff99902145f0f0 2   ~in/curl:2724864 4026532832 0               0         0x0800 1450  60    10.244.1.93:40454->192.168.0.201:6443(tcp) ip_output
        0xffff99902145f0f0 2   ~in/curl:2724864 4026532832 0            eth0:116     0x0800 1450  60    10.244.1.93:40454->192.168.0.201:6443(tcp) nf_hook_slow
        0xffff99902145f0f0 2   ~in/curl:2724864 4026532832 0            eth0:116     0x0800 1450  60    10.244.1.93:40454->192.168.0.201:6443(tcp) selinux_ip_postroute
        0xffff99902145f0f0 2   ~in/curl:2724864 4026532832 0            eth0:116     0x0800 1450  60    10.244.1.93:40454->192.168.0.201:6443(tcp) ip_finish_output
        0xffff99902145f0f0 2   ~in/curl:2724864 4026532832 0            eth0:116     0x0800 1450  60    10.244.1.93:40454->192.168.0.201:6443(tcp) __ip_finish_output
        0xffff99902145f0f0 2   ~in/curl:2724864 4026532832 0            eth0:116     0x0800 1450  60    10.244.1.93:40454->192.168.0.201:6443(tcp) ip_finish_output2
        0xffff99902145f0f0 2   ~in/curl:2724864 4026532832 0            eth0:116     0x0800 1450  74    10.244.1.93:40454->192.168.0.201:6443(tcp) __dev_queue_xmit
        0xffff99902145f0f0 2   ~in/curl:2724864 4026532832 0            eth0:116     0x0800 1450  74    10.244.1.93:40454->192.168.0.201:6443(tcp) qdisc_pkt_len_init
        0xffff99902145f0f0 2   ~in/curl:2724864 4026532832 0            eth0:116     0x0800 1500  74    10.244.1.93:40454->192.168.0.201:6443(tcp) netdev_core_pick_tx
        0xffff99902145f0f0 2   ~in/curl:2724864 4026532832 0            eth0:116     0x0800 1500  74    10.244.1.93:40454->192.168.0.201:6443(tcp) validate_xmit_skb
        0xffff99902145f0f0 2   ~in/curl:2724864 4026532832 0            eth0:116     0x0800 1500  74    10.244.1.93:40454->192.168.0.201:6443(tcp) netif_skb_features
        0xffff99902145f0f0 2   ~in/curl:2724864 4026532832 0            eth0:116     0x0800 1500  74    10.244.1.93:40454->192.168.0.201:6443(tcp) passthru_features_check
        0xffff99902145f0f0 2   ~in/curl:2724864 4026532832 0            eth0:116     0x0800 1500  74    10.244.1.93:40454->192.168.0.201:6443(tcp) skb_network_protocol
        0xffff99902145f0f0 2   ~in/curl:2724864 4026532832 0            eth0:116     0x0800 1500  74    10.244.1.93:40454->192.168.0.201:6443(tcp) skb_csum_hwoffload_help
        0xffff99902145f0f0 2   ~in/curl:2724864 4026532832 0            eth0:116     0x0800 1500  74    10.244.1.93:40454->192.168.0.201:6443(tcp) validate_xmit_xfrm
        0xffff99902145f0f0 2   ~in/curl:2724864 4026532832 0            eth0:116     0x0800 1500  74    10.244.1.93:40454->192.168.0.201:6443(tcp) dev_hard_start_xmit
        0xffff99902145f0f0 2   ~in/curl:2724864 4026532832 0            eth0:116     0x0800 1500  74    10.244.1.93:40454->192.168.0.201:6443(tcp) skb_clone_tx_timestamp
        0xffff99902145f0f0 2   ~in/curl:2724864 4026532832 0            eth0:116     0x0800 1500  74    10.244.1.93:40454->192.168.0.201:6443(tcp) __dev_forward_skb
        0xffff99902145f0f0 2   ~in/curl:2724864 4026532832 0            eth0:116     0x0800 1500  74    10.244.1.93:40454->192.168.0.201:6443(tcp) __dev_forward_skb2
        0xffff99902145f0f0 2   ~in/curl:2724864 4026532832 0            eth0:116     0x0800 1500  74    10.244.1.93:40454->192.168.0.201:6443(tcp) skb_scrub_packet
        0xffff99902145f0f0 2   ~in/curl:2724864 4026532832 0            eth0:116     0x0800 1500  74    10.244.1.93:40454->192.168.0.201:6443(tcp) eth_type_trans
        0xffff99902145f0f0 2   ~in/curl:2724864 4026531840 0        ~f632c0a1149:117 0x0800 1500  60    10.244.1.93:40454->192.168.0.201:6443(tcp) __netif_rx
        0xffff99902145f0f0 2   ~in/curl:2724864 4026531840 0        ~f632c0a1149:117 0x0800 1500  60    10.244.1.93:40454->192.168.0.201:6443(tcp) netif_rx_internal
        0xffff99902145f0f0 2   ~in/curl:2724864 4026531840 0        ~f632c0a1149:117 0x0800 1500  60    10.244.1.93:40454->192.168.0.201:6443(tcp) enqueue_to_backlog
        0xffff99902145f0f0 2   ~in/curl:2724864 4026531840 0        ~f632c0a1149:117 0x0800 1500  60    10.244.1.93:40454->192.168.0.201:6443(tcp) __netif_receive_skb
        0xffff99902145f0f0 2   ~in/curl:2724864 4026531840 0        ~f632c0a1149:117 0x0800 1500  60    10.244.1.93:40454->192.168.0.201:6443(tcp) __netif_receive_skb_one_core
        0xffff99902145f0f0 2   ~in/curl:2724864 4026531840 0        ~f632c0a1149:117 0x0800 1500  74    10.244.1.93:40454->192.168.0.201:6443(tcp) skb_ensure_writable
        0xffff99902145f0f0 2   ~in/curl:2724864 4026531840 0        ~f632c0a1149:117 0x0800 1500  74    10.244.1.93:40454->192.168.0.201:6443(tcp) skb_ensure_writable
        0xffff99902145f0f0 2   ~in/curl:2724864 4026531840 0        ~f632c0a1149:117 0x0800 1500  74    10.244.1.93:40454->192.168.0.201:6443(tcp) skb_ensure_writable
        0xffff99902145f0f0 2   ~in/curl:2724864 4026531840 0        ~f632c0a1149:117 0x0800 1500  74    10.244.1.93:40454->192.168.0.201:6443(tcp) skb_ensure_writable
        0xffff99902145f0f0 2   ~in/curl:2724864 4026531840 7e7a0f01 ~f632c0a1149:117 0x0800 1500  60    10.244.1.93:40454->192.168.0.201:6443(tcp) ip_rcv
        0xffff99902145f0f0 2   ~in/curl:2724864 4026531840 7e7a0f01 ~f632c0a1149:117 0x0800 1500  60    10.244.1.93:40454->192.168.0.201:6443(tcp) ip_rcv_core
        0xffff99902145f0f0 2   ~in/curl:2724864 4026531840 7e7a0f01 ~f632c0a1149:117 0x0800 1500  60    10.244.1.93:40454->192.168.0.201:6443(tcp) tcp_wfree
        0xffff99902145f0f0 2   ~in/curl:2724864 4026531840 7e7a0f01 ~f632c0a1149:117 0x0800 1500  60    10.244.1.93:40454->192.168.0.201:6443(tcp) nf_hook_slow
        0xffff99902145f0f0 2   ~in/curl:2724864 4026531840 7e7a0f01 ~f632c0a1149:117 0x0800 1500  60    10.244.1.93:40454->192.168.0.201:6443(tcp) nf_checksum
        0xffff99902145f0f0 2   ~in/curl:2724864 4026531840 7e7a0f01 ~f632c0a1149:117 0x0800 1500  60    10.244.1.93:40454->192.168.0.201:6443(tcp) nf_ip_checksum
        0xffff99902145f0f0 2   ~in/curl:2724864 4026531840 7e7a0f01 ~f632c0a1149:117 0x0800 1500  60    10.244.1.93:40454->192.168.0.201:6443(tcp) __inet_lookup_listener
        0xffff99902145f0f0 2   ~in/curl:2724864 4026531840 7e7a0f01 ~f632c0a1149:117 0x0800 1500  60    10.244.1.93:40454->192.168.0.201:6443(tcp) inet_lhash2_lookup
        0xffff99902145f0f0 2   ~in/curl:2724864 4026531840 7e7a0f01 ~f632c0a1149:117 0x0800 1500  60    10.244.1.93:40454->192.168.0.201:6443(tcp) inet_lhash2_lookup
        0xffff99902145f0f0 2   ~in/curl:2724864 4026531840 7e7a0f01 ~f632c0a1149:117 0x0800 1500  60    10.244.1.93:40454->192.168.0.201:6443(tcp) kfree_skb_reason(SKB_DROP_REASON_NETFILTER_DROP)
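
      In a long trace like the one above, only the final line carries the drop. A small sketch that filters pwru output down to the dropped tuples and their reasons; the function name pwru_drops is hypothetical, and it assumes the column layout shown above (TUPLE second to last, FUNC last):

```shell
#!/bin/sh
# Print "<tuple> <drop reason>" for every dropped skb in a pwru trace on stdin.
# Assumes TUPLE is the second-to-last whitespace-separated field and FUNC the
# last, as in the trace above. pwru_drops is a hypothetical helper name.
pwru_drops() {
    awk '$NF ~ /^kfree_skb_reason\(/ {
        reason = $NF
        sub(/^kfree_skb_reason\(/, "", reason)   # strip the function name
        sub(/\)$/, "", reason)                   # strip the closing parenthesis
        print $(NF-1), reason
    }'
}
```

      For example: ./pwru 'host 192.168.0.201 and host 10.244.1.93' | pwru_drops
      A tuple that still shows port 6443 at drop time indicates that the destination port was never rewritten.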
      
      We found that in this case the PREROUTING -j REDIRECT --to-ports 9445 does not trigger as expected. The local workaround we used to get connectivity was to install a DNAT rule instead of REDIRECT:
      
        iptables -t nat -I PREROUTING 1 -d 192.168.0.201 -p tcp --dport 6443 -j DNAT --to-destination 192.168.0.201:9445
      
      Resulting in ...
      
        [52:3120] -A PREROUTING -d 192.168.0.201/32 -p tcp -m tcp --dport 6443 -j DNAT --to-destination 192.168.0.201:9445
        [695732:41743920] -A PREROUTING -d 192.168.0.201/32 -p tcp -m tcp --dport 6443 -m comment --comment OCP_API_LB_REDIRECT -j REDIRECT --to-ports 9445
        [1483:88980] -A OUTPUT -d 192.168.0.201/32 -o lo -p tcp -m tcp --dport 6443 -m comment --comment OCP_API_LB_REDIRECT -j REDIRECT --to-ports 9445
      
      This got the connectivity working.
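
      For background, the iptables-extensions documentation describes REDIRECT as rewriting the destination to the primary address of the incoming interface, whereas DNAT names the destination explicitly. The switch proposed in the title could be sketched as below. This is an illustration only and not necessarily what the linked pull request does: the function names and the DRY_RUN switch are hypothetical, and carrying the OCP_API_LB_REDIRECT comment over to the DNAT rule is an assumption (the manual workaround above omitted it).

```shell
#!/bin/sh
# Sketch: swap the PREROUTING REDIRECT rule for an equivalent DNAT rule.
# VIP, ports, and comment come from the rules shown above; run, switch_to_dnat,
# and DRY_RUN are hypothetical names used only for this illustration.
VIP="${VIP:-192.168.0.201}"
API_PORT="${API_PORT:-6443}"
LB_PORT="${LB_PORT:-9445}"
COMMENT="${COMMENT:-OCP_API_LB_REDIRECT}"

# Execute the command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

switch_to_dnat() {
    # Shared match part of both rules; left unquoted below on purpose so that
    # the shell splits it into separate arguments.
    match="-d ${VIP}/32 -p tcp -m tcp --dport ${API_PORT} -m comment --comment ${COMMENT}"
    # Insert the DNAT rule first so that there is no window without a rule.
    run iptables -t nat -I PREROUTING 1 $match -j DNAT --to-destination "${VIP}:${LB_PORT}"
    # Then delete the old REDIRECT rule by its exact specification.
    run iptables -t nat -D PREROUTING $match -j REDIRECT --to-ports "${LB_PORT}"
}
```

      Running DRY_RUN=1 switch_to_dnat prints the two commands without touching the ruleset. The sketch is one-shot: running it twice for real would insert a duplicate DNAT rule, and the delete fails if the REDIRECT rule is already gone.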
      

      Version-Release number of selected component (if applicable):

      OpenShift Container Platform 4.18 and later
      

      How reproducible:

      Always (when using Cilium)
      

      Steps to Reproduce:

      1. Install OpenShift Container Platform 4 with Cilium on bare metal
      2. Install Advanced Cluster Management for Kubernetes
      3. Try to import the local-cluster into Advanced Cluster Management for Kubernetes
      

      Actual results:

      Importing the local-cluster in Advanced Cluster Management for Kubernetes is failing due to the above connectivity issue
      

      Expected results:

      Importing the local-cluster in Advanced Cluster Management for Kubernetes should work and not cause connectivity issues
      

      Additional info:

      https://github.com/openshift/baremetal-runtimecfg/pull/349 is the pull request that aims to solve the problem
      

              bnemec@redhat.com Benjamin Nemec
              openshift-crt-jira-prow OpenShift Prow Bot
              None
              None
              Ross Brattain Ross Brattain
              None