Bug
Resolution: Done-Errata
Normal
4.14.z
Description of problem:
AdminPolicyBasedExternalRoute fails to create routes for pods created after the AdminPolicyBasedExternalRoute CR is created. However, it does create routes for pods that already existed before the CR was created. The issue occurs on a 120-node bare-metal environment while running the Perf&Scale ICNI2 tests. Note: we have disabled BFD in this testing because of https://issues.redhat.com/browse/OCPBUGS-25449
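To triage which pods are affected, it can help to split the served pods by whether they were created before or after the CR. The following is a minimal bash sketch. It assumes the `creationTimestamp` values are in the RFC 3339 UTC form Kubernetes emits (for example `2024-01-01T00:00:00Z`), which sorts correctly as plain strings. The function name `classify_pods` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: classify pods as created before or after a reference time.
# stdin: lines of "<pod-name> <creationTimestamp>"
# $1:    the CR's creationTimestamp
# Both timestamps are assumed to be RFC 3339 UTC ("YYYY-MM-DDTHH:MM:SSZ"),
# so a string comparison matches chronological order.
classify_pods() {
  local cr_ts="$1" name ts
  while read -r name ts; do
    # Skip blank lines.
    if [ -z "$name" ]; then
      continue
    fi
    if [[ "$ts" < "$cr_ts" ]]; then
      echo "$name before"
    else
      echo "$name after"
    fi
  done
}

# Example usage against a live cluster:
#   cr_ts=$(kubectl get adminpolicybasedexternalroute honeypotting \
#             -o jsonpath='{.metadata.creationTimestamp}')
#   kubectl get pods -n served-ns-1 \
#     -o jsonpath='{range .items[*]}{.metadata.name} {.metadata.creationTimestamp}{"\n"}{end}' \
#     | classify_pods "$cr_ts"
```

Pods whose timestamp equals the CR's are reported as "after", since they cannot have been reconciled at CR creation time.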
Version-Release number of selected component (if applicable):
4.14.1
How reproducible:
Always
Steps to Reproduce:
#!/bin/bash
set -x

# Create served-ns-1 and serving-ns-1 namespaces:
echo "Create served-ns-1 and serving-ns-1 namespaces"
date -u
cat <<EOF | kubectl apply -f -
---
apiVersion: v1
kind: Namespace
metadata:
  name: served-ns-1
  labels:
    kubernetes.io/metadata.name: served-ns-1
spec: {}
---
apiVersion: v1
kind: Namespace
metadata:
  name: serving-ns-1
  labels:
    kubernetes.io/metadata.name: serving-ns-1
spec: {}
EOF
sleep 120

# create SRIOV network
date -u
echo "create SRIOV network"
cat <<EOF | kubectl apply -f -
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: sriov-net-1
  namespace: openshift-sriov-network-operator
spec:
  ipam: |
    {
      "type": "static"
    }
  spoofChk: "off"
  trust: "on"
  resourceName: intelnics2
  networkNamespace: serving-ns-1
EOF
sleep 120

# create served pod before AdminPolicyBasedExternalRoute
date -u
echo "create served pod before AdminPolicyBasedExternalRoute"
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: pod-served-before
  namespace: served-ns-1
spec:
  nodeSelector:
    kubernetes.io/hostname: worker010-fc640
  containers:
  - args:
    - sleep
    - infinity
    name: app
    image: quay.io/centos/centos
    imagePullPolicy: IfNotPresent
EOF
sleep 120

# create AdminPolicyBasedExternalRoute CR
date -u
echo "create AdminPolicyBasedExternalRoute CR"
cat <<EOF | kubectl apply -f -
apiVersion: k8s.ovn.org/v1
kind: AdminPolicyBasedExternalRoute
metadata:
  name: honeypotting
spec:
  ## gateway example
  from:
    namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: served-ns-1
  nextHops:
    dynamic:
      - podSelector:
          matchLabels:
            lb: lb-1
        namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: serving-ns-1
        networkAttachmentName: serving-ns-1/sriov-net-1
EOF
sleep 120

# create serving pod
date -u
echo "create serving pod"
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: pod-serving-1
  namespace: serving-ns-1
  labels:
    serving: true-1
    lb: lb-1
  annotations:
    k8s.v1.cni.cncf.io/networks: |-
      [{
        "name": "sriov-net-1",
        "ips": [
          "192.168.219.2/21"
        ]
      }]
    k8s.v1.cni.cncf.io/network-status: |-
      [{
        "name": "serving-ns-1/sriov-net-1",
        "interface": "net1",
        "ips": [
          "192.168.219.2"
        ],
        "dns": {}
      }]
spec:
  containers:
  - name: frr
    image: centos
    command:
    - sleep
    - infinity
    securityContext:
      privileged: true
  nodeSelector:
    kubernetes.io/hostname: worker003-fc640
EOF
sleep 120

# create served pod after AdminPolicyBasedExternalRoute
date -u
echo "create served pod after AdminPolicyBasedExternalRoute"
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: pod-served-after
  namespace: served-ns-1
spec:
  nodeSelector:
    kubernetes.io/hostname: worker010-fc640
  containers:
  - args:
    - sleep
    - infinity
    name: app
    image: quay.io/centos/centos
    imagePullPolicy: IfNotPresent
EOF
sleep 120

date -u
echo "oc get pods -n served-ns-1 -o wide"
oc get pods -n served-ns-1 -o wide

date -u
echo "oc rsh -n openshift-ovn-kubernetes -c nbdb ovnkube-node-vjd8p ovn-nbctl lr-route-list GR_worker010-fc640"
oc rsh -n openshift-ovn-kubernetes -c nbdb ovnkube-node-vjd8p ovn-nbctl lr-route-list GR_worker010-fc640
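The script relies on fixed `sleep 120` pauses. When re-running it, a polling check can make the failure explicit instead of depending on a fixed wait. The following is a minimal bash sketch. It assumes the route-list command is passed in as arguments (for example the `oc rsh ... ovn-nbctl lr-route-list GR_worker010-fc640` command at the end of the script). It also assumes a matching route appears as a line whose first column is the pod IP and whose third column is `src-ip`, as in the output under Actual results. The function name `wait_for_src_route` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: poll a route-list command until a src-ip route for the
# given pod IP appears, or give up after a timeout.
# Usage: wait_for_src_route <pod-ip> <timeout-seconds> <interval-seconds> <command...>
wait_for_src_route() {
  local ip="$1" timeout="$2" interval="$3"
  shift 3
  local deadline=$(( SECONDS + timeout ))
  while :; do
    # Succeed if any line has the pod IP as prefix (column 1)
    # and "src-ip" as the policy (column 3).
    if "$@" | awk -v ip="$ip" '$1 == ip && $3 == "src-ip" { found = 1 } END { exit !found }'; then
      echo "route for $ip present"
      return 0
    fi
    if (( SECONDS >= deadline )); then
      echo "no src-ip route for $ip after ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Example, using pod-served-after's IP from this report:
#   wait_for_src_route 10.129.38.11 120 5 \
#     oc rsh -n openshift-ovn-kubernetes -c nbdb ovnkube-node-vjd8p \
#     ovn-nbctl lr-route-list GR_worker010-fc640
```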
Actual results:
We see a route only for the pod-served-before pod (10.129.38.10):
+ echo 'oc get pods -n served-ns-1 -o wide'
oc get pods -n served-ns-1 -o wide
+ oc get pods -n served-ns-1 -o wide
NAME                READY   STATUS    RESTARTS   AGE    IP             NODE              NOMINATED NODE   READINESS GATES
pod-served-after    1/1     Running   0          2m     10.129.38.11   worker010-fc640   <none>           <none>
pod-served-before   1/1     Running   0          8m1s   10.129.38.10   worker010-fc640   <none>           <none>
+ date -u
Sat Feb 10 13:04:31 UTC 2024
+ echo 'oc rsh -n openshift-ovn-kubernetes -c nbdb ovnkube-node-vjd8p ovn-nbctl lr-route-list GR_worker010-fc640'
oc rsh -n openshift-ovn-kubernetes -c nbdb ovnkube-node-vjd8p ovn-nbctl lr-route-list GR_worker010-fc640
+ oc rsh -n openshift-ovn-kubernetes -c nbdb ovnkube-node-vjd8p ovn-nbctl lr-route-list GR_worker010-fc640
IPv4 Routes
Route Table <main>:
              10.129.38.10             192.168.219.2 src-ip rtoe-GR_worker010-fc640 ecmp-symmetric-reply
          169.254.169.0/29             169.254.169.4 dst-ip rtoe-GR_worker010-fc640
             10.128.0.0/14                100.64.0.1 dst-ip
                 0.0.0.0/0             192.168.216.1 dst-ip rtoe-GR_worker010-fc640
Expected results:
We should see routes for both the pod-served-before (10.129.38.10) and pod-served-after (10.129.38.11) pods.
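The expected result can be checked mechanically: every served pod IP should appear as the prefix of a `src-ip` route on the gateway router. The following is a minimal bash sketch. It assumes the `lr-route-list` layout shown under Actual results (prefix, next hop, policy, ...). The function name `missing_src_routes` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read `ovn-nbctl lr-route-list` output on stdin and print
# every pod IP (given as arguments) that has no src-ip route.
# Returns 0 if all IPs have a route, 1 otherwise.
missing_src_routes() {
  local routes ip rc=0
  # Collect the prefixes of all src-ip routes (column 1 = prefix, column 3 = policy).
  routes=$(awk '$3 == "src-ip" { print $1 }')
  for ip in "$@"; do
    # Exact, fixed-string, whole-line match against the collected prefixes.
    if ! grep -qxF "$ip" <<< "$routes"; then
      echo "$ip"
      rc=1
    fi
  done
  return "$rc"
}

# Example against the cluster in this report:
#   oc rsh -n openshift-ovn-kubernetes -c nbdb ovnkube-node-vjd8p \
#     ovn-nbctl lr-route-list GR_worker010-fc640 \
#     | missing_src_routes 10.129.38.10 10.129.38.11
```

Fed the route list captured under Actual results, this prints `10.129.38.11` (the pod-served-after IP) and returns 1.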
Additional info:
must-gather: http://storage.scalelab.redhat.com/anilvenkata/must-gather.local.6409385997838158348.tar.gz
All the resources in the above case were created between Sat Feb 10 12:52:28 UTC 2024 and Sat Feb 10 13:04:31 UTC 2024. Please use these timestamps when examining the logs.
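To narrow the must-gather logs to that window, lines can be filtered on their timestamp. The following is a minimal bash sketch. It assumes the ovnkube container logs use the klog header format (`Lmmdd hh:mm:ss.uuuuuu threadid file:line] msg`, for example `I0210 12:52:28.123456`). Because klog omits the year, only month-day and time are compared. The function name `klog_window` is hypothetical and the log path in the example is a placeholder.

```shell
#!/bin/bash
# Hypothetical helper: print klog-formatted lines whose timestamp (to the
# second) falls within [start, end] on a given day. klog headers look like
#   I0210 12:52:28.123456   12345 file.go:123] message
# where the first character is the severity (I/W/E/F) and "0210" is MMDD.
# The header is searched anywhere in the line, so an extra prefix is tolerated.
# Usage: klog_window <MMDD> <HH:MM:SS start> <HH:MM:SS end> [file...]
klog_window() {
  local day="$1" start="$2" end="$3"
  shift 3
  awk -v day="$day" -v start="$start" -v end="$end" '
    match($0, /[IWEF][0-9][0-9][0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]/) {
      hdr = substr($0, RSTART, RLENGTH)   # e.g. "I0210 12:52:28"
      if (substr(hdr, 2, 4) == day) {
        t = substr(hdr, 7, 8)             # "HH:MM:SS"
        if (t >= start && t <= end) print
      }
    }' "$@"
}

# Example for the window in this report (path is a placeholder):
#   klog_window 0210 12:52:28 13:04:31 /path/to/ovnkube-controller/current.log
```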
- depends on: OCPBUGS-29680 AdminPolicyBasedExternalRoute CRD failing to watch and reconcile routes for later pods (Closed)
- duplicates: OCPBUGS-29939 [OVN] Static routes for the AdminPolicyBasedExternalRoute don't get recreated on the gateway routers when pods restart (Closed)
- is cloned by: OCPBUGS-29680 AdminPolicyBasedExternalRoute CRD failing to watch and reconcile routes for later pods (Closed)
- links to: RHBA-2024:1564 OpenShift Container Platform 4.14.z bug fix update