OpenShift Bugs / OCPBUGS-29342

AdminPolicyBasedExternalRoute CRD failing to watch and reconcile routes for later pods


    • Moderate
    • No
    • False
    • Cause: When reconciling an Admin Policy Based External Route CR, pods without a status IP should be considered not processed, but the controller did not treat them that way.
      Consequence: When the controller processes a pod that has no IP in its status and is located in a namespace managed by an Admin Policy Based External Route CR, it fails to return an error and instead records the pod as having been processed.
      Fix: Ensure that the controller does not record such a pod as successfully processed.
      Result: Pods without an IP in their status field are processed again by the controller on each pod event until the IP field is populated and the controller can complete the reconciliation loop.
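
The Cause/Fix/Result text above can be illustrated with a toy model. This is only a sketch of the described behaviour in bash; it is not the ovn-kubernetes controller code, and the function name `reconcile_pod` and its output strings are hypothetical.

```shell
#!/bin/bash
# Toy model of the fixed behaviour described in the release note.
# NOT the real controller: reconcile_pod and its messages are made up.
#
# reconcile_pod NAME IP
#   IP empty  -> prints "retry NAME" and returns 1 (pod is not recorded
#                as processed, so a later pod event retries it)
#   IP set    -> prints "processed NAME IP" and returns 0
reconcile_pod() {
  local name="$1" ip="$2"
  if [ -z "$ip" ]; then
    # No status IP yet: report failure instead of acknowledging the pod.
    echo "retry $name"
    return 1
  fi
  echo "processed $name $ip"
  return 0
}
```

In this model, the first event for pod-served-after (no IP yet) yields "retry", and a later event carrying 10.129.38.11 yields "processed". Before the fix, the first event would have been treated as a success and the pod never revisited.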
    • Bug Fix
    • In Progress
    • MG's errors found;

      Description of problem:

          AdminPolicyBasedExternalRoute fails to create routes for pods created after the AdminPolicyBasedExternalRoute CR is created. However, it does create routes for pods that already existed before the CR was created.
      
      This issue occurs on a 120-node bare-metal environment while running Perf&Scale ICNI2 tests.
      
      Note: we have disabled BFD in this testing because of https://issues.redhat.com/browse/OCPBUGS-25449 

       

      Version-Release number of selected component (if applicable):

      4.14.1    

      How reproducible:

          Always

      Steps to Reproduce:

      #!/bin/bash
      set -x 
      
      # Create served-ns-1 and serving-ns-1 namespaces:
      echo "Create served-ns-1 and serving-ns-1 namespaces"
      date -u
      cat <<EOF | kubectl apply -f -
      ---
      apiVersion: v1
      kind: Namespace
      metadata:
        name: served-ns-1
        labels:
          kubernetes.io/metadata.name: served-ns-1
      spec: {}
      ---
      apiVersion: v1
      kind: Namespace
      metadata:
        name: serving-ns-1
        labels:
          kubernetes.io/metadata.name: serving-ns-1
      spec: {}
      EOF
      
      sleep 120
      
      # create SRIOV network
      date -u
      echo "create SRIOV network"
      cat <<EOF | kubectl apply -f -
      apiVersion: sriovnetwork.openshift.io/v1
      kind: SriovNetwork
      metadata:
        name: sriov-net-1
        namespace: openshift-sriov-network-operator
      spec:
        ipam: |
          {
            "type": "static"
          }
        spoofChk: "off"
        trust: "on"
        resourceName: intelnics2
        networkNamespace: serving-ns-1
      EOF
      
      sleep 120
      
      # create served pod before AdminPolicyBasedExternalRoute
      date -u
      echo "create served pod before AdminPolicyBasedExternalRoute"
      cat <<EOF | kubectl apply -f -
      apiVersion: v1
      kind: Pod
      metadata:
        name: pod-served-before
        namespace: served-ns-1
      spec:
        nodeSelector:
          kubernetes.io/hostname: worker010-fc640
        containers:
        - args:
          - sleep
          - infinity
          name: app
          image: quay.io/centos/centos
          imagePullPolicy: IfNotPresent
      EOF
      
      sleep 120
      
      # create AdminPolicyBasedExternalRoute CR
      date -u
      echo "create AdminPolicyBasedExternalRoute CR"
      cat <<EOF | kubectl apply -f -
      apiVersion: k8s.ovn.org/v1
      kind: AdminPolicyBasedExternalRoute
      metadata:
        name: honeypotting
      spec:
      ## gateway example
        from:
          namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: served-ns-1
        nextHops:
          dynamic:
            - podSelector:
                matchLabels:
                  lb: lb-1
              namespaceSelector:
                matchLabels:
                  kubernetes.io/metadata.name: serving-ns-1
              networkAttachmentName: serving-ns-1/sriov-net-1
      EOF
      
      sleep 120
      
      # create serving pod
      date -u
      echo "create serving pod"
      cat <<EOF | kubectl apply -f -
      apiVersion: v1
      kind: Pod
      metadata:
        name: pod-serving-1
        namespace: serving-ns-1
        labels:
          serving: true-1
          lb: lb-1
        annotations:
              k8s.v1.cni.cncf.io/networks: |-
                [{
                    "name": "sriov-net-1",
                    "ips": [ "192.168.219.2/21" ]
                }]
              k8s.v1.cni.cncf.io/network-status: |-
                [{
                    "name": "serving-ns-1/sriov-net-1",
                    "interface": "net1",
                    "ips": [ "192.168.219.2" ],
                    "dns": {}
                }]
      spec:
        containers:
        - name: frr
          image: centos
          command:
            - sleep
            - infinity
          securityContext:
            privileged: true
        nodeSelector:
          kubernetes.io/hostname: worker003-fc640
      EOF
      
      sleep 120
      # create served pod after AdminPolicyBasedExternalRoute
      date -u
      echo "create served pod after AdminPolicyBasedExternalRoute"
      cat <<EOF | kubectl apply -f -
      apiVersion: v1
      kind: Pod
      metadata:
        name: pod-served-after
        namespace: served-ns-1
      spec:
        nodeSelector:
          kubernetes.io/hostname: worker010-fc640
        containers:
        - args:
          - sleep
          - infinity
          name: app
          image: quay.io/centos/centos
          imagePullPolicy: IfNotPresent
      EOF
      
      sleep 120
      date -u
      echo "oc get pods -n served-ns-1 -o wide"
      oc get pods -n served-ns-1 -o wide
      date -u
      echo "oc rsh -n openshift-ovn-kubernetes -c nbdb ovnkube-node-vjd8p ovn-nbctl lr-route-list GR_worker010-fc640"
      oc rsh -n openshift-ovn-kubernetes -c nbdb ovnkube-node-vjd8p ovn-nbctl lr-route-list GR_worker010-fc640
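
The reproducer relies on fixed `sleep 120` pauses. Since the bug concerns pods that have no status IP yet when they are first processed, it can help to record when a pod's IP actually appears. The helper below is a sketch only: the name `wait_for_pod_ip` and the default timeout are assumptions, and it relies only on `kubectl get ... -o jsonpath='{.status.podIP}'`.

```shell
#!/bin/bash
# wait_for_pod_ip NAMESPACE POD [TIMEOUT_SECONDS]
# Hypothetical helper: polls once per second until .status.podIP is set.
# Prints the IP and returns 0, or returns 1 if the timeout expires.
wait_for_pod_ip() {
  local ns="$1" pod="$2" timeout="${3:-120}"
  local ip="" waited=0
  while [ "$waited" -lt "$timeout" ]; do
    # An empty result means the pod has no IP in its status yet.
    ip=$(kubectl get pod -n "$ns" "$pod" -o jsonpath='{.status.podIP}' 2>/dev/null)
    if [ -n "$ip" ]; then
      echo "$ip"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}

# Example (not run here):
#   wait_for_pod_ip served-ns-1 pod-served-after && date -u
```

The fixed sleeps in the original script may be deliberate, to keep the timestamps of each step well separated in the logs, so this helper is meant as a supplement rather than a replacement.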
           

      Actual results:

      We see a route only for the pod-served-before pod (i.e., 10.129.38.10); there is no route for pod-served-after (10.129.38.11).
      
      + echo 'oc get pods -n served-ns-1 -o wide'
      oc get pods -n served-ns-1 -o wide
      + oc get pods -n served-ns-1 -o wide
      NAME                READY   STATUS    RESTARTS   AGE    IP             NODE              NOMINATED NODE   READINESS GATES
      pod-served-after    1/1     Running   0          2m     10.129.38.11   worker010-fc640   <none>           <none>
      pod-served-before   1/1     Running   0          8m1s   10.129.38.10   worker010-fc640   <none>           <none>
      + date -u
      Sat Feb 10 13:04:31 UTC 2024
      + echo 'oc rsh -n openshift-ovn-kubernetes -c nbdb ovnkube-node-vjd8p ovn-nbctl lr-route-list GR_worker010-fc640'
      oc rsh -n openshift-ovn-kubernetes -c nbdb ovnkube-node-vjd8p ovn-nbctl lr-route-list GR_worker010-fc640
      + oc rsh -n openshift-ovn-kubernetes -c nbdb ovnkube-node-vjd8p ovn-nbctl lr-route-list GR_worker010-fc640
      IPv4 Routes
      Route Table <main>:
                   10.129.38.10             192.168.219.2 src-ip rtoe-GR_worker010-fc640 ecmp-symmetric-reply
               169.254.169.0/29             169.254.169.4 dst-ip rtoe-GR_worker010-fc640
                  10.128.0.0/14                100.64.0.1 dst-ip
                      0.0.0.0/0             192.168.216.1 dst-ip rtoe-GR_worker010-fc640
      

      Expected results:

          We should see routes for both the pod-served-before pod (i.e., 10.129.38.10) and the pod-served-after pod (10.129.38.11).
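
The expected result can be checked mechanically against the `ovn-nbctl lr-route-list` output. The following is a minimal sketch; the function name `missing_routes` is hypothetical, and it assumes the column layout shown under "Actual results" (pod IP in the first column, policy `src-ip` in the third).

```shell
#!/bin/bash
# missing_routes ROUTE_LIST_TEXT IP...
# Hypothetical helper: prints every IP that has no src-ip route in the
# given lr-route-list output. Prints nothing when all IPs have a route.
missing_routes() {
  local routes="$1"; shift
  local ip
  for ip in "$@"; do
    # A route matches when field 1 is the pod IP and field 3 is "src-ip".
    if ! awk -v ip="$ip" '$1 == ip && $3 == "src-ip" { found = 1 } END { exit !found }' <<<"$routes"; then
      echo "$ip"
    fi
  done
}

# Example (not run here):
#   routes=$(oc rsh -n openshift-ovn-kubernetes -c nbdb ovnkube-node-vjd8p \
#            ovn-nbctl lr-route-list GR_worker010-fc640)
#   missing_routes "$routes" 10.129.38.10 10.129.38.11
```

Run against the route table shown under "Actual results", this would report 10.129.38.11 as missing; with the bug fixed, it should print nothing.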

      Additional info:

       must-gather: http://storage.scalelab.redhat.com/anilvenkata/must-gather.local.6409385997838158348.tar.gz
      
        All the resources in the above case were created between Sat Feb 10 12:52:28 UTC 2024 and Sat Feb 10 13:04:31 UTC 2024. Please use these timestamps when searching the logs.

       

            jgil@redhat.com Jordi Gil
            vkommadi@redhat.com VENKATA ANIL kumar KOMMADDI
            Dave Wilson Dave Wilson