  1. OpenShift Bugs
  2. OCPBUGS-42834

After an externally managed SR-IOV policy is deleted and then recreated with an additional VF, the test pod that uses the sriovnetwork is stuck in the Pending state


    • Bug
    • Resolution: Unresolved
    • Undefined
    • None
    • 4.17.z
    • Networking / SR-IOV
    • None
    • Yes
    • CNF Network Sprint 261, CNF Network Sprint 262, CNF Network Sprint 264
    • 3
    • False

      Description of problem:

          After externally managed SR-IOV VFs are deleted and then recreated with an additional VF, the test pod that uses the sriovnetwork is stuck in the Pending state.

      Version-Release number of selected component (if applicable):

          

      How reproducible:

          

      Steps to Reproduce:

          1. Run automated case OCP-63533, or follow the steps it defines.
          2. Create an SR-IOV policy with 2 VFs, then create two test pods; traffic between the two pods runs fine.
          3. Delete the SR-IOV policy with the 2 VFs, recreate it with 3 VFs, then recreate the test pods that use the sriovnetwork; the test pods are stuck in the Pending state.
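
The policy change in steps 2 and 3 can be sketched as below. This is a minimal illustration, not the manifest used by OCP-63533: the resource name `sriovnn` comes from the pod's `openshift.io/sriovnn` request, while the policy name, the PF name `ens1f0`, the node selector label, and the device type are assumptions chosen only for the example.

```python
import json


def sriov_policy(num_vfs: int) -> dict:
    """Build a SriovNetworkNodePolicy manifest for an externally managed PF.

    Only num_vfs varies between step 2 (2 VFs) and step 3 (3 VFs).
    """
    return {
        "apiVersion": "sriovnetwork.openshift.io/v1",
        "kind": "SriovNetworkNodePolicy",
        "metadata": {
            "name": "sriovnn",  # hypothetical policy name
            "namespace": "openshift-sriov-network-operator",
        },
        "spec": {
            "resourceName": "sriovnn",  # matches openshift.io/sriovnn in the pod spec
            "nodeSelector": {
                # assumed label; the real test may select nodes differently
                "feature.node.kubernetes.io/network-sriov.capable": "true",
            },
            "numVfs": num_vfs,
            "nicSelector": {"pfNames": ["ens1f0"]},  # hypothetical PF name
            "deviceType": "netdevice",  # assumed device type
            "externallyManaged": True,  # VFs are created outside the operator
        },
    }


before = sriov_policy(2)  # step 2
after = sriov_policy(3)   # step 3

# Show which spec fields differ between the two policies.
changed = {k: (before["spec"][k], after["spec"][k])
           for k in before["spec"] if before["spec"][k] != after["spec"][k]}
print(json.dumps(changed))
```

The JSON produced by `json.dumps(sriov_policy(3))` could be piped to `oc apply -f -`, provided the VFs already exist on the PF, since an externally managed policy does not create them.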
          

      Actual results:

          Test pods that use the recreated sriovnetwork are stuck in the Pending state, although the nns are shown as available.

      Expected results:

          Test pods that use the recreated sriovnetwork should be Running, and traffic should pass between the two test pods.

      Additional info:

      # oc describe pod/sriov-63533-test-pod1 
      Name:             sriov-63533-test-pod1
      Namespace:        e2e-test-sriov-oalqtuey-ng8jq
      Priority:         0
      Service Account:  default
      Node:             <none>
      Labels:           app=sriov-63533-test-pod1
      Annotations:      k8s.v1.cni.cncf.io/networks:
                          [
                            {
                              "name": "sriovnn",
                              "mac": "20:04:0f:f1:88:01",
                              "ips": ["192.168.10.1/24"]
                            }
                          ]
                        openshift.io/scc: anyuid
      Status:           Pending
      IP:               
      IPs:              <none>
      Containers:
        sample-container:
          Image:      quay.io/openshifttest/hello-sdn@sha256:c89445416459e7adea9a5a416b3365ed3d74f2491beb904d61dc8d1eb89a72a4
          Port:       <none>
          Host Port:  <none>
          Limits:
            openshift.io/sriovnn:  1
          Requests:
            openshift.io/sriovnn:  1
          Environment:             <none>
          Mounts:
            /etc/podnetinfo from podnetinfo (ro)
            /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-h28jr (ro)
      Conditions:
        Type           Status
        PodScheduled   False 
      Volumes:
        kube-api-access-h28jr:
          Type:                    Projected (a volume that contains injected data from multiple sources)
          TokenExpirationSeconds:  3607
          ConfigMapName:           kube-root-ca.crt
          ConfigMapOptional:       <nil>
          DownwardAPI:             true
          ConfigMapName:           openshift-service-ca.crt
          ConfigMapOptional:       <nil>
        podnetinfo:
          Type:  DownwardAPI (a volume populated by information about the pod)
          Items:
            metadata.labels -> labels
            metadata.annotations -> annotations
      QoS Class:       BestEffort
      Node-Selectors:  <none>
      Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                       node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
      Events:
        Type     Reason            Age   From               Message
        ----     ------            ----  ----               -------
        Warning  FailedScheduling  70s   default-scheduler  0/8 nodes are available: 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 5 Insufficient openshift.io/sriovnn. preemption: 0/8 nodes are available: 3 Preemption is not helpful for scheduling, 5 No preemption victims found for incoming pod.
      
      # oc get nns
      NAME                                       AGE
      master-0                                   94m
      master-1                                   94m
      master-2                                   94m
      openshift-qe-025.lab.eng.rdu2.redhat.com   94m
      openshift-qe-029.lab.eng.rdu2.redhat.com   94m
      worker-0                                   93m
      worker-1                                   94m
      worker-2                                   94m
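
The FailedScheduling event in the `oc describe` output above can be broken down per reason to confirm that every node is accounted for. The following is a small sketch; the function name `parse_failed_scheduling` is hypothetical, and the parsing assumes the message layout shown in this report rather than any guaranteed scheduler format.

```python
import re

# Event message copied verbatim from the `oc describe pod` output in this report.
MESSAGE = (
    "0/8 nodes are available: 3 node(s) had untolerated taint "
    "{node-role.kubernetes.io/master: }, 5 Insufficient openshift.io/sriovnn. "
    "preemption: 0/8 nodes are available: 3 Preemption is not helpful for "
    "scheduling, 5 No preemption victims found for incoming pod."
)


def parse_failed_scheduling(message: str) -> dict:
    """Return the node total and a {reason: node_count} map from the filter summary."""
    # Keep only the filter summary, dropping the preemption summary that follows it.
    filter_part = message.split(" preemption:", 1)[0]
    header, _, reasons = filter_part.partition(": ")
    total = int(re.match(r"0/(\d+) nodes are available", header).group(1))
    counts = {}
    for item in reasons.rstrip(".").split(", "):
        match = re.match(r"(\d+) (.+)", item.strip())
        if match:
            counts[match.group(2)] = int(match.group(1))
    return {"total_nodes": total, "reasons": counts}


result = parse_failed_scheduling(MESSAGE)
for reason, count in result["reasons"].items():
    print(f"{count}/{result['total_nodes']}: {reason}")
```

For this event, the 3 tainted masters and the 5 nodes reporting `Insufficient openshift.io/sriovnn` add up to all 8 nodes, so no non-master node advertises an allocatable `openshift.io/sriovnn` resource after the policy is recreated.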
      

              sscheink@redhat.com Sebastian Scheinkman
              jechen@redhat.com Jean Chen
              Jean Chen Jean Chen