OCPBUGS-30170 (project: OpenShift Bugs)

Pod stuck in ContainerCreating status - LoadConf(): VF pci addr is required


Details

    • Critical
    • No
    • CNF Network Sprint 251
    • 1
    • False

    Description

      Description of problem:

      Pods are stuck in ContainerCreating status after an ungraceful reboot of the cluster:
      
      oc get po -n rds-sriov-wlkd -o wide
      NAME                                 READY   STATUS              RESTARTS   AGE
      rdscore-sriov-one-7f6d8759cd-fptsk   0/1     ContainerCreating   0          22m
      rdscore-sriov-two-5785fcb978-7dntx   0/1     ContainerCreating   0          22m
      rdscore-sriov2-one-7875d45c4-vm9kh   0/1     ContainerCreating   0          22m
      rdscore-sriov2-two-886767847-bb2wg   0/1     ContainerCreating   0          22m

      oc describe po -n rds-sriov-wlkd rdscore-sriov-one-7f6d8759cd-fptsk
      Name:             rdscore-sriov-one-7f6d8759cd-fptsk
      Namespace:        rds-sriov-wlkd
      Priority:         0
      Service Account:  rdscore-sriov-sa-one
      Node:             openshift-worker-2.qe.lab.eng.tlv2.redhat.com/10.46.187.25
      Start Time:       Mon, 04 Mar 2024 10:03:37 +0200
      Labels:           pod-template-hash=7f6d8759cd
                        rds-core=sriov-deploy-one
      Annotations:      k8s.ovn.org/pod-networks:
                          {"default":{"ip_addresses":["10.130.0.13/23","fd01:0:0:3::d/64"],"mac_address":"0a:58:0a:82:00:0d","gateway_ips":["10.130.0.1","fd01:0:0:3...
                        k8s.v1.cni.cncf.io/networks: [{"name":"sriov-net-one","cni-args":null}]
                        openshift.io/scc: privileged
      Status:           Pending
      IP:
      IPs:              <none>
      Controlled By:    ReplicaSet/rdscore-sriov-one-7f6d8759cd
      Containers:
        sriov-one:
          Container ID:
          Image:           registry.qe.lab.eng.tlv2.redhat.com:5000/karampok/snife:latest
          Image ID:
          Port:            <none>
          Host Port:       <none>
          SeccompProfile:  RuntimeDefault
          Command:
            /bin/sh
            -c
            /opt/net/config.sh 497 10.46.126.75/26; nc -k -l 10.46.126.75 1111
          State:          Waiting
            Reason:       ContainerCreating
          Ready:          False
          Restart Count:  0
          Environment:    <none>
          Mounts:
            /opt/net/ from configs (rw)
            /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-n2jbs (ro)
      Conditions:
        Type              Status
        Initialized       True
        Ready             False
        ContainersReady   False
        PodScheduled      True
      Volumes:
        configs:
          Type:      ConfigMap (a volume populated by a ConfigMap)
          Name:      rdscore-sriov-config
          Optional:  false
        kube-api-access-n2jbs:
          Type:                    Projected (a volume that contains injected data from multiple sources)
          TokenExpirationSeconds:  3607
          ConfigMapName:           kube-root-ca.crt
          ConfigMapOptional:       <nil>
          DownwardAPI:             true
          ConfigMapName:           openshift-service-ca.crt
          ConfigMapOptional:       <nil>
      QoS Class:                   BestEffort
      Node-Selectors:              kubernetes.io/hostname=openshift-worker-2.qe.lab.eng.tlv2.redhat.com
      Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                                   node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
      Events:
        Type     Reason                  Age   From               Message
        ----     ------                  ----  ----               -------
        Normal   Scheduled               14m   default-scheduler  Successfully assigned rds-sriov-wlkd/rdscore-sriov-one-7f6d8759cd-fptsk to openshift-worker-2.qe.lab.eng.tlv2.redhat.com
        Warning  FailedCreatePodSandBox  14m   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_rdscore-sriov-one-7f6d8759cd-fptsk_rds-sriov-wlkd_76c7ac9b
      -c623-45a2-a478-fb484b9ae4c4_0(3ddd90a29718eb4453419093e4de539583c6e2c69da0f2631bd4e3643392e8ce): error adding pod rds-sriov-wlkd_rdscore-sriov-one-7f6d8759cd-fptsk to CNI network "multus-cni-network": plugin type="multus-
      shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: '&{ContainerID:3ddd90a29718eb4453419093e4de539583c6e2c69da0f2631bd4e3643392e8ce Netns:/var/run/netns/11630ce9-43ba-46a4-8611-
      bc6d32201c56 IfName:eth0 Args:IgnoreUnknown=1;K8S_POD_NAMESPACE=rds-sriov-wlkd;K8S_POD_NAME=rdscore-sriov-one-7f6d8759cd-fptsk;K8S_POD_INFRA_CONTAINER_ID=3ddd90a29718eb4453419093e4de539583c6e2c69da0f2631bd4e3643392e8ce;K8S
      _POD_UID=76c7ac9b-c623-45a2-a478-fb484b9ae4c4 Path: StdinData:[123 34 98 105 110 68 105 114 34 58 34 47 118 97 114 47 108 105 98 47 99 110 105 47 98 105 110 34 44 34 99 104 114 111 111 116 68 105 114 34 58 34 47 104 111 11
      5 116 114 111 111 116 34 44 34 99 108 117 115 116 101 114 78 101 116 119 111 114 107 34 58 34 47 104 111 115 116 47 114 117 110 47 109 117 108 116 117 115 47 99 110 105 47 110 101 116 46 100 47 49 48 45 111 118 110 45 107
      117 98 101 114 110 101 116 101 115 46 99 111 110 102 34 44 34 99 110 105 67 111 110 102 105 103 68 105 114 34 58 34 47 104 111 115 116 47 101 116 99 47 99 110 105 47 110 101 116 46 100 34 44 34 99 110 105 86 101 114 115 10
      5 111 110 34 58 34 48 46 51 46 49 34 44 34 100 97 101 109 111 110 83 111 99 107 101 116 68 105 114 34 58 34 47 114 117 110 47 109 117 108 116 117 115 47 115 111 99 107 101 116 34 44 34 103 108 111 98 97 108 78 97 109 101 1
      15 112 97 99 101 115 34 58 34 100 101 102 97 117 108 116 44 111 112 101 110 115 104 105 102 116 45 109 117 108 116 117 115 44 111 112 101 110 115 104 105 102 116 45 115 114 105 111 118 45 110 101 116 119 111 114 107 45 111
       112 101 114 97 116 111 114 34 44 34 108 111 103 76 101 118 101 108 34 58 34 118 101 114 98 111 115 101 34 44 34 108 111 103 84 111 83 116 100 101 114 114 34 58 116 114 117 101 44 34 109 117 108 116 117 115 65 117 116 111
      99 111 110 102 105 103 68 105 114 34 58 34 47 104 111 115 116 47 114 117 110 47 109 117 108 116 117 115 47 99 110 105 47 110 101 116 46 100 34 44 34 109 117 108 116 117 115 67 111 110 102 105 103 70 105 108 101 34 58 34 97
       117 116 111 34 44 34 110 97 109 101 34 58 34 109 117 108 116 117 115 45 99 110 105 45 110 101 116 119 111 114 107 34 44 34 110 97 109 101 115 112 97 99 101 73 115 111 108 97 116 105 111 110 34 58 116 114 117 101 44 34 112
       101 114 78 111 100 101 67 101 114 116 105 102 105 99 97 116 101 34 58 123 34 98 111 111 116 115 116 114 97 112 75 117 98 101 99 111 110 102 105 103 34 58 34 47 118 97 114 47 108 105 98 47 107 117 98 101 108 101 116 47 107
       117 98 101 99 111 110 102 105 103 34 44 34 99 101 114 116 68 105 114 34 58 34 47 101 116 99 47 99 110 105 47 109 117 108 116 117 115 47 99 101 114 116 115 34 44 34 99 101 114 116 68 117 114 97 116 105 111 110 34 58 34 50
      52 104 34 44 34 101 110 97 98 108 101 100 34 58 116 114 117 101 125 44 34 115 111 99 107 101 116 68 105 114 34 58 34 47 104 111 115 116 47 114 117 110 47 109 117 108 116 117 115 47 115 111 99 107 101 116 34 44 34 116 121 1
      12 101 34 58 34 109 117 108 116 117 115 45 115 104 105 109 34 125]} ContainerID:"3ddd90a29718eb4453419093e4de539583c6e2c69da0f2631bd4e3643392e8ce" Netns:"/var/run/netns/11630ce9-43ba-46a4-8611-bc6d32201c56" IfName:"eth0" A
      rgs:"IgnoreUnknown=1;K8S_POD_NAMESPACE=rds-sriov-wlkd;K8S_POD_NAME=rdscore-sriov-one-7f6d8759cd-fptsk;K8S_POD_INFRA_CONTAINER_ID=3ddd90a29718eb4453419093e4de539583c6e2c69da0f2631bd4e3643392e8ce;K8S_POD_UID=76c7ac9b-c623-45
      a2-a478-fb484b9ae4c4" Path:"" ERRORED: error configuring pod [rds-sriov-wlkd/rdscore-sriov-one-7f6d8759cd-fptsk] networking: [rds-sriov-wlkd/rdscore-sriov-one-7f6d8759cd-fptsk/76c7ac9b-c623-45a2-a478-fb484b9ae4c4:sriov-net
      -one]: error adding container to network "sriov-net-one": SRIOV-CNI failed to load netconf: LoadConf(): VF pci addr is required
      '
        Warning  FailedCreatePodSandBox  14m  kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_rdscore-sriov-one-7f6d8759cd-fptsk_rds-sriov-wlkd_76c7ac9b-c623-45a2-
      a478-fb484b9ae4c4_0(c628942f8d7bae04428de0270de7cf09edc660fbf706930f1ae85f1c70cf7f29): error adding pod rds-sriov-wlkd_rdscore-sriov-one-7f6d8759cd-fptsk to CNI network "multus-cni-network": plugin type="multus-shim" name=
      "multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: '&{ContainerID:c628942f8d7bae04428de0270de7cf09edc660fbf706930f1ae85f1c70cf7f29 Netns:/var/run/netns/8e7c21aa-2823-41d1-b980-9ff5518db17
      3 IfName:eth0 Args:IgnoreUnknown=1;K8S_POD_NAMESPACE=rds-sriov-wlkd;K8S_POD_NAME=rdscore-sriov-one-7f6d8759cd-fptsk;K8S_POD_INFRA_CONTAINER_ID=c628942f8d7bae04428de0270de7cf09edc660fbf706930f1ae85f1c70cf7f29;K8S_POD_UID=76
      c7ac9b-c623-45a2-a478-fb484b9ae4c4 Path: StdinData:[123 34 98 105 110 68 105 114 34 58 34 47 118 97 114 47 108 105 98 47 99 110 105 47 98 105 110 34 44 34 99 104 114 111 111 116 68 105 114 34 58 34 47 104 111 115 116 114 1
      11 111 116 34 44 34 99 108 117 115 116 101 114 78 101 116 119 111 114 107 34 58 34 47 104 111 115 116 47 114 117 110 47 109 117 108 116 117 115 47 99 110 105 47 110 101 116 46 100 47 49 48 45 111 118 110 45 107 117 98 101
      114 110 101 116 101 115 46 99 111 110 102 34 44 34 99 110 105 67 111 110 102 105 103 68 105 114 34 58 34 47 104 111 115 116 47 101 116 99 47 99 110 105 47 110 101 116 46 100 34 44 34 99 110 105 86 101 114 115 105 111 110 3
      4 58 34 48 46 51 46 49 34 44 34 100 97 101 109 111 110 83 111 99 107 101 116 68 105 114 34 58 34 47 114 117 110 47 109 117 108 116 117 115 47 115 111 99 107 101 116 34 44 34 103 108 111 98 97 108 78 97 109 101 115 112 97 9
      9 101 115 34 58 34 100 101 102 97 117 108 116 44 111 112 101 110 115 104 105 102 116 45 109 117 108 116 117 115 44 111 112 101 110 115 104 105 102 116 45 115 114 105 111 118 45 110 101 116 119 111 114 107 45 111 112 101 11
      4 97 116 111 114 34 44 34 108 111 103 76 101 118 101 108 34 58 34 118 101 114 98 111 115 101 34 44 34 108 111 103 84 111 83 116 100 101 114 114 34 58 116 114 117 101 44 34 109 117 108 116 117 115 65 117 116 111 99 111 110
      102 105 103 68 105 114 34 58 34 47 104 111 115 116 47 114 117 110 47 109 117 108 116 117 115 47 99 110 105 47 110 101 116 46 100 34 44 34 109 117 108 116 117 115 67 111 110 102 105 103 70 105 108 101 34 58 34 97 117 116 11
      1 34 44 34 110 97 109 101 34 58 34 109 117 108 116 117 115 45 99 110 105 45 110 101 116 119 111 114 107 34 44 34 110 97 109 101 115 112 97 99 101 73 115 111 108 97 116 105 111 110 34 58 116 114 117 101 44 34 112 101 114 78
       111 100 101 67 101 114 116 105 102 105 99 97 116 101 34 58 123 34 98 111 111 116 115 116 114 97 112 75 117 98 101 99 111 110 102 105 103 34 58 34 47 118 97 114 47 108 105 98 47 107 117 98 101 108 101 116 47 107 117 98 101
       99 111 110 102 105 103 34 44 34 99 101 114 116 68 105 114 34 58 34 47 101 116 99 47 99 110 105 47 109 117 108 116 117 115 47 99 101 114 116 115 34 44 34 99 101 114 116 68 117 114 97 116 105 111 110 34 58 34 50 52 104 34 4
      4 34 101 110 97 98 108 101 100 34 58 116 114 117 101 125 44 34 115 111 99 107 101 116 68 105 114 34 58 34 47 104 111 115 116 47 114 117 110 47 109 117 108 116 117 115 47 115 111 99 107 101 116 34 44 34 116 121 112 101 34 5
      8 34 109 117 108 116 117 115 45 115 104 105 109 34 125]} ContainerID:"c628942f8d7bae04428de0270de7cf09edc660fbf706930f1ae85f1c70cf7f29" Netns:"/var/run/netns/8e7c21aa-2823-41d1-b980-9ff5518db173" IfName:"eth0" Args:"Ignore
      Unknown=1;K8S_POD_NAMESPACE=rds-sriov-wlkd;K8S_POD_NAME=rdscore-sriov-one-7f6d8759cd-fptsk;K8S_POD_INFRA_CONTAINER_ID=c628942f8d7bae04428de0270de7cf09edc660fbf706930f1ae85f1c70cf7f29;K8S_POD_UID=76c7ac9b-c623-45a2-a478-fb4
      84b9ae4c4" Path:"" ERRORED: error configuring pod [rds-sriov-wlkd/rdscore-sriov-one-7f6d8759cd-fptsk] networking: [rds-sriov-wlkd/rdscore-sriov-one-7f6d8759cd-fptsk/76c7ac9b-c623-45a2-a478-fb484b9ae4c4:sriov-net-one]: erro
      r adding container to network "sriov-net-one": SRIOV-CNI failed to load netconf: LoadConf(): VF pci addr is required
      '
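
      The StdinData:[...] field in the events above appears to be the CNI configuration passed to the Multus shim, printed as decimal byte values rather than text. A minimal Python sketch for turning such an array back into readable JSON follows. The function name decode_stdin_data is hypothetical, and the sketch assumes the array has first been copied onto a single line, because the wrapping in the output above splits some numbers across lines.

```python
import json
import re


def decode_stdin_data(message: str) -> dict:
    """Decode the first 'StdinData:[...]' byte array in message into a dict."""
    match = re.search(r"StdinData:\[([\d\s]+)\]", message)
    if match is None:
        raise ValueError("no StdinData byte array found")
    # Each whitespace-separated token is one byte in decimal notation.
    raw = bytes(int(token) for token in match.group(1).split())
    return json.loads(raw.decode("utf-8"))


# Shortened, illustrative input: only the last bytes of the logged array,
# wrapped in braces so that it forms a complete JSON object.
sample = (
    "Path: StdinData:[123 34 116 121 112 101 34 58 34 109 117 108 116 117 "
    "115 45 115 104 105 109 34 125]} ContainerID:..."
)
print(decode_stdin_data(sample))  # {'type': 'multus-shim'}
```

      Running the same function over the complete array should show the paths and log settings the shim was given, which can help rule out a Multus configuration problem before looking at the SR-IOV side.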

       

       

      Version-Release number of selected component (if applicable):

      4.15.0
      sriov-network-operator.v4.14.0-202402081809

      How reproducible:

      100%

      Steps to Reproduce:

          1. Deploy a workload that uses SR-IOV
          2. Hard-reboot the cluster (all nodes at the same time)
          3. Wait for the cluster to recover (all cluster operators back in the Available state)
          4. Remove all pods with UnexpectedAdmissionError status
          5. Check the workload that uses SR-IOV
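
      Step 4 can be tedious on a cluster with many namespaces. As a rough sketch, the pods to remove could be picked out of the JSON printed by "oc get pods -A -o json". The helper name pods_with_reason is hypothetical, the pod names in the sample are made up, and the sketch assumes the status shown for these pods comes from the status.reason field of each pod object.

```python
import json


def pods_with_reason(pod_list: dict, reason: str) -> list[tuple[str, str]]:
    """Return (namespace, name) for every pod whose status.reason matches."""
    found = []
    for pod in pod_list.get("items", []):
        if pod.get("status", {}).get("reason") == reason:
            meta = pod["metadata"]
            found.append((meta["namespace"], meta["name"]))
    return found


# Illustrative input only: the pod names below are invented for this example.
sample = json.loads("""
{
  "items": [
    {"metadata": {"namespace": "rds-sriov-wlkd", "name": "example-pod-a"},
     "status": {"phase": "Failed", "reason": "UnexpectedAdmissionError"}},
    {"metadata": {"namespace": "rds-sriov-wlkd", "name": "example-pod-b"},
     "status": {"phase": "Pending"}}
  ]
}
""")

# Print the delete commands instead of running them, so they can be reviewed.
for namespace, name in pods_with_reason(sample, "UnexpectedAdmissionError"):
    print(f"oc delete pod -n {namespace} {name}")
```

      Printing the commands rather than executing them keeps the sketch side-effect free; the list can be reviewed before anything is deleted.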

      Actual results:

      Pods are stuck in ContainerCreating.
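
      To confirm this result without reading the table by eye, the plain-text output of "oc get po" can be filtered on its STATUS column. The sketch below is illustrative only: stuck_pods is a hypothetical helper, and it assumes the column layout shown in the description (NAME, READY, STATUS, RESTARTS, AGE).

```python
def stuck_pods(table: str, status: str = "ContainerCreating") -> list[str]:
    """Return the names of pods whose STATUS column equals status."""
    lines = [line for line in table.splitlines() if line.strip()]
    if not lines:
        return []
    # Locate columns by header name rather than by fixed position.
    header = lines[0].split()
    name_col = header.index("NAME")
    status_col = header.index("STATUS")
    names = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) > status_col and fields[status_col] == status:
            names.append(fields[name_col])
    return names


# Output copied from the description of this issue.
output = """\
NAME                                 READY   STATUS              RESTARTS   AGE
rdscore-sriov-one-7f6d8759cd-fptsk   0/1     ContainerCreating   0          22m
rdscore-sriov-two-5785fcb978-7dntx   0/1     ContainerCreating   0          22m
rdscore-sriov2-one-7875d45c4-vm9kh   0/1     ContainerCreating   0          22m
rdscore-sriov2-two-886767847-bb2wg   0/1     ContainerCreating   0          22m
"""
print(stuck_pods(output))
```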

      Expected results:

      Pods using SR-IOV restart successfully after the cluster's hard reboot.

      Additional info:

      Bare-metal dual-stack cluster

            People

              sscheink@redhat.com Sebastian Scheinkman
              yprokule@redhat.com Yurii Prokulevych
              Zhanqi Zhao