OpenShift Bugs / OCPBUGS-57343

openshift-kube-apiserver_installer-20 pods are stuck in ContainerCreating due to a recurring Multus CNI error: "error waiting for pod: pod not found".


    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Undefined
    • 4.16
    • Networking / multus

      Description of problem:

      Upgrade history:

      4.14.50
      4.15.50
      4.16.38
      

      The cluster was created with OpenShift SDN and live-migrated to OVN-K on 4.16.38.

      A few days after the live migration, kube-apiserver became degraded:

      InstallerPodContainerWaitingDegraded: Pod
      "installer-20-integrity-gs02.sys.eng.rdu2.dc.redhat.com" on node
      "integrity-gs02.sys.eng.rdu2.dc.redhat.com" container "installer" is
      waiting since 2025-06-08 22:39:08 +0000 UTC because ContainerCreating
      InstallerPodNetworkingDegraded: Pod
      "installer-20-integrity-gs02.sys.eng.rdu2.dc.redhat.com" on node
      "integrity-gs02.sys.eng.rdu2.dc.redhat.com" observed degraded
      networking: (combined from similar events): Failed to create pod
      sandbox: rpc error: code = Unknown desc = failed to create pod network
      sandbox
      k8s_installer-20-integrity-gs02.sys.eng.rdu2.dc.redhat.com_openshift-kube-apiserver_73bd8800-ac3b-480d-aae1-e2659e80bc04_0(2c634982721651a6dd0d7da3521b84b845174ccdd356a960afcd704823e25d53):
      error adding pod
      openshift-kube-apiserver_installer-20-integrity-gs02.sys.eng.rdu2.dc.redhat.com
      to CNI network "multus-cni-network": plugin type="multus-shim"
      name="multus-cni-network" failed (add): CmdAdd (shim): CNI request
      failed with status 400:
      'ContainerID:"2c634982721651a6dd0d7da3521b84b845174ccdd356a960afcd704823e25d53"
      Netns:"/var/run/netns/b493971b-cc93-42bc-85cb-f0561fe46f23"
      IfName:"eth0"
      Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=openshift-kube-apiserver;K8S_POD_NAME=installer-20-integrity-gs02.sys.eng.rdu2.dc.redhat.com;K8S_POD_INFRA_CONTAINER_ID=2c634982721651a6dd0d7da3521b84b845174ccdd356a960afcd704823e25d53;K8S_POD_UID=73bd8800-ac3b-480d-aae1-e2659e80bc04"
      Path:"" ERRORED: error configuring pod
      [openshift-kube-apiserver/installer-20-integrity-gs02.sys.eng.rdu2.dc.redhat.com]
      networking: Multus:
      [openshift-kube-apiserver/installer-20-integrity-gs02.sys.eng.rdu2.dc.redhat.com/73bd8800-ac3b-480d-aae1-e2659e80bc04]:
      error waiting for pod: pod
      "installer-20-integrity-gs02.sys.eng.rdu2.dc.redhat.com" not found
      InstallerPodNetworkingDegraded: ': StdinData:
      {"binDir":"/var/lib/cni/bin","clusterNetwork":"/host/run/multus/cni/net.d/10-ovn-kubernetes.conf","cniVersion":"0.3.1","daemonSocketDir":"/run/multus/socket","globalNamespaces":"default,openshift-multus,openshift-sriov-network-operator","logLevel":"verbose","logToStderr":true,"name":"multus-cni-network","namespaceIsolation":true,"type":"multus-shim"}
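The Args field in the error above carries the pod identity that Multus was asked to wait for. A minimal Python sketch for pulling those fields out, so the namespace, name, and UID can be compared against what the API server reports for the pod. The helper name `parse_cni_args` is made up for illustration; the Args string is copied from the message above.

```python
def parse_cni_args(args: str) -> dict[str, str]:
    """Split a CNI Args string of the form 'K1=V1;K2=V2' into a dict."""
    fields = {}
    for pair in args.split(";"):
        if not pair:
            continue  # tolerate a trailing or doubled ';'
        key, _, value = pair.partition("=")
        fields[key] = value
    return fields


# Args value copied from the error message above.
ARGS = (
    "IgnoreUnknown=1;"
    "K8S_POD_NAMESPACE=openshift-kube-apiserver;"
    "K8S_POD_NAME=installer-20-integrity-gs02.sys.eng.rdu2.dc.redhat.com;"
    "K8S_POD_INFRA_CONTAINER_ID="
    "2c634982721651a6dd0d7da3521b84b845174ccdd356a960afcd704823e25d53;"
    "K8S_POD_UID=73bd8800-ac3b-480d-aae1-e2659e80bc04"
)

pod = parse_cni_args(ARGS)
print(pod["K8S_POD_NAMESPACE"], pod["K8S_POD_NAME"], pod["K8S_POD_UID"])
```

If the UID printed here differs from the UID of the pod currently known to the API server under the same name, the sandbox request refers to an older pod instance. That is only one possible reading of "pod not found", not a confirmed cause in this report.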
      
      
      

      Version-Release number of selected component (if applicable):

      
      4.16.38
      

      How reproducible:

      Once
      

      Steps to Reproduce:

      1. Install OpenShift 4.14.50 with OpenShift SDN and configure balance-slb OVS bonds using nmstate: https://github.com/RHsyseng/rhcos-slb/blob/simplify-networking/README.md
      2. Upgrade to 4.15.
      3. Upgrade to 4.16.38.
      4. Apply the OVN-K live migration workarounds: https://github.com/rbbratta/rhcos-slb/blob/sdn-to-ovn-migration/sdn-to-ovn/README.md
      5. Live-migrate to OVN-K.
      6. Wait a few days.
      

      Actual results:

      
      NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      kube-apiserver                             4.16.38   True        True          True       34d     InstallerPodContainerWaitingDegraded
      
      

      Expected results:

      
      No cluster operators degraded
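The expected result can be checked mechanically. The following is a rough sketch, assuming the input is the parsed JSON from `oc get clusteroperators -o json`; the function name `degraded_operators` and the sample data are made up for illustration and are not output captured from this cluster.

```python
def degraded_operators(cluster_operators: dict) -> list[str]:
    """Return names of operators whose Degraded condition has status "True"."""
    names = []
    for item in cluster_operators.get("items", []):
        for cond in item.get("status", {}).get("conditions", []):
            if cond.get("type") == "Degraded" and cond.get("status") == "True":
                names.append(item["metadata"]["name"])
    return names


# Hand-written sample mirroring the state described in this report.
SAMPLE = {
    "items": [
        {
            "metadata": {"name": "kube-apiserver"},
            "status": {
                "conditions": [
                    {"type": "Available", "status": "True"},
                    {"type": "Progressing", "status": "True"},
                    {"type": "Degraded", "status": "True"},
                ]
            },
        },
        {
            "metadata": {"name": "network"},
            "status": {"conditions": [{"type": "Degraded", "status": "False"}]},
        },
    ]
}

print(degraded_operators(SAMPLE))
```

An empty list corresponds to the expected result; in the state reported here the list would contain kube-apiserver.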
      
      

      Additional info:

      An installer pod on master-02 is persistently stuck in the ContainerCreating state.
      The pod's events show a repeating FailedCreatePodSandBox warning with the message: Multus: ... error waiting for pod: pod "..." not found.

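      Multus seems to have DNS issues: the log lines below show lookups of api-int.integrity.sys.eng.rdu2.dc.redhat.com failing with "no such host".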

      
      2025-06-09T15:11:43Z [error] failed to list pods with new certs: Get "https://api-int.integrity.sys.eng.rdu2.dc.redhat.com:6443/api/v1/pods": dial tcp: lookup api-int.integrity.sys.eng.rdu2.dc.redhat.com: no such host
      E0609 15:11:43.345999   86824 certificate_manager.go:562] kubernetes.io/kube-apiserver-client: Failed while requesting a signed certificate from the control plane: cannot create certificate signing request: Post "https://api-int.integrity.sys.eng.rdu2.dc.redhat.com:6443/apis/certificates.k8s.io/v1/certificatesigningrequests": dial tcp: lookup api-int.integrity.sys.eng.rdu2.dc.redhat.com: no such host
      W0609 15:11:53.327884   86824 reflector.go:539] k8s.io/client-go/informers/factory.go:159: failed to list *v1.Pod: Get "https://api-int.integrity.sys.eng.rdu2.dc.redhat.com:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dintegrity-gs02.sys.eng.rdu2.dc.redhat.com&resourceVersion=36232299": dial tcp: lookup api-int.integrity.sys.eng.rdu2.dc.redhat.com: no such host
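To confirm whether the "no such host" failures in these log lines are reproducible, the lookup can be retried from the affected node. A minimal Python sketch using the standard `socket.getaddrinfo` call; the function name `can_resolve` and the injectable `resolver` parameter are made up for illustration, and the result only reflects the DNS configuration of wherever the script runs, which may differ from what the Multus daemon sees.

```python
import socket

# Hostname and port taken from the log lines above.
API_INT = "api-int.integrity.sys.eng.rdu2.dc.redhat.com"
API_PORT = 6443


def can_resolve(host: str, port: int = API_PORT, resolver=socket.getaddrinfo) -> bool:
    """Return True if the resolver returns without a name-resolution error."""
    try:
        resolver(host, port)
    except socket.gaierror:
        # Python's counterpart of the Go "no such host" lookup error.
        return False
    return True


if __name__ == "__main__":
    print(f"{API_INT} resolves: {can_resolve(API_INT)}")
```

The `resolver` argument exists only so the function can be exercised without network access.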
      
      

              bnemec@redhat.com Benjamin Nemec
              rbrattai@redhat.com Ross Brattain
              Weibin Liang