OpenShift Bugs / OCPBUGS-19544

Pods crashlooping and stuck in init


      Affected version: 4.12.28 (hypershift management cluster)

      Pods are failing to start; they are either stuck in init or crashlooping:

        Warning  FailedCreatePodSandBox  4s (x24 over 51m)  kubelet  (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_etcd-1_ocm-production-<clusterid>-<clustername>_7de665cb-617a-4abb-8289-4810489edf14_0(05e3c91228b60ca43b811f41e42215441aab5f8513abc185cc1206a418d5d8ea): error adding pod ocm-production-<clusterid>-<clustername>_etcd-1 to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [ocm-production-<clusterid>-<clustername>/etcd-1/7de665cb-617a-4abb-8289-4810489edf14:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[ocm-production-<clusterid>-<clustername>/etcd-1 05e3c91228b60ca43b811f41e42215441aab5f8513abc185cc1206a418d5d8ea] [ocm-production-<clusterid>-<clustername>/etcd-1 05e3c91228b60ca43b811f41e42215441aab5f8513abc185cc1206a418d5d8ea] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded
      
      Events:
        Type     Reason                  Age    From                     Message
        ----     ------                  ----   ----                     -------
        Normal   Scheduled               6m55s  default-scheduler        Successfully assigned ocm-production-<clusterid>-<clustername>/etcd-1 to ip-10-0-1-33.ec2.internal
        Warning  ErrorAddingLogicalPort  6m56s  controlplane             deleteLogicalPort failed for pod ocm-production-<clusterid>-<clustername>_etcd-1: cannot delete GR SNAT for pod ocm-production-<clusterid>-<clustername>/etcd-1: failed create operation for deleting SNAT rule for pod on gateway router GR_ip-10-0-1-218.ec2.internal: unable to get NAT entries for router &{UUID: Copp:<nil> Enabled:<nil> ExternalIDs:map[] LoadBalancer:[] LoadBalancerGroup:[] Name:GR_ip-10-0-1-218.ec2.internal Nat:[] Options:map[] Policies:[] Ports:[] StaticRoutes:[]}: failed to get router: GR_ip-10-0-1-218.ec2.internal, error: object not found
        Warning  FailedAttachVolume      6m56s  attachdetach-controller  Multi-Attach error for volume "pvc-9564374b-250a-4b5d-99b2-63db59ae9510" Volume is already exclusively attached to one node and can't be attached to another
        Warning  FailedMount             4m53s  kubelet                  Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[kube-api-access-z2v2v data peer-tls server-tls client-tls etcd-ca etcd-metrics-ca]: timed out waiting for the condition
        Warning  FailedMount             2m37s  kubelet                  Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[client-tls etcd-ca etcd-metrics-ca kube-api-access-z2v2v data peer-tls server-tls]: timed out waiting for the condition
        Normal   SuccessfulAttachVolume  49s    attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-9564374b-250a-4b5d-99b2-63db59ae9510"
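
      The "timed out waiting for annotations" error above suggests that the CNI never saw the network annotation it expects on the pod. As a rough triage aid, the sketch below lists pods that lack that annotation, given the JSON produced by `oc get pods -n <namespace> -o json`. It assumes the annotation key is `k8s.ovn.org/pod-networks` and that host-network pods can be ignored; the function names are hypothetical and not part of any OpenShift tooling.

```python
import json

# Assumption: OVN-Kubernetes stores a pod's network details under this key.
OVN_POD_NETWORKS = "k8s.ovn.org/pod-networks"


def is_missing_ovn_annotation(pod: dict) -> bool:
    """Return True if a non-host-network pod lacks the OVN annotation."""
    # Assumption: host-network pods are not expected to carry the annotation.
    if pod.get("spec", {}).get("hostNetwork"):
        return False
    annotations = pod.get("metadata", {}).get("annotations") or {}
    return OVN_POD_NETWORKS not in annotations


def pods_missing_annotation(pod_list: dict) -> list:
    """Return "namespace/name" for each pod in a PodList without the annotation."""
    missing = []
    for pod in pod_list.get("items", []):
        if is_missing_ovn_annotation(pod):
            meta = pod.get("metadata", {})
            missing.append(f"{meta.get('namespace')}/{meta.get('name')}")
    return missing


def pods_missing_annotation_from_json(text: str) -> list:
    """Convenience wrapper: parse PodList JSON text, then run the check."""
    return pods_missing_annotation(json.loads(text))
```

      A pod such as etcd-1 showing up in the result would be consistent with the sandbox failure, but it does not by itself explain why the annotation was never written.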
      

      A must-gather and a HyperShift dump are attached.

            ffernand@redhat.com Flavio Fernandes (Inactive)
            cbusse.openshift Claudio Busse
            Anurag Saxena Anurag Saxena