OpenShift Virtualization / CNV-28805

[2203786] On BM, hpp-pool pods are stuck in CrashLoopBackOff


    Priority: High

      Description of problem:
      On bare-metal (BM) clusters (this instance was seen on bm03-cnvqe2-rdu2), the hpp-pool pod gets stuck in a CrashLoopBackOff state.
      It appears to be Ceph-related: the HPP storage pool is backed by OCS.
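
      As a first triage step, the crashing container's previous log and the pod events usually show why the mounter exits. A minimal sketch using standard oc commands (pod name taken from the listings below):

      $ oc logs hpp-pool-29ab9406-755647446d-d6rn7 -n openshift-cnv -c mounter --previous
      $ oc describe pod hpp-pool-29ab9406-755647446d-d6rn7 -n openshift-cnv

      --previous returns the log of the last crashed instance; describe surfaces the events and the last exit code (2, per the pod YAML below).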

      Version-Release number of selected component (if applicable):
      $ oc get csv -A | grep kubevirt
      openshift-cnv kubevirt-hyperconverged-operator.v4.13.0 OpenShift Virtualization 4.13.0 kubevirt-hyperconverged-operator.v4.12.3 Succeeded
      $ oc get clusterversion
      NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
      version 4.13.0-rc.5 True False 11d Cluster version is 4.13.0-rc.5

      How reproducible:
      Unclear what triggers the issue. After running the network team's test suite, which creates network components and VMs, the hpp-pool pods on some BMs end up in this state.

      Steps to Reproduce:
      1.
      2.
      3.

      Actual results:
      The hpp-pool pods get stuck in a CrashLoopBackOff state:
      $ oc get pods -A | grep hpp
      openshift-cnv hpp-pool-29ab9406-755647446d-44jfk 0/1 Terminating 10 43h
      openshift-cnv hpp-pool-29ab9406-755647446d-d6rn7 0/1 CrashLoopBackOff 497 (4m5s ago) 42h
      openshift-cnv hpp-pool-4356e54b-7df67db896-8vq5t 0/1 Terminating 3 43h
      openshift-cnv hpp-pool-4356e54b-7df67db896-ntqpr 0/1 CrashLoopBackOff 502 (3m22s ago) 42h
      openshift-cnv hpp-pool-7dfd761c-cf499b659-9mdk7 1/1 Running 0 42h

      $ oc get pods hpp-pool-29ab9406-755647446d-d6rn7 -oyaml
      apiVersion: v1
      kind: Pod
      metadata:
        annotations:
          k8s.ovn.org/pod-networks: '{"default":{"ip_addresses":["10.128.2.5/23"],"mac_address":"0a:58:0a:80:02:05","gateway_ips":["10.128.2.1"],"ip_address":"10.128.2.5/23","gateway_ip":"10.128.2.1"}}'
          k8s.v1.cni.cncf.io/network-status: |-
            [{
                "name": "ovn-kubernetes",
                "interface": "eth0",
                "ips": [
                    "10.128.2.5"
                ],
                "mac": "0a:58:0a:80:02:05",
                "default": true,
                "dns": {}
            }]
          openshift.io/scc: hostpath-provisioner-csi
        creationTimestamp: "2023-05-13T14:24:31Z"
        generateName: hpp-pool-29ab9406-755647446d-
        labels:
          hpp-pool: hpp-csi-pvc-block-hpp
          pod-template-hash: 755647446d
        name: hpp-pool-29ab9406-755647446d-d6rn7
        namespace: openshift-cnv
        ownerReferences:
        - apiVersion: apps/v1
          blockOwnerDeletion: true
          controller: true
          kind: ReplicaSet
          name: hpp-pool-29ab9406-755647446d
          uid: 6d6089af-1e72-4602-9f67-c212bcb1dac8
        resourceVersion: "22166040"
        uid: a5162c1e-babc-455e-a071-262b81d48c8a
      spec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: kubernetes.io/hostname
                  operator: In
                  values:
                  - cnv-qe-infra-19.cnvqe2.lab.eng.rdu2.redhat.com
        containers:
        - command:
          - /usr/bin/mounter
          - --storagePoolPath
          - /dev/data
          - --mountPath
          - /var/hpp-csi-pvc-block/csi
          - --hostPath
          - /host
          image: registry.redhat.io/container-native-virtualization/hostpath-provisioner-operator-rhel9@sha256:045ad111f8d3fe28b8cf77df49a264922c9fa4cc46759ed98ef044077225a23e
          imagePullPolicy: IfNotPresent
          name: mounter
          resources:
            requests:
              cpu: 10m
              memory: 100Mi
          securityContext:
            capabilities:
              drop:
              - KILL
              - MKNOD
              - SETGID
              - SETUID
            privileged: true
            runAsUser: 0
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeDevices:
          - devicePath: /dev/data
            name: data
          volumeMounts:
          - mountPath: /host
            mountPropagation: Bidirectional
            name: host-root
          - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
            name: kube-api-access-ql72g
            readOnly: true
        dnsPolicy: ClusterFirst
        enableServiceLinks: true
        imagePullSecrets:
        - name: hostpath-provisioner-admin-csi-dockercfg-xn7tq
        nodeName: cnv-qe-infra-19.cnvqe2.lab.eng.rdu2.redhat.com
        preemptionPolicy: PreemptLowerPriority
        priority: 0
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext: {}
        serviceAccount: hostpath-provisioner-admin-csi
        serviceAccountName: hostpath-provisioner-admin-csi
        terminationGracePeriodSeconds: 30
        tolerations:
        - effect: NoExecute
          key: node.kubernetes.io/not-ready
          operator: Exists
          tolerationSeconds: 300
        - effect: NoExecute
          key: node.kubernetes.io/unreachable
          operator: Exists
          tolerationSeconds: 300
        - effect: NoSchedule
          key: node.kubernetes.io/memory-pressure
          operator: Exists
        volumes:
        - name: data
          persistentVolumeClaim:
            claimName: hpp-pool-29ab9406
        - hostPath:
            path: /
            type: Directory
          name: host-root
        - name: kube-api-access-ql72g
          projected:
            defaultMode: 420
            sources:
            - serviceAccountToken:
                expirationSeconds: 3607
                path: token
            - configMap:
                items:
                - key: ca.crt
                  path: ca.crt
                name: kube-root-ca.crt
            - downwardAPI:
                items:
                - fieldRef:
                    apiVersion: v1
                    fieldPath: metadata.namespace
                  path: namespace
            - configMap:
                items:
                - key: service-ca.crt
                  path: service-ca.crt
                name: openshift-service-ca.crt
      status:
        conditions:
        - lastProbeTime: null
          lastTransitionTime: "2023-05-13T14:45:11Z"
          status: "True"
          type: Initialized
        - lastProbeTime: null
          lastTransitionTime: "2023-05-13T14:45:11Z"
          message: 'containers with unready status: [mounter]'
          reason: ContainersNotReady
          status: "False"
          type: Ready
        - lastProbeTime: null
          lastTransitionTime: "2023-05-13T14:45:11Z"
          message: 'containers with unready status: [mounter]'
          reason: ContainersNotReady
          status: "False"
          type: ContainersReady
        - lastProbeTime: null
          lastTransitionTime: "2023-05-13T14:45:09Z"
          status: "True"
          type: PodScheduled
        containerStatuses:
        - containerID: cri-o://5c71c577ce6c36921126314719346663f5cf9c072264d408d362bf45857219f9
          image: registry.redhat.io/container-native-virtualization/hostpath-provisioner-operator-rhel9@sha256:045ad111f8d3fe28b8cf77df49a264922c9fa4cc46759ed98ef044077225a23e
          imageID: registry.redhat.io/container-native-virtualization/hostpath-provisioner-operator-rhel9@sha256:045ad111f8d3fe28b8cf77df49a264922c9fa4cc46759ed98ef044077225a23e
          lastState:
            terminated:
              containerID: cri-o://5c71c577ce6c36921126314719346663f5cf9c072264d408d362bf45857219f9
              exitCode: 2
              finishedAt: "2023-05-15T08:29:59Z"
              reason: Error
              startedAt: "2023-05-15T08:29:59Z"
          name: mounter
          ready: false
          restartCount: 494
          started: false
          state:
            waiting:
              message: back-off 5m0s restarting failed container=mounter pod=hpp-pool-29ab9406-755647446d-d6rn7_openshift-cnv(a5162c1e-babc-455e-a071-262b81d48c8a)
              reason: CrashLoopBackOff
        hostIP: 10.1.156.19
        phase: Running
        podIP: 10.128.2.5
        podIPs:
        - ip: 10.128.2.5
        qosClass: Burstable
        startTime: "2023-05-13T14:45:11Z"
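
      Note that the mounter consumes the block PVC hpp-pool-29ab9406 as a raw device (volumeDevices -> /dev/data) and mounts it on the host under /var/hpp-csi-pvc-block/csi. To check whether the backing RBD volume is still bound and reachable, something along these lines (standard oc commands; the PVC name comes from the spec above):

      $ oc get pvc hpp-pool-29ab9406 -n openshift-cnv
      $ oc get pv "$(oc get pvc hpp-pool-29ab9406 -n openshift-cnv -o jsonpath='{.spec.volumeName}')" -o yaml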

      Expected results:
      The hpp-pool pods reach the Running state and stay Ready.

      Additional info:
      W/A: force delete the affected PVC + hpp-pool pods.
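
      For reference, the workaround amounts to something like the following (names taken from the listings in this report; --force --grace-period=0 bypasses graceful termination, so only use it on pods that are already wedged). The hostpath-provisioner-operator is then expected to recreate the pool PVC and pod:

      $ oc delete pod hpp-pool-29ab9406-755647446d-44jfk -n openshift-cnv --force --grace-period=0
      $ oc delete pvc hpp-pool-29ab9406 -n openshift-cnv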

      Additional info from the cluster:
      $ oc get pods -n openshift-storage
      NAME READY STATUS RESTARTS AGE
      csi-addons-controller-manager-6976d48f69-fmpct 2/2 Running 9 (42h ago) 42h
      csi-cephfsplugin-7gqcc 2/2 Running 6 11d
      csi-cephfsplugin-pgg6z 2/2 Running 4 11d
      csi-cephfsplugin-provisioner-cc76c4b9-vmpk6 5/5 Running 0 42h
      csi-cephfsplugin-provisioner-cc76c4b9-xp9rt 5/5 Running 0 42h
      csi-cephfsplugin-q4r8n 2/2 Running 4 11d
      csi-rbdplugin-j8465 3/3 Running 9 11d
      csi-rbdplugin-jl4jf 3/3 Running 6 11d
      csi-rbdplugin-provisioner-8558756f4f-fvtb2 6/6 Running 0 42h
      csi-rbdplugin-provisioner-8558756f4f-kxgpp 6/6 Running 0 42h
      csi-rbdplugin-wgjml 3/3 Running 6 11d
      noobaa-operator-645c48c4c5-6gx4w 1/1 Running 0 42h
      ocs-metrics-exporter-774f4b58cc-5ngc5 1/1 Running 0 42h
      ocs-operator-5b5d98d58d-zl7zq 1/1 Running 11 (41h ago) 42h
      odf-console-78bb5b66-4mnfb 1/1 Running 0 42h
      odf-operator-controller-manager-7db8d4fd4c-ltzkd 2/2 Running 0 42h
      rook-ceph-crashcollector-03d7e1289c5164e19d0d22d6856ffdae-9b4nt 1/1 Running 0 42h
      rook-ceph-crashcollector-374253a427dc62aef82d81f5fc14643e-44bqw 1/1 Running 0 42h
      rook-ceph-crashcollector-c903e190df41042ede88f92c4aa10277-n5jbj 1/1 Running 0 42h
      rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-666b46d6k42f8 2/2 Running 0 42h
      rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-84bb79d6hz5dp 2/2 Running 0 42h
      rook-ceph-mgr-a-7fd8968d84-p2sx4 2/2 Running 0 42h
      rook-ceph-mon-d-54b48b9549-rf69w 2/2 Running 0 42h
      rook-ceph-mon-e-cc8d486-94tff 2/2 Running 0 42h
      rook-ceph-mon-g-66d7d99bd7-44gjd 2/2 Running 0 42h
      rook-ceph-operator-5b595585d7-kpnsd 1/1 Running 8 (42h ago) 42h
      rook-ceph-osd-0-7987b8c66c-89rws 2/2 Running 0 42h
      rook-ceph-osd-1-7956cc5998-6ghk2 2/2 Running 0 42h
      rook-ceph-osd-2-6f6cfb658f-kdcmp 2/2 Running 0 42h

      $ oc get pods -A | grep hostpath
      openshift-cnv hostpath-provisioner-csi-lzvq6 4/4 Running 4 5d1h
      openshift-cnv hostpath-provisioner-csi-s69jh 4/4 Running 8 5d1h
      openshift-cnv hostpath-provisioner-csi-td8hj 4/4 Running 4 5d1h
      openshift-cnv hostpath-provisioner-operator-77f6f799d5-5dtlz 1/1 Running 1 (42h ago) 42h

      $ oc get pods -A | grep hpp
      openshift-cnv hpp-pool-29ab9406-755647446d-44jfk 0/1 Terminating 10 43h
      openshift-cnv hpp-pool-29ab9406-755647446d-d6rn7 0/1 CrashLoopBackOff 497 (4m5s ago) 42h
      openshift-cnv hpp-pool-4356e54b-7df67db896-8vq5t 0/1 Terminating 3 43h
      openshift-cnv hpp-pool-4356e54b-7df67db896-ntqpr 0/1 CrashLoopBackOff 502 (3m22s ago) 42h
      openshift-cnv hpp-pool-7dfd761c-cf499b659-9mdk7 1/1 Running 0 42h
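
      Since the hpp-pool PVCs are backed by Ceph RBD, checking Ceph health would help confirm or rule out the Ceph theory. One way, assuming the rook-ceph toolbox is enabled (it is not deployed by default in ODF):

      $ oc patch ocsinitialization ocsinit -n openshift-storage --type json -p '[{"op": "replace", "path": "/spec/enableCephTools", "value": true}]'
      $ oc rsh -n openshift-storage deploy/rook-ceph-tools ceph status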

              Assignee: Alexander Wels (rhn-support-awels)
              Reporter: Anat Wax (rh-ee-awax)