OpenShift Virtualization / CNV-28805

[2203786] On BM, hpp-pool pods are stuck in CrashLoopBackOff


    Priority: High

      Description of problem:
      On bare-metal (BM) clusters (this instance was seen on bm03-cnvqe2-rdu2), the hpp-pool pod gets stuck in a CrashLoopBackOff state.
      It appears to be Ceph-related: the HPP storage pool is backed by OCS.
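
      As a first triage step, the crashing container's previous log and the pod events usually show why the mounter exits. A minimal sketch using standard oc commands (pod name taken from the listings below):

      $ oc logs hpp-pool-29ab9406-755647446d-d6rn7 -n openshift-cnv -c mounter --previous
      $ oc describe pod hpp-pool-29ab9406-755647446d-d6rn7 -n openshift-cnv

      --previous returns the log of the last crashed instance; describe surfaces the events and the last exit code (2, per the pod YAML below).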

      Version-Release number of selected component (if applicable):
      $ oc get csv -A | grep kubevirt
      openshift-cnv kubevirt-hyperconverged-operator.v4.13.0 OpenShift Virtualization 4.13.0 kubevirt-hyperconverged-operator.v4.12.3 Succeeded
      $ oc get clusterversion
      NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
      version 4.13.0-rc.5 True False 11d Cluster version is 4.13.0-rc.5

      How reproducible:
      Unclear what triggers the issue. After running the network team's test suite, which creates network components and VMs, the hpp-pool pods on some BMs end up in this state.

      Steps to Reproduce:
      1.
      2.
      3.

      Actual results:
      The hpp-pool pods get stuck in a CrashLoopBackOff state:
      $ oc get pods -A | grep hpp
      openshift-cnv hpp-pool-29ab9406-755647446d-44jfk 0/1 Terminating 10 43h
      openshift-cnv hpp-pool-29ab9406-755647446d-d6rn7 0/1 CrashLoopBackOff 497 (4m5s ago) 42h
      openshift-cnv hpp-pool-4356e54b-7df67db896-8vq5t 0/1 Terminating 3 43h
      openshift-cnv hpp-pool-4356e54b-7df67db896-ntqpr 0/1 CrashLoopBackOff 502 (3m22s ago) 42h
      openshift-cnv hpp-pool-7dfd761c-cf499b659-9mdk7 1/1 Running 0 42h

      $ oc get pods hpp-pool-29ab9406-755647446d-d6rn7 -oyaml
      apiVersion: v1
      kind: Pod
      metadata:
        annotations:
          k8s.ovn.org/pod-networks: '{"default":{"ip_addresses":["10.128.2.5/23"],"mac_address":"0a:58:0a:80:02:05","gateway_ips":["10.128.2.1"],"ip_address":"10.128.2.5/23","gateway_ip":"10.128.2.1"}}'
          k8s.v1.cni.cncf.io/network-status: |-
            [{
                "name": "ovn-kubernetes",
                "interface": "eth0",
                "ips": [
                    "10.128.2.5"
                ],
                "mac": "0a:58:0a:80:02:05",
                "default": true,
                "dns": {}
            }]
          openshift.io/scc: hostpath-provisioner-csi
        creationTimestamp: "2023-05-13T14:24:31Z"
        generateName: hpp-pool-29ab9406-755647446d-
        labels:
          hpp-pool: hpp-csi-pvc-block-hpp
          pod-template-hash: 755647446d
        name: hpp-pool-29ab9406-755647446d-d6rn7
        namespace: openshift-cnv
        ownerReferences:
        - apiVersion: apps/v1
          blockOwnerDeletion: true
          controller: true
          kind: ReplicaSet
          name: hpp-pool-29ab9406-755647446d
          uid: 6d6089af-1e72-4602-9f67-c212bcb1dac8
        resourceVersion: "22166040"
        uid: a5162c1e-babc-455e-a071-262b81d48c8a
      spec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: kubernetes.io/hostname
                  operator: In
                  values:
                  - cnv-qe-infra-19.cnvqe2.lab.eng.rdu2.redhat.com
        containers:
        - command:
          - /usr/bin/mounter
          - --storagePoolPath
          - /dev/data
          - --mountPath
          - /var/hpp-csi-pvc-block/csi
          - --hostPath
          - /host
          image: registry.redhat.io/container-native-virtualization/hostpath-provisioner-operator-rhel9@sha256:045ad111f8d3fe28b8cf77df49a264922c9fa4cc46759ed98ef044077225a23e
          imagePullPolicy: IfNotPresent
          name: mounter
          resources:
            requests:
              cpu: 10m
              memory: 100Mi
          securityContext:
            capabilities:
              drop:
              - KILL
              - MKNOD
              - SETGID
              - SETUID
            privileged: true
            runAsUser: 0
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeDevices:
          - devicePath: /dev/data
            name: data
          volumeMounts:
          - mountPath: /host
            mountPropagation: Bidirectional
            name: host-root
          - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
            name: kube-api-access-ql72g
            readOnly: true
        dnsPolicy: ClusterFirst
        enableServiceLinks: true
        imagePullSecrets:
        - name: hostpath-provisioner-admin-csi-dockercfg-xn7tq
        nodeName: cnv-qe-infra-19.cnvqe2.lab.eng.rdu2.redhat.com
        preemptionPolicy: PreemptLowerPriority
        priority: 0
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext: {}
        serviceAccount: hostpath-provisioner-admin-csi
        serviceAccountName: hostpath-provisioner-admin-csi
        terminationGracePeriodSeconds: 30
        tolerations:
        - effect: NoExecute
          key: node.kubernetes.io/not-ready
          operator: Exists
          tolerationSeconds: 300
        - effect: NoExecute
          key: node.kubernetes.io/unreachable
          operator: Exists
          tolerationSeconds: 300
        - effect: NoSchedule
          key: node.kubernetes.io/memory-pressure
          operator: Exists
        volumes:
        - name: data
          persistentVolumeClaim:
            claimName: hpp-pool-29ab9406
        - hostPath:
            path: /
            type: Directory
          name: host-root
        - name: kube-api-access-ql72g
          projected:
            defaultMode: 420
            sources:
            - serviceAccountToken:
                expirationSeconds: 3607
                path: token
            - configMap:
                items:
                - key: ca.crt
                  path: ca.crt
                name: kube-root-ca.crt
            - downwardAPI:
                items:
                - fieldRef:
                    apiVersion: v1
                    fieldPath: metadata.namespace
                  path: namespace
            - configMap:
                items:
                - key: service-ca.crt
                  path: service-ca.crt
                name: openshift-service-ca.crt
      status:
        conditions:
        - lastProbeTime: null
          lastTransitionTime: "2023-05-13T14:45:11Z"
          status: "True"
          type: Initialized
        - lastProbeTime: null
          lastTransitionTime: "2023-05-13T14:45:11Z"
          message: 'containers with unready status: [mounter]'
          reason: ContainersNotReady
          status: "False"
          type: Ready
        - lastProbeTime: null
          lastTransitionTime: "2023-05-13T14:45:11Z"
          message: 'containers with unready status: [mounter]'
          reason: ContainersNotReady
          status: "False"
          type: ContainersReady
        - lastProbeTime: null
          lastTransitionTime: "2023-05-13T14:45:09Z"
          status: "True"
          type: PodScheduled
        containerStatuses:
        - containerID: cri-o://5c71c577ce6c36921126314719346663f5cf9c072264d408d362bf45857219f9
          image: registry.redhat.io/container-native-virtualization/hostpath-provisioner-operator-rhel9@sha256:045ad111f8d3fe28b8cf77df49a264922c9fa4cc46759ed98ef044077225a23e
          imageID: registry.redhat.io/container-native-virtualization/hostpath-provisioner-operator-rhel9@sha256:045ad111f8d3fe28b8cf77df49a264922c9fa4cc46759ed98ef044077225a23e
          lastState:
            terminated:
              containerID: cri-o://5c71c577ce6c36921126314719346663f5cf9c072264d408d362bf45857219f9
              exitCode: 2
              finishedAt: "2023-05-15T08:29:59Z"
              reason: Error
              startedAt: "2023-05-15T08:29:59Z"
          name: mounter
          ready: false
          restartCount: 494
          started: false
          state:
            waiting:
              message: back-off 5m0s restarting failed container=mounter pod=hpp-pool-29ab9406-755647446d-d6rn7_openshift-cnv(a5162c1e-babc-455e-a071-262b81d48c8a)
              reason: CrashLoopBackOff
        hostIP: 10.1.156.19
        phase: Running
        podIP: 10.128.2.5
        podIPs:
        - ip: 10.128.2.5
        qosClass: Burstable
        startTime: "2023-05-13T14:45:11Z"
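
      Note that the mounter consumes the block PVC hpp-pool-29ab9406 as a raw device (volumeDevices -> /dev/data) and mounts it on the host under /var/hpp-csi-pvc-block/csi. To check whether the backing RBD volume is still bound and reachable, something along these lines (standard oc commands; the PVC name comes from the spec above):

      $ oc get pvc hpp-pool-29ab9406 -n openshift-cnv
      $ oc get pv "$(oc get pvc hpp-pool-29ab9406 -n openshift-cnv -o jsonpath='{.spec.volumeName}')" -o yaml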

      Expected results:
      The hpp-pool pods reach the Running state and stay Ready.

      Additional info:
      W/A: force delete the affected PVC + hpp-pool pods.
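
      For reference, the workaround amounts to something like the following (names taken from the listings in this report; --force --grace-period=0 bypasses graceful termination, so only use it on pods that are already wedged). The hostpath-provisioner-operator is then expected to recreate the pool PVC and pod:

      $ oc delete pod hpp-pool-29ab9406-755647446d-44jfk -n openshift-cnv --force --grace-period=0
      $ oc delete pvc hpp-pool-29ab9406 -n openshift-cnv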

      Additional info from the cluster:
      $ oc get pods -n openshift-storage
      NAME READY STATUS RESTARTS AGE
      csi-addons-controller-manager-6976d48f69-fmpct 2/2 Running 9 (42h ago) 42h
      csi-cephfsplugin-7gqcc 2/2 Running 6 11d
      csi-cephfsplugin-pgg6z 2/2 Running 4 11d
      csi-cephfsplugin-provisioner-cc76c4b9-vmpk6 5/5 Running 0 42h
      csi-cephfsplugin-provisioner-cc76c4b9-xp9rt 5/5 Running 0 42h
      csi-cephfsplugin-q4r8n 2/2 Running 4 11d
      csi-rbdplugin-j8465 3/3 Running 9 11d
      csi-rbdplugin-jl4jf 3/3 Running 6 11d
      csi-rbdplugin-provisioner-8558756f4f-fvtb2 6/6 Running 0 42h
      csi-rbdplugin-provisioner-8558756f4f-kxgpp 6/6 Running 0 42h
      csi-rbdplugin-wgjml 3/3 Running 6 11d
      noobaa-operator-645c48c4c5-6gx4w 1/1 Running 0 42h
      ocs-metrics-exporter-774f4b58cc-5ngc5 1/1 Running 0 42h
      ocs-operator-5b5d98d58d-zl7zq 1/1 Running 11 (41h ago) 42h
      odf-console-78bb5b66-4mnfb 1/1 Running 0 42h
      odf-operator-controller-manager-7db8d4fd4c-ltzkd 2/2 Running 0 42h
      rook-ceph-crashcollector-03d7e1289c5164e19d0d22d6856ffdae-9b4nt 1/1 Running 0 42h
      rook-ceph-crashcollector-374253a427dc62aef82d81f5fc14643e-44bqw 1/1 Running 0 42h
      rook-ceph-crashcollector-c903e190df41042ede88f92c4aa10277-n5jbj 1/1 Running 0 42h
      rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-666b46d6k42f8 2/2 Running 0 42h
      rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-84bb79d6hz5dp 2/2 Running 0 42h
      rook-ceph-mgr-a-7fd8968d84-p2sx4 2/2 Running 0 42h
      rook-ceph-mon-d-54b48b9549-rf69w 2/2 Running 0 42h
      rook-ceph-mon-e-cc8d486-94tff 2/2 Running 0 42h
      rook-ceph-mon-g-66d7d99bd7-44gjd 2/2 Running 0 42h
      rook-ceph-operator-5b595585d7-kpnsd 1/1 Running 8 (42h ago) 42h
      rook-ceph-osd-0-7987b8c66c-89rws 2/2 Running 0 42h
      rook-ceph-osd-1-7956cc5998-6ghk2 2/2 Running 0 42h
      rook-ceph-osd-2-6f6cfb658f-kdcmp 2/2 Running 0 42h

      $ oc get pods -A | grep hostpath
      openshift-cnv hostpath-provisioner-csi-lzvq6 4/4 Running 4 5d1h
      openshift-cnv hostpath-provisioner-csi-s69jh 4/4 Running 8 5d1h
      openshift-cnv hostpath-provisioner-csi-td8hj 4/4 Running 4 5d1h
      openshift-cnv hostpath-provisioner-operator-77f6f799d5-5dtlz 1/1 Running 1 (42h ago) 42h

      $ oc get pods -A | grep hpp
      openshift-cnv hpp-pool-29ab9406-755647446d-44jfk 0/1 Terminating 10 43h
      openshift-cnv hpp-pool-29ab9406-755647446d-d6rn7 0/1 CrashLoopBackOff 497 (4m5s ago) 42h
      openshift-cnv hpp-pool-4356e54b-7df67db896-8vq5t 0/1 Terminating 3 43h
      openshift-cnv hpp-pool-4356e54b-7df67db896-ntqpr 0/1 CrashLoopBackOff 502 (3m22s ago) 42h
      openshift-cnv hpp-pool-7dfd761c-cf499b659-9mdk7 1/1 Running 0 42h
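
      Since the hpp-pool PVCs are backed by Ceph RBD, checking Ceph health would help confirm or rule out the Ceph theory. One way, assuming the rook-ceph toolbox is enabled (it is not deployed by default in ODF):

      $ oc patch ocsinitialization ocsinit -n openshift-storage --type json -p '[{"op": "replace", "path": "/spec/enableCephTools", "value": true}]'
      $ oc rsh -n openshift-storage deploy/rook-ceph-tools ceph status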

              Assignee: Alexander Wels (rhn-support-awels)
              Reporter: Anat Wax (rh-ee-awax)