Bug
Resolution: Not a Bug
Undefined
None
4.12.z
Critical
No
False
Description of problem:
After installing a SNO node with 4.12.54 with the Telco DU profile applied and running our CPU utilization tests, the node becomes unresponsive and commands time out for a very long time (both oc commands and ssh access). The workload test starts a couple of oslat pods and a number of stress-ng pods, using a portion of the isolated CPUs on the bare-metal (BM) host. I was only able to collect logs, must-gather, etc. after leaving the node overnight with the workload pods still running; by then oc/ssh commands had become responsive again and I was able to collect debug output and logs. I will attach must-gather, sosreport, etc. in a comment. Please let me know if there are any other logs of interest that I should capture.
Symptoms are similar to OCPBUGS-30096. This issue seems to be specific to systems running with an Ice Lake processor; SPR-EE doesn't seem to have this problem.
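When the node stops responding, it helps to record exactly when oc and ssh start timing out and when they recover, so the window can be matched against must-gather and sosreport timestamps. The following is a minimal Python sketch of such a watcher; it is not part of the RAN QE test suite, and the `probe`/`watch` names, the 30 s timeout and the 60 s interval are illustrative assumptions.

```python
import subprocess
import time
from datetime import datetime, timezone


def probe(cmd, timeout_s=30.0):
    """Run cmd once and return (ok, elapsed_seconds).

    ok is False when the command exits non-zero, cannot be started,
    or does not finish within timeout_s (the child is then killed).
    """
    start = time.monotonic()
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
        ok = result.returncode == 0
    except subprocess.TimeoutExpired:
        ok = False
    except OSError:  # e.g. the binary is not installed
        ok = False
    return ok, time.monotonic() - start


def watch(cmd, interval_s=60.0, timeout_s=30.0, iterations=None):
    """Probe cmd repeatedly, printing one UTC-timestamped line per attempt.

    Runs forever when iterations is None.
    """
    n = 0
    while iterations is None or n < iterations:
        ok, elapsed = probe(cmd, timeout_s)
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        status = "OK" if ok else "TIMEOUT/FAIL"
        print(f"{stamp} {status} {elapsed:.1f}s {' '.join(cmd)}")
        n += 1
        if iterations is None or n < iterations:
            time.sleep(interval_s)
```

For example, run `watch(["oc", "get", "nodes"])` in one terminal and `watch(["ssh", "-o", "BatchMode=yes", "core@<node>", "true"])` in another while the cpu_util workload runs; `<node>` is a placeholder for the SNO host name. `BatchMode=yes` keeps ssh from hanging on a password prompt, so a hang is reported as a timeout rather than blocking the watcher.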
Version-Release number of selected component (if applicable):
OCP Version at Install Time: 4.12.54
RHCOS Version at Install Time:
$ cat /etc/redhat-release
Red Hat Enterprise Linux CoreOS release 4.12
Platform: Baremetal (Dell PowerEdge R750)
$ uname -r
4.18.0-372.98.1.rt7.258.el8_6.x86_64
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 112
On-line CPU(s) list: 0-111
Thread(s) per core: 2
Core(s) per socket: 28
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 106
Model name: Intel(R) Xeon(R) Gold 6330N CPU @ 2.20GHz
Stepping: 6
CPU MHz: 2200.000
BogoMIPS: 4400.00
Virtualization: VT-x
L1d cache: 48K
L1i cache: 32K
L2 cache: 1280K
L3 cache: 43008K
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71,73,75,77,79,81,83,85,87,89,91,93,95,97,99,101,103,105,107,109,111
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 invpcid_single ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local split_lock_detect wbnoinvd dtherm ida arat pln pts avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq la57 rdpid fsrm md_clear pconfig flush_l1d arch_capabilities
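Since the problem appears to track the processor generation (Ice Lake affected, SPR-EE not), the distinction can be checked mechanically from the lscpu fields. A minimal Python sketch follows; the model table is a small hand-picked subset of the family-6 IDs in the Linux kernel's `arch/x86/include/asm/intel-family.h` (0x6A = 106 is the Ice Lake server part shown above), and it is an assumption, to be confirmed with lscpu on that machine, that SPR-EE parts report the Sapphire Rapids server model 0x8F (143). The helper names are illustrative.

```python
# Intel family-6 model IDs, a hand-picked subset of the values in the Linux
# kernel's arch/x86/include/asm/intel-family.h.
INTEL_FAM6_MODELS = {
    0x6A: "Ice Lake-SP",         # 106: Xeon Gold 6330N in this report
    0x6C: "Ice Lake-D",          # 108
    0x8F: "Sapphire Rapids-SP",  # 143: assumed to cover SPR-EE parts
}


def parse_lscpu(text):
    """Turn 'Key: value' lines from lscpu output into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def classify(fields):
    """Return the microarchitecture name, or None if not in the table."""
    if fields.get("Vendor ID") != "GenuineIntel":
        return None
    if fields.get("CPU family") != "6":
        return None
    try:
        model = int(fields.get("Model", ""))
    except ValueError:
        return None
    return INTEL_FAM6_MODELS.get(model)


def is_ice_lake(fields):
    """True for any Ice Lake entry in the table above."""
    name = classify(fields)
    return name is not None and name.startswith("Ice Lake")
```

On a node, `classify(parse_lscpu(subprocess.run(["lscpu"], capture_output=True, text=True).stdout))` would return `"Ice Lake-SP"` for the R750 described here, provided lscpu prints its field names in English.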
How reproducible:
Always.
Steps to Reproduce:
1. Install a SNO node with the Telco DU profile applied.
2. Run the RAN QE cpu_util=1h tests.
3. Observe the test results.
Actual results:
The test starts the workload, but then the node becomes unresponsive and all commands time out, so the test suite fails.
Expected results:
The test should start the workload and run successfully.
Additional info:
Pod list (ignore the installer pods):
$ oc get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
assisted-installer assisted-installer-controller-b6hwd 0/1 Completed 0 3d17h
cnfgotestpriv cnfgotestpriv-helix54.lab.eng.rdu2.redhat.com 1/1 Running 0 23h
open-cluster-management-agent-addon config-policy-controller-5585595995-dx8zb 2/2 Running 5 (10h ago) 3d17h
open-cluster-management-agent-addon governance-policy-framework-7c7cc97bd9-rdnr9 2/2 Running 9 (10h ago) 3d17h
open-cluster-management-agent-addon klusterlet-addon-workmgr-58cd697bf7-8gszm 1/1 Running 1 3d17h
open-cluster-management-agent klusterlet-568fdc4c67-6b5zg 1/1 Running 1 3d17h
open-cluster-management-agent klusterlet-agent-7dc4df9b4c-5p287 1/1 Running 1 3d17h
openshift-apiserver-operator openshift-apiserver-operator-7c45658f79-c59kb 1/1 Running 8 (10h ago) 3d17h
openshift-apiserver apiserver-85dd9d5cd9-tdxnb 2/2 Running 0 3d16h
openshift-authentication-operator authentication-operator-7b4ccd6d4b-whpvh 1/1 Running 11 (10h ago) 3d17h
openshift-authentication oauth-openshift-6744ff7599-8r4gq 1/1 Running 0 3d15h
openshift-cloud-controller-manager-operator cluster-cloud-controller-manager-operator-5c9cb8b9dd-nz26t 2/2 Running 6 (22h ago) 3d17h
openshift-cloud-credential-operator cloud-credential-operator-b965dfbf4-8nthf 2/2 Running 3 (23h ago) 3d17h
openshift-cluster-machine-approver machine-approver-cf9c8f66-bgqsg 2/2 Running 10 (10h ago) 3d17h
openshift-cluster-node-tuning-operator cluster-node-tuning-operator-78dd9b5bd8-cl57w 1/1 Running 4 (22h ago) 3d17h
openshift-cluster-node-tuning-operator tuned-gflmd 1/1 Running 1 3d17h
openshift-cluster-samples-operator cluster-samples-operator-66cd94ff9c-wpbrz 2/2 Running 2 3d17h
openshift-cluster-storage-operator cluster-storage-operator-8499666589-96qss 1/1 Running 9 (10h ago) 3d17h
openshift-cluster-storage-operator csi-snapshot-controller-5db5859857-g7wxx 1/1 Running 8 (10h ago) 3d17h
openshift-cluster-storage-operator csi-snapshot-controller-operator-7cd774c84b-bnfj2 1/1 Running 8 (10h ago) 3d17h
openshift-cluster-storage-operator csi-snapshot-webhook-7bb787f596-xfmjx 1/1 Running 1 3d17h
openshift-cluster-version cluster-version-operator-56799c8976-9zk78 1/1 Running 6 (10h ago) 3d17h
openshift-config-operator openshift-config-operator-9f8dd6978-9lltf 1/1 Running 7 (22h ago) 3d17h
openshift-console-operator console-operator-55dbd86564-7h44r 2/2 Running 13 (10h ago) 3d17h
openshift-controller-manager-operator openshift-controller-manager-operator-664bcdfbbc-w5hwg 1/1 Running 5 (10h ago) 3d17h
openshift-controller-manager controller-manager-75d5dd8cd4-b2npn 1/1 Running 2 (22h ago) 2d18h
openshift-dns-operator dns-operator-7f86f6f997-gfl7g 2/2 Running 2 3d17h
openshift-dns dns-default-492ck 2/2 Running 2 3d17h
openshift-dns node-resolver-vjj6g 1/1 Running 1 3d17h
openshift-etcd-operator etcd-operator-ccd6ccf9-b6cxn 1/1 Running 5 (10h ago) 3d17h
openshift-etcd etcd-helix54.lab.eng.rdu2.redhat.com 4/4 Running 5 (23h ago) 3d17h
openshift-etcd installer-2-helix54.lab.eng.rdu2.redhat.com 0/1 Completed 0 3d17h
openshift-etcd installer-3-helix54.lab.eng.rdu2.redhat.com 0/1 Completed 0 3d17h
openshift-image-registry cluster-image-registry-operator-7f4d548656-hlvw7 1/1 Running 3 (22h ago) 3d17h
openshift-image-registry image-pruner-28529280-ddw6q 0/1 Completed 0 3d15h
openshift-image-registry image-pruner-28530720-vtfg4 0/1 Completed 0 2d15h
openshift-image-registry image-pruner-28532160-rxccq 0/1 Completed 0 39h
openshift-image-registry node-ca-x9l7n 1/1 Running 1 3d17h
openshift-ingress-canary ingress-canary-k7lw7 1/1 Running 1 3d17h
openshift-ingress-operator ingress-operator-688565ff4c-wghc9 2/2 Running 8 (3d17h ago) 3d17h
openshift-ingress router-default-7fdd957b77-854tl 1/1 Running 13 (22h ago) 3d17h
openshift-insights insights-operator-85bd6b9d75-dbhlr 1/1 Running 2 3d17h
openshift-kube-apiserver-operator kube-apiserver-operator-db9d8b854-5cnhx 1/1 Running 8 (10h ago) 3d17h
openshift-kube-apiserver installer-10-helix54.lab.eng.rdu2.redhat.com 0/1 Completed 0 2d18h
openshift-kube-apiserver installer-2-helix54.lab.eng.rdu2.redhat.com 0/1 Error 0 3d17h
openshift-kube-apiserver installer-3-helix54.lab.eng.rdu2.redhat.com 0/1 Error 0 3d17h
openshift-kube-apiserver installer-3-retry-1-helix54.lab.eng.rdu2.redhat.com 0/1 Completed 0 3d17h
openshift-kube-apiserver installer-4-helix54.lab.eng.rdu2.redhat.com 0/1 Completed 0 3d17h
openshift-kube-apiserver installer-5-helix54.lab.eng.rdu2.redhat.com 0/1 Completed 0 3d17h
openshift-kube-apiserver installer-6-helix54.lab.eng.rdu2.redhat.com 0/1 Completed 0 2d23h
openshift-kube-apiserver installer-7-helix54.lab.eng.rdu2.redhat.com 0/1 Completed 0 2d22h
openshift-kube-apiserver installer-8-helix54.lab.eng.rdu2.redhat.com 0/1 Completed 0 2d22h
openshift-kube-apiserver installer-9-helix54.lab.eng.rdu2.redhat.com 0/1 Completed 0 2d19h
openshift-kube-apiserver kube-apiserver-helix54.lab.eng.rdu2.redhat.com 5/5 Running 4 (22h ago) 2d18h
openshift-kube-controller-manager-operator kube-controller-manager-operator-6ffc759b87-lwfsk 1/1 Running 8 (10h ago) 3d17h
openshift-kube-controller-manager installer-4-helix54.lab.eng.rdu2.redhat.com 0/1 Error 0 3d17h
openshift-kube-controller-manager installer-4-retry-1-helix54.lab.eng.rdu2.redhat.com 0/1 Error 0 3d17h
openshift-kube-controller-manager installer-4-retry-2-helix54.lab.eng.rdu2.redhat.com 0/1 Completed 0 3d17h
openshift-kube-controller-manager installer-5-helix54.lab.eng.rdu2.redhat.com 0/1 Completed 0 3d17h
openshift-kube-controller-manager installer-6-helix54.lab.eng.rdu2.redhat.com 0/1 Completed 0 3d17h
openshift-kube-controller-manager kube-controller-manager-helix54.lab.eng.rdu2.redhat.com 4/4 Running 29 (10h ago) 3d17h
openshift-kube-scheduler-operator openshift-kube-scheduler-operator-6fcd67464b-ss2rb 1/1 Running 4 (22h ago) 3d17h
openshift-kube-scheduler installer-6-helix54.lab.eng.rdu2.redhat.com 0/1 Error 0 3d17h
openshift-kube-scheduler installer-6-retry-1-helix54.lab.eng.rdu2.redhat.com 0/1 Error 0 3d17h
openshift-kube-scheduler installer-6-retry-2-helix54.lab.eng.rdu2.redhat.com 0/1 Error 0 3d17h
openshift-kube-scheduler installer-8-helix54.lab.eng.rdu2.redhat.com 0/1 Completed 0 3d17h
openshift-kube-scheduler openshift-kube-scheduler-helix54.lab.eng.rdu2.redhat.com 3/3 Running 8 (22h ago) 3d17h
openshift-kube-scheduler revision-pruner-6-helix54.lab.eng.rdu2.redhat.com 0/1 Completed 0 3d17h
openshift-kube-scheduler revision-pruner-7-helix54.lab.eng.rdu2.redhat.com 0/1 Completed 0 3d17h
openshift-kube-scheduler revision-pruner-8-helix54.lab.eng.rdu2.redhat.com 0/1 Completed 0 3d17h
openshift-kube-storage-version-migrator-operator kube-storage-version-migrator-operator-696f5b4cd6-qb6d8 1/1 Running 5 (10h ago) 3d17h
openshift-kube-storage-version-migrator migrator-6cd55974d-bjr5f 1/1 Running 1 3d17h
openshift-local-storage local-storage-operator-b77d48486-48p6n 1/1 Running 6 (22h ago) 3d17h
openshift-logging cluster-logging-operator-5d6fd786b7-ns4l2 1/1 Running 1 3d17h
openshift-logging collector-sg7n9 1/1 Running 7 (22h ago) 3d17h
openshift-machine-api cluster-autoscaler-operator-7b56cd79c-n2h5n 2/2 Running 8 (10h ago) 3d17h
openshift-machine-api cluster-baremetal-operator-857f8d49bb-xbg8d 2/2 Running 2 3d17h
openshift-machine-api control-plane-machine-set-operator-7bf8bcb697-t49jl 1/1 Running 2 (23h ago) 3d17h
openshift-machine-api machine-api-operator-5486d647f4-47x28 2/2 Running 4 (22h ago) 3d17h
openshift-machine-config-operator machine-config-controller-74b59b444f-2x2vf 2/2 Running 4 (22h ago) 3d17h
openshift-machine-config-operator machine-config-daemon-sps7z 2/2 Running 2 3d17h
openshift-machine-config-operator machine-config-operator-5686796db8-d4dtb 1/1 Running 3 (22h ago) 3d17h
openshift-machine-config-operator machine-config-server-57kz4 1/1 Running 1 3d17h
openshift-marketplace 621c0e1d7f8078b3bee1cc563a6f1f15b74a1cf1881c75bc5224b0669194pkb 0/1 Completed 0 3d17h
openshift-marketplace 92dd0af2b8de68c8bc8590413242dc0d5ae3d2644a4a89c974ce876197prmxh 0/1 Completed 0 3d17h
openshift-marketplace a16a3cc0b92b93eb191ec6e64d8e873251ccaaab775e0dbe4bd3fa316antn7q 0/1 Completed 0 3d17h
openshift-marketplace ca4446e0495d302207c932e410bce8fcff613d5e76c15bc2e106278ff4prnhp 0/1 Completed 0 3d17h
openshift-marketplace certified-operators-custom-hg89v 1/1 Running 4 (22h ago) 3d17h
openshift-marketplace d9906a460566ac6f16a2bde80af6a855ec4879fe3a49cfbaf9d997e02awk5pq 0/1 Completed 0 3d17h
openshift-marketplace marketplace-operator-6d958c7db5-dg4bh 1/1 Running 13 (10h ago) 3d17h
openshift-marketplace redhat-operators-custom-8k7r8 1/1 Running 4 (22h ago) 3d17h
openshift-monitoring cluster-monitoring-operator-6c5d6757c-ntnft 2/2 Running 2 3d17h
openshift-monitoring kube-state-metrics-57568d4d44-w4qg7 3/3 Running 3 3d17h
openshift-monitoring node-exporter-qfpj7 2/2 Running 2 3d17h
openshift-monitoring openshift-state-metrics-779888777d-59npk 3/3 Running 3 3d17h
openshift-monitoring process-exporter-mt8w4 1/1 Running 0 23h
openshift-monitoring prometheus-adapter-c8bdb89bf-jzrbz 1/1 Running 0 2d17h
openshift-monitoring prometheus-k8s-0 6/6 Running 8 (22h ago) 3d17h
openshift-monitoring prometheus-operator-9655df9d4-8qxlk 2/2 Running 2 3d17h
openshift-monitoring prometheus-operator-admission-webhook-646d865769-dr955 1/1 Running 1 3d17h
openshift-monitoring thanos-querier-5b79c9dc95-zttbj 6/6 Running 6 3d17h
openshift-multus multus-additional-cni-plugins-6z6fg 1/1 Running 1 3d17h
openshift-multus multus-admission-controller-7bcf5b4ccc-4k2gr 2/2 Running 0 3d17h
openshift-multus multus-xr455 1/1 Running 1 3d17h
openshift-multus network-metrics-daemon-x7x6z 2/2 Running 2 3d17h
openshift-network-operator network-operator-7d88cb796b-dxvz9 1/1 Running 5 (10h ago) 3d17h
openshift-oauth-apiserver apiserver-5c55b9d6b6-d9fhl 1/1 Running 3 (22h ago) 3d17h
openshift-operator-lifecycle-manager catalog-operator-659b567fbc-8gg6v 1/1 Running 1 3d17h
openshift-operator-lifecycle-manager collect-profiles-28534500-lns57 0/1 Completed 0 44m
openshift-operator-lifecycle-manager collect-profiles-28534515-nrsht 0/1 Completed 0 29m
openshift-operator-lifecycle-manager collect-profiles-28534530-ts9p7 0/1 Completed 0 14m
openshift-operator-lifecycle-manager olm-operator-595c6c9b88-twztv 1/1 Running 1 3d17h
openshift-operator-lifecycle-manager package-server-manager-84c5c69748-m882t 1/1 Running 3 (22h ago) 3d17h
openshift-operator-lifecycle-manager packageserver-86dfb5c79d-tng78 1/1 Running 1 3d17h
openshift-ovn-kubernetes ovnkube-master-ltrlp 6/6 Running 8 (22h ago) 3d17h
openshift-ovn-kubernetes ovnkube-node-s6q5n 5/5 Running 5 3d17h
openshift-ptp linuxptp-daemon-t7x52 2/2 Running 2 3d17h
openshift-ptp ptp-operator-795878776c-56cmr 1/1 Running 2 (23h ago) 3d17h
openshift-route-controller-manager route-controller-manager-9646cb6b8-cswkt 1/1 Running 2 (22h ago) 2d18h
openshift-service-ca-operator service-ca-operator-54896f9bd8-j25dw 1/1 Running 5 (10h ago) 3d17h
openshift-service-ca service-ca-79864d7b86-x4mv6 1/1 Running 7 (10h ago) 3d17h
openshift-sriov-network-operator network-resources-injector-tnwjl 1/1 Running 1 3d17h
openshift-sriov-network-operator operator-webhook-7fp7j 1/1 Running 1 3d17h
openshift-sriov-network-operator sriov-device-plugin-6kjh8 1/1 Running 0 3d16h
openshift-sriov-network-operator sriov-network-config-daemon-hvn9l 3/3 Running 3 3d17h
openshift-sriov-network-operator sriov-network-operator-784796b954-cvrjw 1/1 Running 3 (22h ago) 3d17h
ran-test oslat-m5d5b 1/1 Running 0 23h
ran-test oslat-nj2mx 1/1 Running 0 23h
ran-test stress-ng-2c5dr 1/1 Running 1 23h
ran-test stress-ng-2vnp7 1/1 Running 0 23h
ran-test stress-ng-4hzkp 1/1 Running 0 23h
ran-test stress-ng-4mrtv 1/1 Running 0 23h
ran-test stress-ng-5594d 1/1 Running 3 23h
ran-test stress-ng-5lq4s 1/1 Running 0 23h
ran-test stress-ng-6dn4c 1/1 Running 0 23h
ran-test stress-ng-6wph9 1/1 Running 0 23h
ran-test stress-ng-6xvdv 1/1 Running 0 23h
ran-test stress-ng-7qz5l 1/1 Running 0 23h
ran-test stress-ng-7zswm 1/1 Running 1 23h
ran-test stress-ng-8m8zv 1/1 Running 0 23h
ran-test stress-ng-cm7xj 1/1 Running 0 23h
ran-test stress-ng-gqf8s 1/1 Running 1 23h
ran-test stress-ng-jc5q2 1/1 Running 0 23h
ran-test stress-ng-jhhlx 1/1 Running 0 23h
ran-test stress-ng-jlrrp 1/1 Running 0 23h
ran-test stress-ng-jm28n 1/1 Running 4 23h
ran-test stress-ng-lmmx5 1/1 Running 4 23h
ran-test stress-ng-lzn4n 1/1 Running 0 23h
ran-test stress-ng-ndksx 1/1 Running 0 23h
ran-test stress-ng-nw4wl 1/1 Running 0 23h
ran-test stress-ng-nzp46 1/1 Running 1 23h
ran-test stress-ng-s4wc2 1/1 Running 0 23h
ran-test stress-ng-s87f6 1/1 Running 0 23h
ran-test stress-ng-vb7pc 1/1 Running 0 23h
ran-test stress-ng-wbxpb 1/1 Running 0 23h
ran-test stress-ng-wnrt4 1/1 Running 0 23h
vran-acceleration-operators accelerator-discovery-vvwhg 1/1 Running 1 3d17h
vran-acceleration-operators sriov-device-plugin-9kf6c 1/1 Running 1 3d17h
vran-acceleration-operators sriov-fec-controller-manager-64f8898c7c-6brz4 2/2 Running 6 (22h ago) 3d17h
vran-acceleration-operators sriov-fec-daemonset-fczmx 1/1 Running 1 3d17h
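Many pods in the list above report restarts, including several stress-ng pods in ran-test and a number of control-plane operators. To pull those out without scanning the whole table, here is a minimal Python sketch. It assumes the default plain-text `oc get pods -A` layout (no `-o wide`), where RESTARTS may carry a suffix such as `5 (10h ago)`; the `parse_pods` and `restarted` helpers are illustrative, not part of any OpenShift tooling.

```python
def parse_pods(text):
    """Parse `oc get pods -A` table output into a list of dicts.

    RESTARTS may look like "5 (10h ago)", so only its leading integer
    is kept, and AGE is always taken from the last token on the line.
    """
    pods = []
    lines = text.strip().splitlines()
    for line in lines[1:]:  # skip the NAMESPACE NAME ... header row
        tokens = line.split()
        if len(tokens) < 6:
            continue
        pods.append({
            "namespace": tokens[0],
            "name": tokens[1],
            "ready": tokens[2],
            "status": tokens[3],
            "restarts": int(tokens[4]),
            "age": tokens[-1],
        })
    return pods


def restarted(pods, namespace=None):
    """Running, non-installer pods with at least one restart, most first."""
    hits = [
        p for p in pods
        if p["restarts"] > 0
        and p["status"] == "Running"
        and not p["name"].startswith("installer-")
        and (namespace is None or p["namespace"] == namespace)
    ]
    # sorted() is stable, so ties keep their original table order
    return sorted(hits, key=lambda p: p["restarts"], reverse=True)
```

For example, save the output with `oc get pods -A > pods.txt`, then call `restarted(parse_pods(open("pods.txt").read()), "ran-test")` to list only the workload pods that restarted.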
relates to: OCPBUGS-30096 [4.12][Tracker for RHEL-26706] High Load and Pods Stuck Terminating (Closed)