Bug
Resolution: Unresolved
Normal
None
4.18, 4.19, 4.20, 4.21.0
Description of problem:
The logs of the "node-exporter" pods are full of error messages showing that the infiniband collector cannot read the InfiniBand counters:
2025-09-30T17:07:12.432674275+05:30 ts=2025-09-30T11:37:12.432Z caller=collector.go:169 level=error msg="collector failed" name=infiniband duration_seconds=0.006340497 err="error obtaining InfiniBand class info: failed to read file \"/host/sys/class/infiniband/qedr0/ports/1/counters/VL15_dropped\": invalid argument"
This error has already been fixed upstream; see https://github.com/prometheus/node_exporter/issues/3265.
Version-Release number of selected component (if applicable):
OpenShift 4.18.22
How reproducible:
Always
Steps to Reproduce:
- Deploy OpenShift on hardware that has InfiniBand devices
- Review the "node-exporter" logs
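To confirm which counter files trigger the error on a given node, the reads can be attempted directly. The following is a minimal sketch, not part of node-exporter: the function name check_ib_counters is hypothetical, and it assumes the standard sysfs layout seen in the error message (class/infiniband/<device>/ports/<port>/counters/). Inside the node-exporter container the host sysfs is mounted under /host/sys, so the root is passed as a parameter.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list InfiniBand counter files that cannot be read.
# $1 is the sysfs root: /sys on the node itself, /host/sys inside the
# node-exporter container (per the path shown in the error message).
check_ib_counters() {
  local sysfs_root="${1:-/sys}"
  local f failed=0
  for f in "$sysfs_root"/class/infiniband/*/ports/*/counters/*; do
    # Skip the literal, unexpanded pattern when no InfiniBand devices exist.
    [ -e "$f" ] || [ -L "$f" ] || continue
    # A failed read (for example "invalid argument") is reported by path.
    if ! cat "$f" >/dev/null 2>&1; then
      echo "unreadable: $f"
      failed=$((failed + 1))
    fi
  done
  echo "unreadable counters: $failed"
}
```

On an affected node this would be expected to list at least the VL15_dropped file from the error message, for example by running check_ib_counters /sys from a debug shell on the node.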
Actual results:
- No InfiniBand counter metrics are collected
- Each "node-exporter" pod logs the error at a high rate: around 358 occurrences per pod in ~40 minutes, and 2148 across all "node-exporter" pods
$ oc logs node-exporter-c8jxr -c node-exporter -n openshift-monitoring|head -1
2025-09-30T16:22:57.923447493+05:30 ts=2025-09-30T10:52:57+00:00 num_cpus=96 gomaxprocs=4
$ oc logs node-exporter-c8jxr -c node-exporter -n openshift-monitoring |tail -1
2025-09-30T17:07:42.429386262+05:30 ts=2025-09-30T11:37:42.429Z caller=collector.go:169 level=error msg="collector failed" name=infiniband duration_seconds=0.000256877 err="error obtaining InfiniBand class info: failed to read file \"/host/sys/class/infiniband/qedr0/ports/1/counters/VL15_dropped\": invalid argument"
$ oc logs node-exporter-c8jxr -c node-exporter -n openshift-monitoring|grep -c "error obtaining InfiniBand"
358
$ for pod in $(oc get pods -l app.kubernetes.io/name=node-exporter -o name); do oc logs $pod -c node-exporter ; done|grep -c "error obtaining InfiniBand"
2148
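The counts above show how often the error occurs, but not whether more than one counter file is involved. A small filter can break the errors down by failing path. This is a sketch only: the function name summarize_ib_errors is hypothetical, and it assumes the log format shown above, in which the file path is wrapped in escaped quotes (\").

```shell
# Hypothetical helper: read node-exporter log lines on stdin and print how
# many times each counter file failed to be read by the infiniband collector.
summarize_ib_errors() {
  grep 'error obtaining InfiniBand class info' \
    | sed -n 's/.*failed to read file \\"\([^"\\]*\)\\".*/\1/p' \
    | sort | uniq -c | sort -rn
}
```

It could be fed from the same command used above, for example: oc logs node-exporter-c8jxr -c node-exporter -n openshift-monitoring | summarize_ib_errors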
Expected results:
- InfiniBand counters are collected as metrics
- No noisy errors in the "node-exporter" pod logs
Additional info:
- blocks: OCPBUGS-63624 node-exporter throws errors reading the InfiniBand class counter info (New)
- is cloned by: OCPBUGS-63624 node-exporter throws errors reading the InfiniBand class counter info (New)
- links to