Bug
Resolution: Duplicate
Normal
4.19.0
Quality / Stability / Reliability
Moderate
Description of problem:
When I reinstall 4.19.0-rc.3 on my 3-node bare metal host cluster, there are always one or two tuned pods that fail to start, as shown below:

# oc get clusteroperator
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
authentication 4.19.0-rc.3 True False False 12m
baremetal 4.19.0-rc.3 True False False 31m
cloud-controller-manager 4.19.0-rc.3 True False False 33m
cloud-credential 4.19.0-rc.3 True False False 40m
cluster-autoscaler 4.19.0-rc.3 True False False 31m
config-operator 4.19.0-rc.3 True False False 32m
console 4.19.0-rc.3 True False False 17m
control-plane-machine-set 4.19.0-rc.3 True False False 31m
csi-snapshot-controller 4.19.0-rc.3 True False False 31m
dns 4.19.0-rc.3 True False False 31m
etcd 4.19.0-rc.3 True False False 30m
image-registry 4.19.0-rc.3 True False False 18m
ingress 4.19.0-rc.3 True False False 21m
insights 4.19.0-rc.3 True False False 31m
kube-apiserver 4.19.0-rc.3 True False False 27m
kube-controller-manager 4.19.0-rc.3 True False False 27m
kube-scheduler 4.19.0-rc.3 True False False 29m
kube-storage-version-migrator 4.19.0-rc.3 True False False 32m
machine-api 4.19.0-rc.3 True False False 28m
machine-approver 4.19.0-rc.3 True False False 32m
machine-config 4.19.0-rc.3 True False False 30m
marketplace 4.19.0-rc.3 True False False 31m
monitoring 4.19.0-rc.3 True False False 13m
network 4.19.0-rc.3 True False False 32m
node-tuning 4.19.0-rc.3 True True False 14m Waiting for 1/3 Profiles to be applied
olm 4.19.0-rc.3 True False False 31m
openshift-apiserver 4.19.0-rc.3 True False False 22m
openshift-controller-manager 4.19.0-rc.3 True False False 27m
openshift-samples 4.19.0-rc.3 True False False 21m
operator-lifecycle-manager 4.19.0-rc.3 True False False 31m
operator-lifecycle-manager-catalog 4.19.0-rc.3 True False False 31m
operator-lifecycle-manager-packageserver 4.19.0-rc.3 True False False 22m
service-ca 4.19.0-rc.3 True False False 32m
storage 4.19.0-rc.3 True False False 32m
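The node-tuning operator message refers to the per-node Profile objects; to see which node's profile is stuck and which tuned pod is failing, checks along these lines should work (assuming the default openshift-cluster-node-tuning-operator namespace):

# oc get profiles.tuned.openshift.io -n openshift-cluster-node-tuning-operator
# oc get pods -n openshift-cluster-node-tuning-operator -o wide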
Here is the pod log from the failed pod:

I0528 00:27:28.613490 10977 controller.go:1667] starting in-cluster ocp-tuned v4.19.0-202505140813.p0.g7172669.assembly.stream.el9-0-g80d720b-dirty
I0528 00:27:28.658393 10977 controller.go:671] writing /var/lib/ocp-tuned/image.env
E0528 00:27:28.661083 10977 controller.go:1712] error repacking the profile: open /etc/tuned/recommend.d/50-openshift.conf: no such file or directory
I0528 00:27:28.661099 10977 controller.go:1713] deferred updates likely broken
I0528 00:27:28.661106 10977 controller.go:1729] starting: profile unpacked is "" fingerprint "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
I0528 00:27:28.661117 10977 controller.go:1425] recover: no pending deferred change
I0528 00:27:28.661123 10977 controller.go:1735] starting: no pending deferred update
I0528 00:27:28.669889 10977 controller.go:382] disabling system tuned...
I0528 00:27:28.762013 10977 controller.go:1547] started events processors
I0528 00:27:28.762063 10977 controller.go:1568] monitoring filesystem events on "/etc/tuned/bootcmdline"
I0528 00:27:28.762070 10977 controller.go:1571] started controller
I0528 00:27:28.762128 10977 controller.go:359] set log level 0
I0528 00:27:28.762197 10977 controller.go:614] providerExtract(): extracting cloud provider name to /var/lib/ocp-tuned/provider
I0528 00:27:28.762359 10977 controller.go:692] tunedRecommendFileWrite(): written "/etc/tuned/recommend.d/50-openshift.conf" to set TuneD profile openshift-control-plane
I0528 00:27:28.762368 10977 controller.go:417] profilesExtract(): extracting 1 TuneD profiles (recommended=openshift-control-plane)
I0528 00:27:28.809735 10977 controller.go:462] profilesExtract(): recommended TuneD profile openshift-control-plane content changed [openshift]
I0528 00:27:28.809870 10977 controller.go:478] profilesExtract(): fingerprint of extracted profiles: "3d0c4b179e02d27e7c0c64d8a4cfe6b3e85ae111f1d1202b6362f4bb92dbc627"
I0528 00:27:28.809914 10977 controller.go:818] tunedReload()
I0528 00:27:28.809939 10977 controller.go:745] starting tuned...
I0528 00:27:28.809948 10977 run.go:121] running cmd...
2025-05-28 00:27:28,889 INFO tuned.daemon.application: TuneD: 2.25.1, kernel: 5.14.0-570.16.1.el9_6.x86_64
2025-05-28 00:27:28,889 INFO tuned.daemon.application: dynamic tuning is globally disabled
2025-05-28 00:27:28,891 INFO tuned.daemon.daemon: using sleep interval of 1 second(s)
2025-05-28 00:27:28,892 INFO tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2025-05-28 00:27:28,892 INFO tuned.daemon.daemon: Using 'openshift-control-plane' profile
2025-05-28 00:27:28,893 INFO tuned.profiles.loader: loading profile: openshift-control-plane
2025-05-28 00:27:28,948 INFO tuned.daemon.controller: starting controller
2025-05-28 00:27:28,948 INFO tuned.daemon.controller: waiting for udev to settle
Traceback (most recent call last):
  File "/usr/sbin/tuned", line 98, in <module>
    app.run(args.daemon)
  File "/usr/lib/python3.9/site-packages/tuned/daemon/application.py", line 215, in run
    result = self._controller.run()
  File "/usr/lib/python3.9/site-packages/tuned/daemon/controller.py", line 68, in run
    p = monitor.poll(timeout = 1)
  File "/usr/lib/python3.9/site-packages/pyudev/monitor.py", line 354, in poll
    if eintr_retry_call(poll.Poll.for_events((self, 'r')).poll, timeout):
  File "/usr/lib/python3.9/site-packages/pyudev/_util.py", line 159, in eintr_retry_call
    return func(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/pyudev/_os/poll.py", line 94, in poll
    return list(
  File "/usr/lib/python3.9/site-packages/pyudev/_os/poll.py", line 110, in _parse_events
    raise IOError('Error while polling fd: {0!r}'.format(fd))
OSError: Error while polling fd: 4
E0528 00:27:29.390415 10977 controller.go:763] Error while running tuned error waiting for tuned: exit status 1
I0528 00:37:24.006916 10977 controller.go:359] set log level 0
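The traceback shows the TuneD daemon itself exiting while polling the udev monitor (fd 4), after which ocp-tuned reports exit status 1. To pull the same log from a failing pod and check whether /etc/tuned/recommend.d/50-openshift.conf is present, for comparison with the good pod, something like the following can be used (the "tuned" container name and the tuned-<pod-id> placeholder are assumptions; substitute the actual failing pod):

# oc logs -n openshift-cluster-node-tuning-operator tuned-<pod-id> -c tuned
# oc exec -n openshift-cluster-node-tuning-operator tuned-<pod-id> -c tuned -- ls -l /etc/tuned/recommend.d/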
The workaround I found is to delete the failing pod; the recreated pod then starts without issue. I'm also attaching [^tuned-jv89v-tuned.log], which is the log from a good pod for comparison.
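For completeness, the workaround in command form; the pod name is a placeholder, the DaemonSet recreates the pod after deletion, and node-tuning should stop progressing once the recreated pod applies its profile:

# oc delete pod -n openshift-cluster-node-tuning-operator tuned-<pod-id>
# oc get clusteroperator node-tuning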
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1.
2.
3.
Actual results:
Expected results:
Node tuning
Additional info:
relates to: OCPBUGS-56442 OCP 4.18+ | Node Tuning Operator is marked as degraded during IPI wait-for-install process (Verified)