Bug
Resolution: Duplicate
Normal
4.19.0
Quality / Stability / Reliability
Moderate
Description of problem:
When I reinstall 4.19.0-rc.3 on my 3-node bare-metal cluster, one or two tuned pods always fail to start, as shown below:
# oc get clusteroperator
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
authentication 4.19.0-rc.3 True False False 12m
baremetal 4.19.0-rc.3 True False False 31m
cloud-controller-manager 4.19.0-rc.3 True False False 33m
cloud-credential 4.19.0-rc.3 True False False 40m
cluster-autoscaler 4.19.0-rc.3 True False False 31m
config-operator 4.19.0-rc.3 True False False 32m
console 4.19.0-rc.3 True False False 17m
control-plane-machine-set 4.19.0-rc.3 True False False 31m
csi-snapshot-controller 4.19.0-rc.3 True False False 31m
dns 4.19.0-rc.3 True False False 31m
etcd 4.19.0-rc.3 True False False 30m
image-registry 4.19.0-rc.3 True False False 18m
ingress 4.19.0-rc.3 True False False 21m
insights 4.19.0-rc.3 True False False 31m
kube-apiserver 4.19.0-rc.3 True False False 27m
kube-controller-manager 4.19.0-rc.3 True False False 27m
kube-scheduler 4.19.0-rc.3 True False False 29m
kube-storage-version-migrator 4.19.0-rc.3 True False False 32m
machine-api 4.19.0-rc.3 True False False 28m
machine-approver 4.19.0-rc.3 True False False 32m
machine-config 4.19.0-rc.3 True False False 30m
marketplace 4.19.0-rc.3 True False False 31m
monitoring 4.19.0-rc.3 True False False 13m
network 4.19.0-rc.3 True False False 32m
node-tuning 4.19.0-rc.3 True True False 14m Waiting for 1/3 Profiles to be applied
olm 4.19.0-rc.3 True False False 31m
openshift-apiserver 4.19.0-rc.3 True False False 22m
openshift-controller-manager 4.19.0-rc.3 True False False 27m
openshift-samples 4.19.0-rc.3 True False False 21m
operator-lifecycle-manager 4.19.0-rc.3 True False False 31m
operator-lifecycle-manager-catalog 4.19.0-rc.3 True False False 31m
operator-lifecycle-manager-packageserver 4.19.0-rc.3 True False False 22m
service-ca 4.19.0-rc.3 True False False 32m
storage 4.19.0-rc.3 True False False 32m
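The node-tuning clusteroperator above is still Progressing with "Waiting for 1/3 Profiles to be applied". One way to see which node's profile is pending is to list the per-node Profile custom resources (assuming the Node Tuning Operator's default openshift-cluster-node-tuning-operator namespace):
# oc -n openshift-cluster-node-tuning-operator get profiles.tuned.openshift.io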
Here is the pod log from the failed pod:
I0528 00:27:28.613490 10977 controller.go:1667] starting in-cluster ocp-tuned v4.19.0-202505140813.p0.g7172669.assembly.stream.el9-0-g80d720b-dirty
I0528 00:27:28.658393 10977 controller.go:671] writing /var/lib/ocp-tuned/image.env
E0528 00:27:28.661083 10977 controller.go:1712] error repacking the profile: open /etc/tuned/recommend.d/50-openshift.conf: no such file or directory
I0528 00:27:28.661099 10977 controller.go:1713] deferred updates likely broken
I0528 00:27:28.661106 10977 controller.go:1729] starting: profile unpacked is "" fingerprint "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
I0528 00:27:28.661117 10977 controller.go:1425] recover: no pending deferred change
I0528 00:27:28.661123 10977 controller.go:1735] starting: no pending deferred update
I0528 00:27:28.669889 10977 controller.go:382] disabling system tuned...
I0528 00:27:28.762013 10977 controller.go:1547] started events processors
I0528 00:27:28.762063 10977 controller.go:1568] monitoring filesystem events on "/etc/tuned/bootcmdline"
I0528 00:27:28.762070 10977 controller.go:1571] started controller
I0528 00:27:28.762128 10977 controller.go:359] set log level 0
I0528 00:27:28.762197 10977 controller.go:614] providerExtract(): extracting cloud provider name to /var/lib/ocp-tuned/provider
I0528 00:27:28.762359 10977 controller.go:692] tunedRecommendFileWrite(): written "/etc/tuned/recommend.d/50-openshift.conf" to set TuneD profile openshift-control-plane
I0528 00:27:28.762368 10977 controller.go:417] profilesExtract(): extracting 1 TuneD profiles (recommended=openshift-control-plane)
I0528 00:27:28.809735 10977 controller.go:462] profilesExtract(): recommended TuneD profile openshift-control-plane content changed [openshift]
I0528 00:27:28.809870 10977 controller.go:478] profilesExtract(): fingerprint of extracted profiles: "3d0c4b179e02d27e7c0c64d8a4cfe6b3e85ae111f1d1202b6362f4bb92dbc627"
I0528 00:27:28.809914 10977 controller.go:818] tunedReload()
I0528 00:27:28.809939 10977 controller.go:745] starting tuned...
I0528 00:27:28.809948 10977 run.go:121] running cmd...
2025-05-28 00:27:28,889 INFO tuned.daemon.application: TuneD: 2.25.1, kernel: 5.14.0-570.16.1.el9_6.x86_64
2025-05-28 00:27:28,889 INFO tuned.daemon.application: dynamic tuning is globally disabled
2025-05-28 00:27:28,891 INFO tuned.daemon.daemon: using sleep interval of 1 second(s)
2025-05-28 00:27:28,892 INFO tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2025-05-28 00:27:28,892 INFO tuned.daemon.daemon: Using 'openshift-control-plane' profile
2025-05-28 00:27:28,893 INFO tuned.profiles.loader: loading profile: openshift-control-plane
2025-05-28 00:27:28,948 INFO tuned.daemon.controller: starting controller
2025-05-28 00:27:28,948 INFO tuned.daemon.controller: waiting for udev to settle
Traceback (most recent call last):
  File "/usr/sbin/tuned", line 98, in <module>
    app.run(args.daemon)
  File "/usr/lib/python3.9/site-packages/tuned/daemon/application.py", line 215, in run
    result = self._controller.run()
  File "/usr/lib/python3.9/site-packages/tuned/daemon/controller.py", line 68, in run
    p = monitor.poll(timeout = 1)
  File "/usr/lib/python3.9/site-packages/pyudev/monitor.py", line 354, in poll
    if eintr_retry_call(poll.Poll.for_events((self, 'r')).poll, timeout):
  File "/usr/lib/python3.9/site-packages/pyudev/_util.py", line 159, in eintr_retry_call
    return func(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/pyudev/_os/poll.py", line 94, in poll
    return list(
  File "/usr/lib/python3.9/site-packages/pyudev/_os/poll.py", line 110, in _parse_events
    raise IOError('Error while polling fd: {0!r}'.format(fd))
OSError: Error while polling fd: 4
E0528 00:27:29.390415 10977 controller.go:763] Error while running tuned error waiting for tuned: exit status 1
I0528 00:37:24.006916 10977 controller.go:359] set log level 0
The workaround I found is to delete the pod; after that, it starts without issue. I'm also attaching [^tuned-jv89v-tuned.log], which is from a good pod for comparison.
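A minimal sketch of the workaround, assuming the tuned DaemonSet runs in the default openshift-cluster-node-tuning-operator namespace (the pod name suffix below is a placeholder, not one from this cluster):
# oc -n openshift-cluster-node-tuning-operator get pods          (find the tuned-* pod that is not Ready)
# oc -n openshift-cluster-node-tuning-operator delete pod tuned-<suffix>
The DaemonSet then recreates the deleted pod, and the new pod starts without hitting the error above.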
Version-Release number of selected component (if applicable):
4.19.0-rc.3
How reproducible:
Always (one or two tuned pods fail on every reinstall of this cluster)
Steps to Reproduce:
1. Reinstall 4.19.0-rc.3 on a 3-node bare-metal cluster.
2. Check the tuned pods and the node-tuning clusteroperator.
3. Observe that one or two tuned pods fail to start.
Actual results:
Expected results:
Node tuning
Additional info:
relates to: OCPBUGS-56442 - OCP 4.18+ | Node Tuning Operator is marked as degraded during IPI wait-for-install process (Closed)