OpenShift Bugs
OCPBUGS-56767

Node Tuning operator fails to start on one or two nodes


    • Type: Bug
    • Resolution: Duplicate
    • Priority: Normal
    • Affects Version: 4.19.0
    • Component: Node Tuning Operator
    • Category: Quality / Stability / Reliability
    • Severity: Moderate

      Description of problem:

          When I reinstall 4.19.0-rc.3 on my 3-node bare-metal cluster, one or two tuned pods always fail to start, as shown below:
      # oc get clusteroperator
      NAME                                       VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.19.0-rc.3   True        False         False      12m     
      baremetal                                  4.19.0-rc.3   True        False         False      31m     
      cloud-controller-manager                   4.19.0-rc.3   True        False         False      33m     
      cloud-credential                           4.19.0-rc.3   True        False         False      40m     
      cluster-autoscaler                         4.19.0-rc.3   True        False         False      31m     
      config-operator                            4.19.0-rc.3   True        False         False      32m     
      console                                    4.19.0-rc.3   True        False         False      17m     
      control-plane-machine-set                  4.19.0-rc.3   True        False         False      31m     
      csi-snapshot-controller                    4.19.0-rc.3   True        False         False      31m     
      dns                                        4.19.0-rc.3   True        False         False      31m     
      etcd                                       4.19.0-rc.3   True        False         False      30m     
      image-registry                             4.19.0-rc.3   True        False         False      18m     
      ingress                                    4.19.0-rc.3   True        False         False      21m     
      insights                                   4.19.0-rc.3   True        False         False      31m     
      kube-apiserver                             4.19.0-rc.3   True        False         False      27m     
      kube-controller-manager                    4.19.0-rc.3   True        False         False      27m     
      kube-scheduler                             4.19.0-rc.3   True        False         False      29m     
      kube-storage-version-migrator              4.19.0-rc.3   True        False         False      32m     
      machine-api                                4.19.0-rc.3   True        False         False      28m     
      machine-approver                           4.19.0-rc.3   True        False         False      32m     
      machine-config                             4.19.0-rc.3   True        False         False      30m     
      marketplace                                4.19.0-rc.3   True        False         False      31m     
      monitoring                                 4.19.0-rc.3   True        False         False      13m     
      network                                    4.19.0-rc.3   True        False         False      32m     
      node-tuning                                4.19.0-rc.3   True        True          False      14m     Waiting for 1/3 Profiles to be applied
      olm                                        4.19.0-rc.3   True        False         False      31m     
      openshift-apiserver                        4.19.0-rc.3   True        False         False      22m     
      openshift-controller-manager               4.19.0-rc.3   True        False         False      27m     
      openshift-samples                          4.19.0-rc.3   True        False         False      21m     
      operator-lifecycle-manager                 4.19.0-rc.3   True        False         False      31m     
      operator-lifecycle-manager-catalog         4.19.0-rc.3   True        False         False      31m     
      operator-lifecycle-manager-packageserver   4.19.0-rc.3   True        False         False      22m     
      service-ca                                 4.19.0-rc.3   True        False         False      32m     
      storage                                    4.19.0-rc.3   True        False         False      32m  
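
      For reference, the per-node Profile objects should show which node is stuck (a sketch using the standard NTO namespace and Profile CRD; output will differ per cluster):
      # oc get profiles.tuned.openshift.io -n openshift-cluster-node-tuning-operator
      # oc get pods -n openshift-cluster-node-tuning-operator -o wide
      The Profile that is not yet applied names the failing node, and the wide pod listing maps that node to its tuned pod.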
      Here is the pod log from the failed pod:
      I0528 00:27:28.613490   10977 controller.go:1667] starting in-cluster ocp-tuned v4.19.0-202505140813.p0.g7172669.assembly.stream.el9-0-g80d720b-dirty
      I0528 00:27:28.658393   10977 controller.go:671] writing /var/lib/ocp-tuned/image.env
      E0528 00:27:28.661083   10977 controller.go:1712] error repacking the profile: open /etc/tuned/recommend.d/50-openshift.conf: no such file or directory
      I0528 00:27:28.661099   10977 controller.go:1713] deferred updates likely broken
      I0528 00:27:28.661106   10977 controller.go:1729] starting: profile unpacked is "" fingerprint "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
      I0528 00:27:28.661117   10977 controller.go:1425] recover: no pending deferred change
      I0528 00:27:28.661123   10977 controller.go:1735] starting: no pending deferred update
      I0528 00:27:28.669889   10977 controller.go:382] disabling system tuned...
      I0528 00:27:28.762013   10977 controller.go:1547] started events processors
      I0528 00:27:28.762063   10977 controller.go:1568] monitoring filesystem events on "/etc/tuned/bootcmdline"
      I0528 00:27:28.762070   10977 controller.go:1571] started controller
      I0528 00:27:28.762128   10977 controller.go:359] set log level 0
      I0528 00:27:28.762197   10977 controller.go:614] providerExtract(): extracting cloud provider name to /var/lib/ocp-tuned/provider
      I0528 00:27:28.762359   10977 controller.go:692] tunedRecommendFileWrite(): written "/etc/tuned/recommend.d/50-openshift.conf" to set TuneD profile openshift-control-plane
      I0528 00:27:28.762368   10977 controller.go:417] profilesExtract(): extracting 1 TuneD profiles (recommended=openshift-control-plane)
      I0528 00:27:28.809735   10977 controller.go:462] profilesExtract(): recommended TuneD profile openshift-control-plane content changed [openshift]
      I0528 00:27:28.809870   10977 controller.go:478] profilesExtract(): fingerprint of extracted profiles: "3d0c4b179e02d27e7c0c64d8a4cfe6b3e85ae111f1d1202b6362f4bb92dbc627"
      I0528 00:27:28.809914   10977 controller.go:818] tunedReload()
      I0528 00:27:28.809939   10977 controller.go:745] starting tuned...
      I0528 00:27:28.809948   10977 run.go:121] running cmd...
      2025-05-28 00:27:28,889 INFO     tuned.daemon.application: TuneD: 2.25.1, kernel: 5.14.0-570.16.1.el9_6.x86_64
      2025-05-28 00:27:28,889 INFO     tuned.daemon.application: dynamic tuning is globally disabled
      2025-05-28 00:27:28,891 INFO     tuned.daemon.daemon: using sleep interval of 1 second(s)
      2025-05-28 00:27:28,892 INFO     tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
      2025-05-28 00:27:28,892 INFO     tuned.daemon.daemon: Using 'openshift-control-plane' profile
      2025-05-28 00:27:28,893 INFO     tuned.profiles.loader: loading profile: openshift-control-plane
      2025-05-28 00:27:28,948 INFO     tuned.daemon.controller: starting controller
      2025-05-28 00:27:28,948 INFO     tuned.daemon.controller: waiting for udev to settle
      Traceback (most recent call last):
        File "/usr/sbin/tuned", line 98, in <module>
          app.run(args.daemon)
        File "/usr/lib/python3.9/site-packages/tuned/daemon/application.py", line 215, in run
          result = self._controller.run()
        File "/usr/lib/python3.9/site-packages/tuned/daemon/controller.py", line 68, in run
          p = monitor.poll(timeout = 1)
        File "/usr/lib/python3.9/site-packages/pyudev/monitor.py", line 354, in poll
          if eintr_retry_call(poll.Poll.for_events((self, 'r')).poll, timeout):
        File "/usr/lib/python3.9/site-packages/pyudev/_util.py", line 159, in eintr_retry_call
          return func(*args, **kwargs)
        File "/usr/lib/python3.9/site-packages/pyudev/_os/poll.py", line 94, in poll
          return list(
        File "/usr/lib/python3.9/site-packages/pyudev/_os/poll.py", line 110, in _parse_events
          raise IOError('Error while polling fd: {0!r}'.format(fd))
      OSError: Error while polling fd: 4
      E0528 00:27:29.390415   10977 controller.go:763] Error while running tuned error waiting for tuned: exit status 1
      I0528 00:37:24.006916   10977 controller.go:359] set log level 0
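
      From the sequence above, it looks like the controller first fails to repack the profile because /etc/tuned/recommend.d/50-openshift.conf does not exist yet, then writes that file and starts TuneD, which immediately dies in pyudev's poll while waiting for udev to settle. To pull the same log from a stuck pod (the pod name is a placeholder):
      # oc logs -n openshift-cluster-node-tuning-operator <tuned-pod>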
       
      
      

      The workaround I found is to delete the failing pod; the replacement pod then starts without issue. I'm also attaching [^tuned-jv89v-tuned.log], which is from a good pod for comparison.
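
      In command form, the workaround is just deleting the stuck pod so the DaemonSet recreates it (a sketch: openshift-app=tuned is the default label on the tuned DaemonSet, and the node name is a placeholder):
      # oc delete pod -n openshift-cluster-node-tuning-operator -l openshift-app=tuned --field-selector spec.nodeName=<failing-node>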

       

      Version-Release number of selected component (if applicable):

          4.19.0-rc.3

      How reproducible:

          Always so far; every reinstall of 4.19.0-rc.3 on this cluster has left one or two tuned pods failing to start.

      Steps to Reproduce:

          1. Install (or reinstall) 4.19.0-rc.3 on a 3-node bare-metal cluster.
          2. Run oc get clusteroperator and watch the node-tuning operator.
          3. Check the tuned pods in the openshift-cluster-node-tuning-operator namespace.

      Actual results:

          One or two tuned pods fail to start: TuneD exits with "OSError: Error while polling fd: 4" and the node-tuning ClusterOperator stays Progressing with "Waiting for 1/3 Profiles to be applied".

      Expected results:

          Node tuning pods start successfully on all nodes.

      Additional info:

       
      

              Assignee: Team NTO
              Reporter: Ting Xue
              QA Contact: Liquan Cui
              Votes: 0
              Watchers: 3