Bug
Resolution: Unresolved
Undefined
rhel-8.6.0.z
No
Moderate
rhel-sst-cs-net-perf-services
ssg_core_services
3
False
What were you trying to do that didn't work?
Running TuneD (tuned-2.20.0-1.3.20230614git850368d2.el8fdp.noarch) from OCP 4.13.13, with an RHCOS 9.2 host and an RHCOS 8.6 container running TuneD. When TuneD received SIGHUP to reload its profile, we saw a Python traceback.
What is the impact of this issue to you?
Tuning likely not applied correctly.
Please provide the package NVR for which the bug is seen:
tuned-2.20.0-1.3.20230614git850368d2.el8fdp.noarch
How reproducible is this bug?
Rare. I failed to reproduce it myself and so did the support engineer.
Steps to reproduce
None at the moment; I'll try to reproduce this in the future. I believe this shouldn't normally happen, only when a profile is changed on disk at the same time `TuneD` is sent SIGHUP. This is what the actual profile on the node had:
[main]
summary=setting the following parameter for baremetal worker nodes
include=openshift-node
[sysctl]
kernel.printk = 4 4 1 7
net.core.rmem_max = 16777216
net.core.wmem_max = 536870912
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 268435456
net.ipv4.tcp_congestion_control = cubic
net.core.default_qdisc = fq
vm.min_free_kbytes = 1081344
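Since the suspected impact is that tuning was not applied correctly, one way to check an affected node is to compare the `[sysctl]` values in the profile above with the live values under `/proc/sys`. The following is only a minimal sketch: the function names are hypothetical, it is not part of TuneD, and it checks only the keys in this profile (not those inherited through `include=openshift-node`).

```python
import configparser
from pathlib import Path


def parse_sysctl_section(profile_text):
    """Return the [sysctl] keys of a profile, with whitespace normalised."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(profile_text)
    if not parser.has_section("sysctl"):
        return {}
    return {key: " ".join(value.split()) for key, value in parser.items("sysctl")}


def read_live_sysctl(key, proc_root="/proc/sys"):
    """Read one sysctl from procfs; returns None if it cannot be read.

    Assumption: dots in the key map directly to path separators, which
    holds for the keys in this profile.
    """
    path = Path(proc_root) / key.replace(".", "/")
    try:
        return " ".join(path.read_text().split())
    except OSError:
        return None


def find_mismatches(expected, read_value=read_live_sysctl):
    """Map each differing key to a (wanted, found) tuple."""
    mismatches = {}
    for key, want in expected.items():
        have = read_value(key)
        if have != want:
            mismatches[key] = (want, have)
    return mismatches
```

On a node, `find_mismatches(parse_sysctl_section(profile_text))` with the profile text shown above would return an empty dict if every listed value is in effect.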
Expected results
No traceback from TuneD.
Actual results
2024-09-24T07:05:14.502158146Z Exception in thread Thread-30:
2024-09-24T07:05:14.502158146Z Traceback (most recent call last):
2024-09-24T07:05:14.502176470Z   File "/usr/lib64/python3.6/threading.py", line 937, in _bootstrap_inner
2024-09-24T07:05:14.502176470Z     self.run()
2024-09-24T07:05:14.502176470Z   File "/usr/lib64/python3.6/threading.py", line 885, in run
2024-09-24T07:05:14.502176470Z     self._target(*self._args, **self._kwargs)
2024-09-24T07:05:14.502176470Z   File "/usr/lib/python3.6/site-packages/tuned/plugins/plugin_scheduler.py", line 1099, in _thread_code
2024-09-24T07:05:14.502176470Z     if event.type == perf.RECORD_COMM or \
2024-09-24T07:05:14.502176470Z AttributeError: 'perf.lost_event' object has no attribute 'type'
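The traceback shows the scheduler plugin's thread reading `event.type` on an object that does not have that attribute. As an illustration only, a defensive check could look roughly like the sketch below. It is not the upstream TuneD code or fix: the classes and constants are stand-ins for the real `perf` module so the example runs on its own, and `is_interesting` is a hypothetical helper name.

```python
# Stand-ins for perf.RECORD_COMM / perf.RECORD_FORK (assumed values, for illustration).
RECORD_COMM = 3
RECORD_FORK = 7


class LostEvent:
    """Stand-in for perf.lost_event: deliberately has no 'type' attribute."""


class CommEvent:
    """Stand-in for an event object that does carry a 'type' attribute."""
    type = RECORD_COMM


def is_interesting(event, watch_fork=False):
    """Return True for COMM (and optionally FORK) events, False otherwise.

    Events without a 'type' attribute are skipped instead of raising
    AttributeError.
    """
    event_type = getattr(event, "type", None)
    if event_type is None:
        return False
    return event_type == RECORD_COMM or (watch_fork and event_type == RECORD_FORK)
```

With such a guard, an event like the `perf.lost_event` in the traceback would be ignored by the loop rather than terminating the thread.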
Additional information
I suspect customers will not see this on OCP 4.14+, where we fixed a possible race by no longer extracting TuneD profiles during TuneD reloads.
https://redhat-internal.slack.com/archives/C04KZV0RD6Z/p1727686493000219
https://redhat-internal.slack.com/archives/CQNBUEVM2/p1727431990986979