-
Bug
-
Resolution: Done
-
Normal
-
4.14
-
Quality / Stability / Reliability
-
OCPNODE Sprint 240 (Blue)
Description of problem:
I created a cluster with _workerLatencyProfile: LowUpdateSlowReaction_, then changed the profile to MediumUpdateAverageReaction by following the linked documentation and the test case document below. After the switch, I waited for KubeControllerManager and KubeAPIServer to finish progressing, and noticed that nodeStatusUpdateFrequency in /etc/kubernetes/kubelet.conf did not change as expected.
https://docs.google.com/document/d/19dPIE4WFxVc3ldu-hNoXiOkjBCQrHC6I7wfyaUyTDqw/edit#heading=h.kf4qxogy9r6
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-07-31-181848
How reproducible:
100%
Steps to Reproduce:
1. Create a cluster with a LowUpdateSlowReaction manifest. Example: https://docs.google.com/document/d/19dPIE4WFxVc3ldu-hNoXiOkjBCQrHC6I7wfyaUyTDqw/edit#heading=h.22najgyaj9lh
2. Validate the component values for the low update profile:
$ oc debug node/<worker-node-name>
sh-4.4# chroot /host
sh-4.4# cat /etc/kubernetes/kubelet.conf | grep nodeStatusUpdateFrequency
"nodeStatusUpdateFrequency": "1m0s",
$ oc get KubeControllerManager -o yaml | grep -A 1 node-monitor
node-monitor-grace-period:
- 5m0s
$ oc get KubeAPIServer -o yaml | grep -A 1 default-
default-not-ready-toleration-seconds:
- "60"
default-unreachable-toleration-seconds:
- "60"
3. Run *oc edit nodes.config/cluster* and set:
spec:
  workerLatencyProfile: MediumUpdateAverageReaction
4. Wait for the components to finish rolling out, checking with
oc get KubeControllerManager -o yaml | grep -i workerlatency -A 5 -B 5
and
oc get KubeAPIServer -o yaml | grep -i workerlatency -A 5 -B 5
5. Validate the component values for the medium profile. The error occurs here.
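The kubelet check in steps 2 and 5 can also be run without an interactive debug shell. The following is a minimal sketch, not part of the original test case: the helper name kubelet_frequency is hypothetical, and it assumes kubelet.conf holds the setting on a single line in the JSON form shown in this report.

```shell
#!/bin/sh
# Hypothetical helper: read kubelet.conf content on stdin and print
# only the nodeStatusUpdateFrequency value (for example "1m0s").
kubelet_frequency() {
  sed -n 's/.*"nodeStatusUpdateFrequency": *"\([^"]*\)".*/\1/p'
}

# Usage against a live cluster (requires a logged-in oc session;
# replace <worker-node-name> with a real node name):
#   oc debug node/<worker-node-name> -- chroot /host \
#     cat /etc/kubernetes/kubelet.conf | kubelet_frequency
```

Printing only the value makes it easier to compare results before and after the profile change.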
Actual results:
$ oc get KubeControllerManager -o yaml | grep -A 1 node-monitor
node-monitor-grace-period:
- 2m0s
$ oc get KubeAPIServer -o yaml | grep -A 1 default-
default-not-ready-toleration-seconds:
- "60"
default-unreachable-toleration-seconds:
- "60"
sh-5.1# cat /etc/kubernetes/kubelet.conf | grep nodeStatusUpdateFrequency
"nodeStatusUpdateFrequency": "1m0s",
Expected results:
$ oc debug node/<worker-node-name>
sh-4.4# chroot /host
sh-4.4# cat /etc/kubernetes/kubelet.conf | grep nodeStatusUpdateFrequency
"nodeStatusUpdateFrequency": "20s",
$ oc get KubeControllerManager -o yaml | grep -A 1 node-monitor
node-monitor-grace-period:
- 2m0s
$ oc get KubeAPIServer -o yaml | grep -A 1 default-
default-not-ready-toleration-seconds:
- "60"
default-unreachable-toleration-seconds:
- "60"
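The comparison between actual and expected kubelet values can be sketched as a small check. The function names here are hypothetical, and the expected values cover only the two profiles listed in this report; any other profile is treated as unknown.

```shell
#!/bin/sh
# Expected kubelet nodeStatusUpdateFrequency per profile,
# taken from the values listed in this report.
expected_frequency() {
  case "$1" in
    LowUpdateSlowReaction)       echo "1m0s" ;;
    MediumUpdateAverageReaction) echo "20s" ;;
    *) echo "unknown profile: $1" >&2; return 1 ;;
  esac
}

# Compare an observed value with the expected one and print a verdict.
# Returns 0 on match, 1 on mismatch, 2 for an unknown profile.
check_frequency() {
  profile=$1
  actual=$2
  expected=$(expected_frequency "$profile") || return 2
  if [ "$actual" = "$expected" ]; then
    echo "OK: $profile nodeStatusUpdateFrequency=$actual"
  else
    echo "MISMATCH: $profile expected $expected, got $actual"
    return 1
  fi
}
```

With the values observed in this report, check_frequency MediumUpdateAverageReaction 1m0s reports a mismatch.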
Additional info:
The documentation states that worker nodes are disabled while the change is being applied, but I never saw that occur.
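One way to confirm whether any node was cordoned during the change is to poll the node list while the profile is being applied. This is a sketch under assumptions: the helper name disabled_nodes is hypothetical, and it assumes the default oc get nodes column layout, where STATUS is the second column.

```shell
#!/bin/sh
# Hypothetical helper: read `oc get nodes --no-headers` output on stdin
# and print the names of nodes whose STATUS includes SchedulingDisabled.
disabled_nodes() {
  awk '$2 ~ /SchedulingDisabled/ { print $1 }'
}

# Usage against a live cluster (requires a logged-in oc session):
#   oc get nodes --no-headers | disabled_nodes
```

An empty result throughout the rollout would match the behavior described above.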