  1. OpenShift Bugs
  2. OCPBUGS-17433

Worker Latency Profile not changing kubelet nodeStatusUpdateFrequency


    • No
    • OCPNODE Sprint 240 (Blue)
    • 1
    • False

      Description of problem:

      I created a cluster with _workerLatencyProfile: LowUpdateSlowReaction_, then changed the profile to MediumUpdateAverageReaction, following the linked documentation and the test case document below. After the change I waited for KubeControllerManager and KubeAPIServer to finish progressing, and found that nodeStatusUpdateFrequency in /etc/kubernetes/kubelet.conf had not changed: it was still 1m0s instead of the expected 20s.
      
      

      https://docs.google.com/document/d/19dPIE4WFxVc3ldu-hNoXiOkjBCQrHC6I7wfyaUyTDqw/edit#heading=h.kf4qxogy9r6
      Version-Release number of selected component (if applicable):

      4.14.0-0.nightly-2023-07-31-181848
      

      How reproducible:

      100% 
      

      Steps to Reproduce:

      1. Create a cluster with a LowUpdateSlowReaction manifest. Example: https://docs.google.com/document/d/19dPIE4WFxVc3ldu-hNoXiOkjBCQrHC6I7wfyaUyTDqw/edit#heading=h.22najgyaj9lh
      2. Validate the component values for the LowUpdateSlowReaction profile:
      
      $ oc debug node/<worker-node-name>
      sh-4.4# chroot /host
      sh-4.4# cat /etc/kubernetes/kubelet.conf | grep nodeStatusUpdateFrequency
        "nodeStatusUpdateFrequency": "1m0s",
      $ oc get KubeControllerManager -o yaml | grep -A 1 node-monitor
              node-monitor-grace-period:
              - 5m0s
      $ oc get KubeAPIServer -o yaml | grep -A 1 default-
              default-not-ready-toleration-seconds:
              - "60"
              default-unreachable-toleration-seconds:
              - "60"
      3. Change the profile with *oc edit nodes.config/cluster*:
      spec: 
        workerLatencyProfile: MediumUpdateAverageReaction
      4. Wait for both operators to finish applying the change, checking with:
      
      oc get KubeControllerManager -o yaml | grep -i workerlatency -A 5 -B 5
      and 
      oc get KubeAPIServer -o yaml | grep -i workerlatency -A 5 -B 5
      
      5. Validate the component values for the MediumUpdateAverageReaction profile. This is where the check fails: the kubelet value is unchanged.
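
To rule out a single stale node in step 5, the kubelet check can be run against every worker rather than one. The following is only a sketch: it assumes a logged-in `oc` session that is allowed to run `oc debug`, and that workers carry the usual `node-role.kubernetes.io/worker` label. The function name is made up for illustration.

```shell
# Print the kubelet nodeStatusUpdateFrequency line for every worker node.
# Assumption: worker nodes are labelled node-role.kubernetes.io/worker.
print_worker_kubelet_frequency() {
  # `oc get ... -o name` yields entries of the form node/<name>,
  # which `oc debug` accepts directly.
  for node in $(oc get nodes -l node-role.kubernetes.io/worker -o name); do
    echo "== ${node}"
    # Run the same grep as in step 2, non-interactively, inside the host root.
    oc debug "${node}" -- chroot /host \
      grep nodeStatusUpdateFrequency /etc/kubernetes/kubelet.conf
  done
}
```

Calling `print_worker_kubelet_frequency` after step 4 should show one value per worker; in this report every worker would still be expected to show 1m0s.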
      
      
      

      Actual results:

      % oc get KubeControllerManager -o yaml | grep -A 1 node-monitor
              node-monitor-grace-period:
              - 2m0s
      % oc get KubeAPIServer -o yaml | grep -A 1 default-
              default-not-ready-toleration-seconds:
              - "60"
              default-unreachable-toleration-seconds:
              - "60"
      sh-5.1# cat /etc/kubernetes/kubelet.conf | grep nodeStatusUpdateFrequency 
        "nodeStatusUpdateFrequency": "1m0s",
      

      Expected results:

      $ oc debug node/<worker-node-name>
      sh-4.4# chroot /host
      sh-4.4# cat /etc/kubernetes/kubelet.conf | grep nodeStatusUpdateFrequency
        "nodeStatusUpdateFrequency": "20s",
      $ oc get KubeControllerManager -o yaml | grep -A 1 node-monitor
              node-monitor-grace-period:
              - 2m0s
      $ oc get KubeAPIServer -o yaml | grep -A 1 default-
              default-not-ready-toleration-seconds:
              - "60"
              default-unreachable-toleration-seconds:
              - "60"
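
The kubelet comparison between actual and expected results can be scripted. This is a minimal sketch, not an official check: the function names are hypothetical, and the expected values cover only the two profiles and values quoted in this report (1m0s for LowUpdateSlowReaction, 20s for MediumUpdateAverageReaction).

```shell
# Expected kubelet nodeStatusUpdateFrequency per profile.
# Only the two values quoted in this report are included.
expected_frequency() {
  case "$1" in
    LowUpdateSlowReaction)       echo "1m0s" ;;
    MediumUpdateAverageReaction) echo "20s" ;;
    *) echo "unknown profile: $1" >&2; return 1 ;;
  esac
}

# Read kubelet.conf text on stdin and print the configured value, e.g.
#   "nodeStatusUpdateFrequency": "1m0s",   ->   1m0s
extract_frequency() {
  sed -n 's/.*"nodeStatusUpdateFrequency": *"\([^"]*\)".*/\1/p'
}

# Compare the kubelet.conf line on stdin with the value expected for
# the profile given as $1. Returns 0 on match, 1 on mismatch.
check_frequency() {
  profile="$1"
  want=$(expected_frequency "$profile") || return 2
  got=$(extract_frequency)
  if [ "$got" = "$want" ]; then
    echo "OK: $profile -> $got"
  else
    echo "MISMATCH: $profile expects $want, kubelet has $got"
    return 1
  fi
}
```

For example, piping the output of the kubelet.conf grep from a debug shell into `check_frequency MediumUpdateAverageReaction` would report a mismatch for the actual results above, since the kubelet still has 1m0s.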
      

      Additional info:

      The documentation states that worker nodes become disabled while the change is being applied. I never saw that happen.
      

            svanka@redhat.com Sai Ramesh Vanka
            prubenda Paige Rubendall
            Sunil Choudhary Sunil Choudhary