OpenShift Bugs: OCPBUGS-24704

Applying performanceprofile or kubeletconfig with systemReserved changes causes ALL cluster nodes to reboot


    Description

      Description of problem:

      On a fresh cluster deployment, applying a performanceprofile or kubeletconfig with systemReserved changes for the first time (with no earlier reserved-memory changes) causes ALL cluster nodes to reboot, even nodes that the change does not target.
      
      Important: 
       - Nodes that the change does not target are only rebooted; the change itself is not applied to them.
      
       - This all-nodes reboot happens only once on a clean deployment; later changes to system reserved memory/CPU via performanceprofile or kubeletconfig reboot only the relevant nodes.

      Version-Release number of selected component (if applicable):

      Client Version: 4.15.0-ec.3
      Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
      Server Version: 4.15.0-ec.3
      Kubernetes Version: v1.28.3+20a5764
      
      The same issue is observed on 4.14 as well:
      Client Version: 4.14.6
      Kustomize Version: v5.0.1
      Server Version: 4.14.6
      Kubernetes Version: v1.27.8+4fab27b 

      How reproducible:

          Always

      Steps to Reproduce:

          1. Deploy a cluster (even a minimal cluster of 3 masters + 2 workers is enough to reproduce the issue).
          2. Apply a kubeletconfig that changes the reserved memory. For example:
      
      apiVersion: machineconfiguration.openshift.io/v1
      kind: KubeletConfig
      metadata:
        name: set-sysreserved-master
      spec:
        kubeletConfig:
          systemReserved:
            cpu: 500m
            memory: 27Gi
        machineConfigPoolSelector:
          matchLabels:
            pools.operator.machineconfiguration.openshift.io/master: ""
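      The selector in the manifest above uses matchLabels, which selects a pool only when every listed key/value pair is present in the pool's labels. The sketch below illustrates that rule with plain dictionaries. The pool label sets are assumptions made for illustration (real pools carry additional labels), and selector_matches is a hypothetical helper, not part of any OpenShift API.

```python
def selector_matches(match_labels, pool_labels):
    """Return True when every key/value in match_labels appears in pool_labels."""
    return all(pool_labels.get(key) == value for key, value in match_labels.items())


# Assumed label sets for the two default pools (illustrative only).
pools = {
    "master": {"pools.operator.machineconfiguration.openshift.io/master": ""},
    "worker": {"pools.operator.machineconfiguration.openshift.io/worker": ""},
}

# Selector copied from the KubeletConfig in this report.
selector = {"pools.operator.machineconfiguration.openshift.io/master": ""}

selected = [name for name, labels in pools.items() if selector_matches(selector, labels)]
print(selected)
```

      Under these assumptions only the master pool is selected, which is why a reboot of the worker nodes is unexpected.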
      
      
          

      Actual results:

      The kubeletconfig was created, the change was applied to all master nodes, and the master nodes were rebooted. At the same time, however, a rolling reboot of the worker nodes started.

      Expected results:

          The kubeletconfig is created, the change is applied to all master nodes, and the master nodes are rebooted. No worker reboot should be observed.
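      One way to observe the difference between actual and expected results is to compare the machine-config annotations on each node. The sketch below assumes the node list has the shape produced by `oc get nodes -o json` and that nodes carry the machineconfiguration.openshift.io/currentConfig and desiredConfig annotations. The function name, node names, and rendered config names in the sample are hypothetical.

```python
from collections import defaultdict

CURRENT = "machineconfiguration.openshift.io/currentConfig"
DESIRED = "machineconfiguration.openshift.io/desiredConfig"
ROLE_PREFIX = "node-role.kubernetes.io/"


def nodes_pending_update(node_list):
    """Group nodes whose current and desired configs differ, keyed by node role."""
    pending = defaultdict(list)
    for node in node_list.get("items", []):
        meta = node.get("metadata", {})
        annotations = meta.get("annotations", {})
        if annotations.get(CURRENT) == annotations.get(DESIRED):
            continue  # node is already on its desired config
        roles = sorted(
            label[len(ROLE_PREFIX):]
            for label in meta.get("labels", {})
            if label.startswith(ROLE_PREFIX)
        )
        for role in roles or ["<none>"]:
            pending[role].append(meta.get("name"))
    return dict(pending)


# Hypothetical snapshot taken right after applying a master-only KubeletConfig.
sample = {"items": [
    {"metadata": {"name": "master-0",
                  "labels": {"node-role.kubernetes.io/master": ""},
                  "annotations": {CURRENT: "rendered-master-aaa", DESIRED: "rendered-master-bbb"}}},
    {"metadata": {"name": "worker-0",
                  "labels": {"node-role.kubernetes.io/worker": ""},
                  "annotations": {CURRENT: "rendered-worker-aaa", DESIRED: "rendered-worker-bbb"}}},
    {"metadata": {"name": "worker-1",
                  "labels": {"node-role.kubernetes.io/worker": ""},
                  "annotations": {CURRENT: "rendered-worker-aaa", DESIRED: "rendered-worker-aaa"}}},
]}

result = nodes_pending_update(sample)
print(result)
```

      With the expected behaviour, only the "master" key would appear in the result; a "worker" entry, as in this made-up snapshot, corresponds to the behaviour reported here.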

      Additional info:

          Example performanceprofile:
      
      apiVersion: performance.openshift.io/v2
      kind: PerformanceProfile
      metadata:
        name: performance-samsung-cnf
      spec:
        additionalKernelArgs:
        - mmio_stale_data=off
        - mds=off
        - tsx_async_abort=off
        - retbleed=off
        cpu:
          isolated: 2-19,22-39,42-59,62-79
          reserved: 0-1,20-21,40-41,60-61
        globallyDisableIrqLoadBalancing: false
        hugepages:
          defaultHugepagesSize: 2M
          pages:
          - count: 32768
            node: 0
            size: 2M
          - count: 32768
            node: 1
            size: 2M
        machineConfigPoolSelector:
          machineconfiguration.openshift.io/role: samsung-cnf
        nodeSelector:
          node-role.kubernetes.io/samsung-cnf: ""
        numa:
          topologyPolicy: single-numa-node
        realTimeKernel:
          enabled: false
        workloadHints:
          highPowerConsumption: false
          realTime: false
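      As a sanity check on the cpu section of the profile above, the isolated and reserved CPU lists can be expanded and compared. This is a minimal sketch; parse_cpu_list is a hypothetical helper that handles only comma-separated IDs and inclusive ranges, which is all this profile uses.

```python
def parse_cpu_list(spec):
    """Expand a CPU list such as '0-1,20-21' into a set of integer CPU IDs."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))  # ranges are inclusive
        else:
            cpus.add(int(part))
    return cpus


# Values copied from the PerformanceProfile in this report.
isolated = parse_cpu_list("2-19,22-39,42-59,62-79")
reserved = parse_cpu_list("0-1,20-21,40-41,60-61")

overlap = isolated & reserved
print(len(isolated), len(reserved), sorted(overlap))
```

      For these values the two sets do not overlap, so the CPU partitioning in the profile is consistent and is not, by itself, an explanation for the unexpected reboots.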
      

          People

            yquinn@redhat.com Yanir Quinn
            elgerman Elena German
            Sunil Choudhary Sunil Choudhary
            Elena German