Type: Bug
Resolution: Done
Priority: Major
Fix Version/s: None
Affects Version/s: 4.14.z, 4.15
Component/s: Performance Addon Operator
Labels:
- Samsung
- blue
- mco-triaged
- system-test
- telco-5g-core
- triaged

Regression:
No
Release Blocker:
Rejected
Blocked:
False
Blocked Reason:
None
RH Private Keywords:

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

On a fresh cluster deployment, applying a PerformanceProfile or KubeletConfig with systemReserved changes for the first time (that is, with no previous reserved-memory changes) causes ALL cluster nodes to reboot, even nodes to which the change does not apply.

Important:
- Nodes to which the change does not apply were only rebooted; the change was not applied to them.
- This all-nodes reboot happens only once, on a clean deployment. On subsequent changes to system-reserved memory/CPU through a PerformanceProfile or KubeletConfig, only the relevant nodes were rebooted.

Version-Release number of selected component (if applicable):

Client Version: 4.15.0-ec.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: 4.15.0-ec.3
Kubernetes Version: v1.28.3+20a5764

The same issue was observed on 4.14 as well:
Client Version: 4.14.6
Kustomize Version: v5.0.1
Server Version: 4.14.6
Kubernetes Version: v1.27.8+4fab27b

How reproducible:

    always

Steps to Reproduce:

    1. Deploy a cluster (even a minimal cluster is sufficient to reproduce this issue: 3 masters + 2 workers).
    2. Apply a KubeletConfig that changes reserved memory. For example:

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: set-sysreserved-master
spec:
  kubeletConfig:
    systemReserved:
      cpu: 500m
      memory: 27Gi
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/master: ""

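To make the expected targeting concrete: a KubeletConfig applies to the MachineConfigPools whose labels satisfy its machineConfigPoolSelector, and matchLabels requires every listed key/value pair to be present on the pool. The minimal Python sketch below models only that matchLabels rule. The pool label sets are assumptions modeled on the default master and worker pools, not values read from the affected cluster.

```python
from typing import Dict, List


def matches_labels(selector: Dict[str, str], labels: Dict[str, str]) -> bool:
    """Return True if every key/value pair in a matchLabels selector is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def targeted_pools(selector: Dict[str, str], pools: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the sorted names of pools whose labels satisfy the selector."""
    return sorted(name for name, labels in pools.items() if matches_labels(selector, labels))


# ASSUMPTION: labels modeled on the default master and worker MachineConfigPools.
POOLS = {
    "master": {
        "machineconfiguration.openshift.io/mco-built-in": "",
        "pools.operator.machineconfiguration.openshift.io/master": "",
    },
    "worker": {
        "machineconfiguration.openshift.io/mco-built-in": "",
        "pools.operator.machineconfiguration.openshift.io/worker": "",
    },
}

# machineConfigPoolSelector.matchLabels from the KubeletConfig above.
SELECTOR = {"pools.operator.machineconfiguration.openshift.io/master": ""}

print(targeted_pools(SELECTOR, POOLS))  # ['master']
```

Under this rule only the master pool is selected, which is why the worker reboots described below are unexpected.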
Actual results:

The KubeletConfig was created, the change was applied to all master nodes, and the master nodes were rebooted. At the same time, however, a rolling reboot of the worker nodes started.

Expected results:

    The KubeletConfig is created, the change is applied to all master nodes, and the master nodes are rebooted. No worker node reboots.
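One way to compare the actual and expected results is to record each Node's status.nodeInfo.bootID before applying the KubeletConfig and again after the master pool finishes updating. A changed boot ID means the node rebooted. The Python sketch below compares two such snapshots. The node names and boot IDs are hypothetical placeholders for the 3-master/2-worker cluster from step 1, and collecting the snapshots from the cluster is not shown.

```python
from typing import Dict, Set


def rebooted_nodes(before: Dict[str, str], after: Dict[str, str]) -> Set[str]:
    """Return nodes present in both snapshots whose bootID changed."""
    return {node for node, boot_id in before.items()
            if node in after and after[node] != boot_id}


def unexpected_reboots(before: Dict[str, str], after: Dict[str, str],
                       expected: Set[str]) -> Set[str]:
    """Return nodes that rebooted although they were not expected to."""
    return rebooted_nodes(before, after) - expected


# HYPOTHETICAL snapshots: node name -> status.nodeInfo.bootID.
before = {"master-0": "a0", "master-1": "a1", "master-2": "a2",
          "worker-0": "b0", "worker-1": "b1"}
# Actual result: every node came back with a new boot ID.
after = {"master-0": "c0", "master-1": "c1", "master-2": "c2",
         "worker-0": "d0", "worker-1": "d1"}
# Expected result: only the masters targeted by the KubeletConfig reboot.
expected = {"master-0", "master-1", "master-2"}

print(sorted(unexpected_reboots(before, after, expected)))  # ['worker-0', 'worker-1']
```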

Additional info:

    performanceprofile example:

apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: performance-samsung-cnf
spec:
  additionalKernelArgs:
  - mmio_stale_data=off
  - mds=off
  - tsx_async_abort=off
  - retbleed=off
  cpu:
    isolated: 2-19,22-39,42-59,62-79
    reserved: 0-1,20-21,40-41,60-61
  globallyDisableIrqLoadBalancing: false
  hugepages:
    defaultHugepagesSize: 2M
    pages:
    - count: 32768
      node: 0
      size: 2M
    - count: 32768
      node: 1
      size: 2M
  machineConfigPoolSelector:
    machineconfiguration.openshift.io/role: samsung-cnf
  nodeSelector:
    node-role.kubernetes.io/samsung-cnf: ""
  numa:
    topologyPolicy: single-numa-node
  realTimeKernel:
    enabled: false
  workloadHints:
    highPowerConsumption: false
    realTime: false
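For reference, the CPU and hugepage values in this profile can be sanity-checked with a little arithmetic: the isolated and reserved sets should be disjoint, and 32768 hugepages of 2M reserve 64 GiB on each NUMA node. The Python sketch below parses the cpuset list syntax used by cpu.isolated and cpu.reserved and computes those totals. It is a standalone illustration, not code from the Performance Addon Operator.

```python
from typing import Set


def parse_cpuset(spec: str) -> Set[int]:
    """Parse a cpuset list such as '0-1,20-21' into a set of CPU ids."""
    cpus: Set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus


isolated = parse_cpuset("2-19,22-39,42-59,62-79")
reserved = parse_cpuset("0-1,20-21,40-41,60-61")

# Isolated and reserved CPUs must not overlap.
assert not (isolated & reserved)
print(len(isolated), len(reserved), len(isolated | reserved))  # 72 8 80

# Hugepages: 32768 pages of 2 MiB on each of NUMA nodes 0 and 1.
pages_per_node = 32768
page_size_mib = 2
per_node_gib = pages_per_node * page_size_mib // 1024
print(per_node_gib, per_node_gib * 2)  # 64 128
```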

Assignee:: Yanir Quinn

Reporter:: Elena German

QA Contact:: Sunil Choudhary

Need Info From:: Elena German

Votes:: 0

Watchers:: 10

Created:: 2023/12/08 6:51 PM

Updated:: 2024/02/26 12:50 PM

Resolved:: 2024/02/26 12:50 PM
