Type: Bug
Resolution: Won't Do
Priority: Minor
Fix Version/s: None
Affects Version/s: 4.19
Component/s: descheduler
Labels:
None

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:
None
Story Points:
None
Severity:
Moderate
Regression:
None

Target Backport Versions:
None
Target Version:
None
Release Blocker:
None
Sprint:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

The Kube Descheduler Operator has a new `DevKubeVirtRelieveAndMigrate` profile. To use this profile, the documentation provides a MachineConfig that sets `psi=1` as a kernel argument.

As the resolution of OCPBUGS-37271, `psi=0` was set as a platform default. It is present by default in the generated MachineConfigs:
97-master-generated-kubelet
97-worker-generated-kubelet

This causes a conflict: both arguments are present when querying `/proc/cmdline` (see attached cmdline-output.png).
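
One way to confirm the conflict on a node is to list every `psi=` occurrence in the kernel command line. The following is a minimal Python sketch, not part of the documented procedure; the helper names `psi_values` and `has_conflict` are hypothetical. It only reports what is present and makes no claim about which value the kernel ends up honoring.

```python
from pathlib import Path


def psi_values(cmdline: str) -> list[str]:
    """Return the value of every psi=<value> token, in order of appearance."""
    return [tok.split("=", 1)[1] for tok in cmdline.split() if tok.startswith("psi=")]


def has_conflict(cmdline: str) -> bool:
    """True when the command line carries more than one distinct psi value."""
    return len(set(psi_values(cmdline))) > 1


if __name__ == "__main__":
    # On a node (for example from a debug shell), inspect the live command line.
    path = Path("/proc/cmdline")
    if path.exists():
        text = path.read_text()
        print("psi values:", psi_values(text), "conflict:", has_conflict(text))
```

Running it on an affected node would be expected to list both values, matching what the attached screenshot shows.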

When the descheduler is triggered, it raises the error alert "DeschedulerPSIDisabled" (see attached DeschedulerPSIDisabled.jpg).
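
To check from the node whether PSI is actually available, one can try reading `/proc/pressure/cpu`, which the kernel exposes when pressure stall information is enabled. This is a minimal sketch, assuming the standard PSI line format (`some avg10=... avg60=... avg300=... total=...`); the helper `read_psi` is hypothetical, and how the descheduler itself detects PSI is not described in this report.

```python
from pathlib import Path
from typing import Optional


def read_psi(path: str = "/proc/pressure/cpu") -> Optional[dict]:
    """Parse a PSI file into {"some": {"avg10": 0.0, ...}, ...}.

    Returns None when the file is missing or cannot be read, which is
    what one would expect on a node where PSI is disabled.
    """
    try:
        text = Path(path).read_text()
    except OSError:
        return None
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # First token is the kind ("some" or "full"), the rest are key=value pairs.
        kind, *fields = line.split()
        result[kind] = {k: float(v) for k, v in (f.split("=", 1) for f in fields)}
    return result
```

A `None` result on a node that carries the `psi=1` MachineConfig would be consistent with the alert described above.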

Version-Release number of selected component (if applicable):

    As of 4.17

How reproducible:

    Consistently

Steps to Reproduce:

    1. Create a MachineConfig that enables PSI, as per the documentation:
---
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-master-kargs-psi
spec:
  kernelArguments:
  - psi=1

    2. Create the example KubeDescheduler CR:
---
apiVersion: operator.openshift.io/v1
kind: KubeDescheduler
metadata:
  name: cluster
  namespace: openshift-kube-descheduler-operator
spec:
  managementState: Managed
  deschedulingIntervalSeconds: 30
  mode: "Automatic"
  profiles:
    - DevKubeVirtRelieveAndMigrate
  profileCustomizations:
    devEnableSoftTainter: true
    devDeviationThresholds: AsymmetricLow
    devActualUtilizationProfile: PrometheusCPUCombined

    3. Create one or more VMs.

    4. Trigger a descheduler action. This can be done with a CPU load Deployment:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpuload
spec:
  selector:
    matchLabels:
      app: cpuload
  replicas: 1
  template:
    metadata:
      labels:
        app: cpuload
    spec:
      nodeSelector:
        # change to the hostname of the node running the VM
        kubernetes.io/hostname: worker-1
      containers:
        - name: container
          image: 'quay.io/simonkrenger/cpuload:latest'
          resources: # may need to change limits or replicas to trigger load shedding descheduler action
            limits:
              cpu: '8'
              memory: 1Gi
            requests:
              cpu: '8'
              memory: 1Gi
  strategy:
    type: Recreate

Actual results:

    The descheduler fails and produces an error and the DeschedulerPSIDisabled alert.

Expected results:

    The VM is migrated to a more suitable host and the load is rebalanced.

Additional info:

I wonder whether `psi=0` is too broad as a platform default, and whether the original bug was specific to the use of the realtime kernel in Telco deployments?

If not, should our documentation for the `DevKubeVirtRelieveAndMigrate` profile in the Kube Descheduler include a note about this default?

Thoughts?

depends on

OCPNODE-3806 Node Team to reconsider enabling PSI Metrics to help CNV Descheduler

Closed

is related to

CNV-58437 [GA] CPU Load Aware rebalancing with Descheduler

Closed

CNV-44186 Enable schedstats for vCPU wait metrics by default

Assignee: Neeraj Krishna Gopalakrishna

Reporter: Ken Moini

Need Info From: None

Contributors: None

QA Contact: Rama Kasturi Narra

Doc Contact: None

Votes: 1

Watchers: 12

Created: 2025/09/27 4:18 AM

Updated: 2025/11/27 12:43 PM

Resolved: 2025/11/27 12:43 PM
