-
Bug
-
Resolution: Unresolved
-
Minor
-
None
-
4.19
-
None
-
Quality / Stability / Reliability
-
Moderate
Description of problem:
The Kube Descheduler Operator has a new `DevKubeVirtRelieveAndMigrate` profile. To use this profile, the documentation includes a MachineConfig that sets `psi=1` as a kernel argument.

As the resolution of OCPBUGS-37271, `psi=0` was set as a platform default. It is present by default in the generated MachineConfigs:
- 97-master-generated-kubelet
- 97-worker-generated-kubelet

This causes a conflict: both arguments are present when querying `/proc/cmdline` (see attached cmdline-output.png).

When the Descheduler is triggered, it raises the error alert "DeschedulerPSIDisabled" (see attached DeschedulerPSIDisabled.jpg).
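The conflict described above can be spotted mechanically by listing every `psi=` value on a kernel command line. The following is a minimal sketch, not part of the operator or the documentation: the helper names `psi_values` and `has_psi_conflict` and the sample command line are hypothetical, and the sketch only reports which values are present, without assuming which one the kernel honors.

```python
def psi_values(cmdline: str) -> list[str]:
    """Return every value passed for the `psi=` kernel argument, in order of appearance."""
    return [
        token.split("=", 1)[1]
        for token in cmdline.split()
        if token.startswith("psi=")
    ]


def has_psi_conflict(cmdline: str) -> bool:
    """True when the command line carries more than one distinct `psi=` value."""
    return len(set(psi_values(cmdline))) > 1


# Hypothetical command line, shaped like the situation in this report:
# one `psi=0` from the platform default and one `psi=1` from the custom MachineConfig.
sample = "BOOT_IMAGE=/vmlinuz root=/dev/sda4 rw psi=0 quiet psi=1"
print(psi_values(sample))
print(has_psi_conflict(sample))
```

On a node, the same functions could be given the contents of `/proc/cmdline` instead of the sample string.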
Version-Release number of selected component (if applicable):
As of 4.17
How reproducible:
Consistently
Steps to Reproduce:
1. Create a MachineConfig for enabling PSI as per the documentation:
---
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-master-kargs-psi
spec:
  kernelArguments:
    - psi=1
2. Create the example KubeDescheduler CR:
---
apiVersion: operator.openshift.io/v1
kind: KubeDescheduler
metadata:
  name: cluster
  namespace: openshift-kube-descheduler-operator
spec:
  managementState: Managed
  deschedulingIntervalSeconds: 30
  mode: "Automatic"
  profiles:
    - DevKubeVirtRelieveAndMigrate
  profileCustomizations:
    devEnableSoftTainter: true
    devDeviationThresholds: AsymmetricLow
    devActualUtilizationProfile: PrometheusCPUCombined
3. Create one or more VMs.
4. Trigger an action on the descheduler. This can be done with a CPU load pod:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpuload
spec:
  selector:
    matchLabels:
      app: cpuload
  replicas: 1
  template:
    metadata:
      labels:
        app: cpuload
    spec:
      nodeSelector:
        # change to the hostname of the node running the VM
        kubernetes.io/hostname: worker-1
      containers:
        - name: container
          image: 'quay.io/simonkrenger/cpuload:latest'
          resources: # may need to change limits or replicas to trigger load shedding descheduler action
            limits:
              cpu: '8'
              memory: 1Gi
            requests:
              cpu: '8'
              memory: 1Gi
  strategy:
    type: Recreate
Actual results:
The descheduler fails, producing an error and an alert.
Expected results:
The VM moves to a more suitable host and the load is rebalanced.
Additional info:
I wonder whether `psi=0` is too broad as a platform default. Was the original bug caused by the use of the realtime kernel in Telco? If not, should our documentation for the `DevKubeVirtRelieveAndMigrate` profile in Kube Descheduler include a note about the default? Thoughts?
- depends on: OCPNODE-3806 Node Team to reconsider enabling PSI Metrics to help CNV Descheduler (Closed)