Bug
Resolution: Won't Do
4.15.0
Quality / Stability / Reliability
Important
Rejected
Description of problem:
Since https://github.com/openshift/cluster-node-tuning-operator/pull/854, the preferred way to create a PerformanceProfile is to do it at Day 0. However, this does not seem to work for SNO and compact clusters when the PerformanceProfile references the master MCP.
Version-Release number of selected component (if applicable):
OpenShift v4.15.0-rc.0
How reproducible:
Tested on BM IPI and SNO BM deployments.
Steps to Reproduce:
1. Create an install-config.yaml file in one of the following ways:
* Create an install-config.yaml file to deploy a BareMetal IPI OpenShift 4.15.0-rc.0 cluster with compute.workers.replicas set to 0.
* Or create an install-config.yaml file to deploy a BareMetal SNO cluster using the manual method described in the OpenShift documentation (https://docs.openshift.com/container-platform/latest/installing/installing_sno/install-sno-installing-sno.html#install-sno-installing-sno-manually).
2. After running the command {{openshift-install create manifests}}, create the following manifests at Day 0 (they are similar to the ones referenced in https://issues.redhat.com/browse/OCPBUGS-18640):
---
kind: MachineConfigPool
apiVersion: machineconfiguration.openshift.io/v1
metadata:
  name: master
spec:
  machineConfigSelector:
    matchLabels:
      machineconfiguration.openshift.io/role: master
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/master: ""
---
kind: PerformanceProfile
apiVersion: performance.openshift.io/v2
metadata:
  name: dpdk
spec:
  cpu:
    isolated: "1-3"
    reserved: "0"
  hugepages:
    defaultHugepagesSize: 2M
    pages:
      - size: 2M
        count: 32
  net:
    userLevelNetworking: true
  numa:
    topologyPolicy: single-numa-node
  realTimeKernel:
    enabled: false
  machineConfigPoolSelector:
    pools.operator.machineconfiguration.openshift.io/master: ""
  nodeSelector:
    node-role.kubernetes.io/master: ""
3. Deploy the cluster
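As a rough illustration of step 2, the following shell helper copies extra Day 0 manifests into the directory generated by {{openshift-install create manifests}}. This is only a sketch: the function name {{stage_day0_manifests}}, the source directory layout, and the choice of the {{manifests}} subdirectory as the destination are assumptions made for this example and are not taken from this report.

```shell
# Hypothetical helper: copy every *.yaml file from a source directory
# into <install-dir>/manifests, which is assumed to have been generated
# beforehand by `openshift-install create manifests`.
stage_day0_manifests() {
  local src_dir="$1" install_dir="$2"
  local dest="${install_dir}/manifests"

  # Refuse to run if the installer has not generated its manifests yet.
  if [ ! -d "${dest}" ]; then
    echo "error: ${dest} not found; run 'openshift-install create manifests' first" >&2
    return 1
  fi

  local f count=0
  for f in "${src_dir}"/*.yaml; do
    # Skip the unexpanded glob when the source directory has no YAML files.
    [ -e "${f}" ] || continue
    cp "${f}" "${dest}/"
    count=$((count + 1))
  done
  echo "staged ${count} manifest(s) into ${dest}"
}
```

A possible sequence, with placeholder directory names, would be {{openshift-install create manifests --dir ./ocp-install}}, then {{stage_day0_manifests ./day0-manifests ./ocp-install}}, then {{openshift-install create cluster --dir ./ocp-install}} for the IPI case. The manual SNO method uses a different final step, described in the documentation linked in step 1.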
Actual results:
Cluster deployment fails at the bootstrapping stage.
For SNO clusters, most of the time the logs are flooded with the following error:
> journalctl -b -u bootkube.service
bootkube.sh[7451]: [#4] failed to create some manifests:
bootkube.sh[7451]: "performance_profile_dpdk.yaml": failed to create performanceprofiles.v2.performance.openshift.io/dpdk -n : Internal error occurred: failed calling webhook "vwb.performance.openshift.io": failed to call webhook: Post "https://performance-addon-operator-service.openshift-cluster-node-tuning-operator.svc:443/validate-performance-openshift-io-v2-performanceprofile?timeout=10s": no endpoints available for service "performance-addon-operator-service"
For compact clusters (and for SNO clusters when the previous error does not occur), the logs are flooded with the following error:
> oc -n openshift-machine-config-operator logs deployment/machine-config-controller -c machine-config-controller
I1223 14:10:24.299182 1 kubelet_config_controller.go:491] KubeletConfig performance-dpdk has been deleted
W1223 14:10:25.095025 1 kubelet_config_controller.go:462] error updating the kubelet config with annotation key "machineconfiguration.openshift.io/mc-name-suffix" and value "": kubeletconfig.machineconfiguration.openshift.io "performance-dpdk" not found
W1223 14:10:25.095050 1 kubelet_config_controller.go:429] error updating kubeletconfig status: kubeletconfig.machineconfiguration.openshift.io "performance-dpdk" not found
I1223 14:10:25.095060 1 kubelet_config_controller.go:332] Error syncing kubeletconfig performance-dpdk: kubeletconfig.machineconfiguration.openshift.io "performance-dpdk" not found
I1223 14:10:25.133332 1 node_controller.go:1035] No nodes available for updates
I1223 14:10:25.133603 1 status.go:224] Degraded Machine: cnvqe-08.lab.eng.tlv2.redhat.com and Degraded Reason: machineconfig.machineconfiguration.openshift.io "rendered-master-82d8570749169c031983cc3e9151d030" not found
Additional info:
Simply creating a Tuned resource at Day 0 also seems to fail for SNO and compact clusters:
---
kind: Tuned
apiVersion: tuned.openshift.io/v1
metadata:
  name: hugepages
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
    - name: openshift-node-hugepages
      data: |
        [main]
        summary=Boot time configuration for hugepages
        include=openshift-node
        [bootloader]
        cmdline_openshift_node_hugepages=default_hugepagesz=2M hugepages=32
  recommend:
    - machineConfigLabels:
        machineconfiguration.openshift.io/role: "master"
      priority: 25
      profile: openshift-node-hugepages
> oc -n openshift-machine-config-operator logs deployment/machine-config-controller -c machine-config-controller
I1222 21:35:08.908410 1 status.go:224] Degraded Machine: cnvqe-03.lab.eng.tlv2.redhat.com and Degraded Reason: machineconfig.machineconfiguration.openshift.io "rendered-master-f3b3143b5d67b2efcb405cb1051662a4" not found
> oc -n openshift-machine-config-operator logs daemonset/machine-config-daemon -c machine-config-daemon
I1222 21:26:28.144081 15114 node.go:52] Setting initial node config: rendered-master-f3b3143b5d67b2efcb405cb1051662a4
I1222 21:26:28.152814 15114 daemon.go:1495] In bootstrap mode
E1222 21:26:28.152954 15114 writer.go:226] Marking Degraded due to: machineconfig.machineconfiguration.openshift.io "rendered-master-f3b3143b5d67b2efcb405cb1051662a4" not found
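The two failure signatures reported above can be told apart mechanically when triaging a failed deployment. The sketch below is a hypothetical shell filter, not part of any OpenShift tooling: the function name {{classify_day0_failure}} and the labels it prints are invented for this example, and it matches only on strings that appear in the logs quoted in this report.

```shell
# Hypothetical triage helper: read log text on stdin and print which of
# the two failure signatures from this report it contains, if any.
classify_day0_failure() {
  local log
  log="$(cat)"
  case "${log}" in
    # bootkube cannot reach the PerformanceProfile validating webhook.
    *'no endpoints available for service "performance-addon-operator-service"'*)
      echo "webhook-unavailable" ;;
    # MCO reports that the rendered master MachineConfig does not exist.
    *'rendered-master-'*'not found'*)
      echo "rendered-config-missing" ;;
    *)
      echo "unknown" ;;
  esac
}
```

For example, {{journalctl -b -u bootkube.service | classify_day0_failure}} on the bootstrap node, or piping the output of the {{oc ... logs}} commands shown above into the same function. When both signatures are present the webhook error is reported first, which is an arbitrary choice for this sketch.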
duplicates: OCPBUGS-22095 PerformanceProfile render fails at Day-0 because the master/worker pools are not yet present (Closed)
is related to: OCPBUGS-25300 OCP SNO RAN DU deployment has additional reboot (Closed)