-
Bug
-
Resolution: Won't Do
-
Undefined
-
None
-
4.15.0
-
None
-
Important
-
No
-
Rejected
-
False
-
Description of problem:
Since https://github.com/openshift/cluster-node-tuning-operator/pull/854, the preferred way to create a PerformanceProfile is to do it at Day 0. However, this does not seem to work for SNO and compact clusters when the PerformanceProfile references the master MCP.
Version-Release number of selected component (if applicable):
OpenShift v4.15.0-rc.0
How reproducible:
Tested on BM IPI and SNO BM deployments.
Steps to Reproduce:
1. Prepare an install-config.yaml file in one of two ways:
   * Create an install-config.yaml file to deploy a BareMetal IPI OpenShift 4.15.0-rc.0 cluster with compute.workers.replicas set to 0.
   * Or create an install-config.yaml file to deploy a BareMetal SNO cluster using the manual method described in the OpenShift documentation (https://docs.openshift.com/container-platform/latest/installing/installing_sno/install-sno-installing-sno.html#install-sno-installing-sno-manually).
2. After running the command {{openshift-install create manifests}}, create the following manifests at Day 0 (they are similar to the ones referenced in https://issues.redhat.com/browse/OCPBUGS-18640):

   ---
   kind: MachineConfigPool
   apiVersion: machineconfiguration.openshift.io/v1
   metadata:
     name: master
   spec:
     machineConfigSelector:
       matchLabels:
         machineconfiguration.openshift.io/role: master
     nodeSelector:
       matchLabels:
         node-role.kubernetes.io/master: ""
   ---
   kind: PerformanceProfile
   apiVersion: performance.openshift.io/v2
   metadata:
     name: dpdk
   spec:
     cpu:
       isolated: "1-3"
       reserved: "0"
     hugepages:
       defaultHugepagesSize: 2M
       pages:
       - size: 2M
         count: 32
     net:
       userLevelNetworking: true
     numa:
       topologyPolicy: single-numa-node
     realTimeKernel:
       enabled: false
     machineConfigPoolSelector:
       pools.operator.machineconfiguration.openshift.io/master: ""
     nodeSelector:
       node-role.kubernetes.io/master: ""

3. Deploy the cluster.
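Step 2 can be scripted roughly as follows. This is only a sketch: the helper name write_performance_profile is hypothetical, and the target directory is an assumption (any asset directory produced by {{openshift-install create manifests}}, for example <install-dir>/manifests). The file name matches the one that appears in the bootkube log in this report.

```shell
#!/bin/sh
# Hypothetical helper: write the PerformanceProfile from step 2 into an
# installer asset directory.
# $1: target directory. Assumption: it was created beforehand by
#     "openshift-install create manifests" (e.g. <install-dir>/manifests).
write_performance_profile() {
    target_dir=$1
    if [ ! -d "$target_dir" ]; then
        echo "directory not found: $target_dir (run create manifests first)" >&2
        return 1
    fi
    # Quoted heredoc delimiter: the YAML is written verbatim, no expansion.
    cat > "$target_dir/performance_profile_dpdk.yaml" <<'EOF'
---
kind: PerformanceProfile
apiVersion: performance.openshift.io/v2
metadata:
  name: dpdk
spec:
  cpu:
    isolated: "1-3"
    reserved: "0"
  hugepages:
    defaultHugepagesSize: 2M
    pages:
    - size: 2M
      count: 32
  net:
    userLevelNetworking: true
  numa:
    topologyPolicy: single-numa-node
  realTimeKernel:
    enabled: false
  machineConfigPoolSelector:
    pools.operator.machineconfiguration.openshift.io/master: ""
  nodeSelector:
    node-role.kubernetes.io/master: ""
EOF
}

# Example usage (paths are placeholders):
#   openshift-install create manifests --dir "$INSTALL_DIR"
#   write_performance_profile "$INSTALL_DIR/manifests"
```

The MachineConfigPool manifest from step 2 would be written the same way into its own file.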
Actual results:
Cluster deployment fails at the bootstrapping stage.

For SNO clusters, most of the time the logs are flooded with the following error:

> journalctl -b -u bootkube.service
bootkube.sh[7451]: [#4] failed to create some manifests:
bootkube.sh[7451]: "performance_profile_dpdk.yaml": failed to create performanceprofiles.v2.performance.openshift.io/dpdk -n : Internal error occurred: failed calling webhook "vwb.performance.openshift.io": failed to call webhook: Post "https://performance-addon-operator-service.openshift-cluster-node-tuning-operator.svc:443/validate-performance-openshift-io-v2-performanceprofile?timeout=10s": no endpoints available for service "performance-addon-operator-service"

For compact clusters (and for SNO when it does not hit the previous failure first), the logs are flooded with the following error:

> oc -n openshift-machine-config-operator logs deployment/machine-config-controller -c machine-config-controller
I1223 14:10:24.299182 1 kubelet_config_controller.go:491] KubeletConfig performance-dpdk has been deleted
W1223 14:10:25.095025 1 kubelet_config_controller.go:462] error updating the kubelet config with annotation key "machineconfiguration.openshift.io/mc-name-suffix" and value "": kubeletconfig.machineconfiguration.openshift.io "performance-dpdk" not found
W1223 14:10:25.095050 1 kubelet_config_controller.go:429] error updating kubeletconfig status: kubeletconfig.machineconfiguration.openshift.io "performance-dpdk" not found
I1223 14:10:25.095060 1 kubelet_config_controller.go:332] Error syncing kubeletconfig performance-dpdk: kubeletconfig.machineconfiguration.openshift.io "performance-dpdk" not found
I1223 14:10:25.133332 1 node_controller.go:1035] No nodes available for updates
I1223 14:10:25.133603 1 status.go:224] Degraded Machine: cnvqe-08.lab.eng.tlv2.redhat.com and Degraded Reason: machineconfig.machineconfiguration.openshift.io "rendered-master-82d8570749169c031983cc3e9151d030" not found
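To tell the two failure modes above apart when triaging a deployment, a log excerpt can be matched against the two error strings quoted in this report. The sketch below is illustrative only: the function name classify_day0_failure and the labels it prints are hypothetical, and it matches only the messages shown above.

```shell
#!/bin/sh
# Hypothetical helper: classify a log excerpt read from stdin into one of the
# two failure modes described in this report. The printed labels are made up
# for illustration.
classify_day0_failure() {
    log=$(cat)
    case "$log" in
        # bootkube could not reach the PerformanceProfile validating webhook.
        *'failed calling webhook "vwb.performance.openshift.io"'*)
            echo "webhook-unavailable" ;;
        # A node or pool references a rendered MachineConfig that does not exist.
        *'machineconfig.machineconfiguration.openshift.io "rendered-'*'not found'*)
            echo "rendered-config-missing" ;;
        *)
            echo "unknown" ;;
    esac
}

# Example usage:
#   journalctl -b -u bootkube.service | classify_day0_failure
#   oc -n openshift-machine-config-operator logs deployment/machine-config-controller \
#       -c machine-config-controller | classify_day0_failure
# For the webhook case, the service endpoints can then be inspected with:
#   oc -n openshift-cluster-node-tuning-operator get endpoints performance-addon-operator-service
```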
Additional info:
It seems that simply creating a Tuned resource at Day 0 also fails for SNO and compact clusters:

---
kind: Tuned
apiVersion: tuned.openshift.io/v1
metadata:
  name: hugepages
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - name: openshift-node-hugepages
    data: |
      [main]
      summary=Boot time configuration for hugepages
      include=openshift-node
      [bootloader]
      cmdline_openshift_node_hugepages=default_hugepagesz=2M hugepages=32
  recommend:
  - machineConfigLabels:
      machineconfiguration.openshift.io/role: "master"
    priority: 25
    profile: openshift-node-hugepages

> oc -n openshift-machine-config-operator logs deployment/machine-config-controller -c machine-config-controller
I1222 21:35:08.908410 1 status.go:224] Degraded Machine: cnvqe-03.lab.eng.tlv2.redhat.com and Degraded Reason: machineconfig.machineconfiguration.openshift.io "rendered-master-f3b3143b5d67b2efcb405cb1051662a4" not found

> oc -n openshift-machine-config-operator logs daemonset/machine-config-daemon -c machine-config-daemon
I1222 21:26:28.144081 15114 node.go:52] Setting initial node config: rendered-master-f3b3143b5d67b2efcb405cb1051662a4
I1222 21:26:28.152814 15114 daemon.go:1495] In bootstrap mode
E1222 21:26:28.152954 15114 writer.go:226] Marking Degraded due to: machineconfig.machineconfiguration.openshift.io "rendered-master-f3b3143b5d67b2efcb405cb1051662a4" not found
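When comparing these logs against the MachineConfigs that actually exist on the cluster, it can help to pull out the rendered config names reported as missing. The following is a minimal sketch, assuming the log lines keep the format shown above; the function name missing_rendered_configs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the unique rendered MachineConfig names that a
# log excerpt (stdin) reports as "not found". Lines that merely mention a
# rendered config without the "not found" error are ignored.
missing_rendered_configs() {
    grep -E 'machineconfig\.machineconfiguration\.openshift\.io "rendered-[^"]+" not found' \
        | grep -Eo '"rendered-[^"]+"' \
        | tr -d '"' \
        | sort -u
}

# Example usage:
#   oc -n openshift-machine-config-operator logs daemonset/machine-config-daemon \
#       -c machine-config-daemon | missing_rendered_configs
#   oc get machineconfig   # list the MachineConfigs that do exist, for comparison
```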
- duplicates: OCPBUGS-22095 PerformanceProfile render fails at Day-0 because the master/worker pools are not yet present (Closed)
- is related to: OCPBUGS-25300 OCP SNO RAN DU deployment has additional reboot (Closed)