-
Bug
-
Resolution: Done-Errata
-
Major
-
4.14
-
None
-
No
-
CNF Compute Sprint 251
-
1
-
False
-
-
* Currently, applying a performance profile at day-0 is not supported.
-
Known Issue
-
Done
-
-
Description of problem:
I picked up 4.14-ec-4 (which uses cgroups v1 by default) and tried to create a cluster with the following PerformanceProfile (and the corresponding MachineConfigPool) by placing them in the manifests folder:
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: clusterbotpp
spec:
  cpu:
    isolated: "1-3"
    reserved: "0"
  realTimeKernel:
    enabled: false
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  machineConfigPoolSelector:
    pools.operator.machineconfiguration.openshift.io/worker: ""
and,
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker
spec:
  machineConfigSelector:
    matchLabels:
      machineconfiguration.openshift.io/role: worker
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker: ""
The cluster often fails to install because bootkube spends a long time retrying on this error:
Sep 06 18:32:43 ip-10-0-145-107 bootkube.sh[4925]: Created "clusterbotpp_kubeletconfig.yaml" kubeletconfigs.v1.machineconfiguration.openshift.io/performance-clusterbotpp -n
Sep 06 18:32:43 ip-10-0-145-107 bootkube.sh[4925]: Failed to update status for the "clusterbotpp_kubeletconfig.yaml" kubeletconfigs.v1.machineconfiguration.openshift.io/performance-clusterbotpp -n : Operation cannot be fulfilled on kubeletconfigs.machineconfiguration.openshift.io "performance-clusterbotpp": StorageError: invalid object, Code: 4, Key: /kubernetes.io/machineconfiguration.openshift.io/kubeletconfigs/performance-clusterbotpp, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 11f98d74-af1b-4a4c-9692-6dce56ee5cd9, UID in object meta:
Sep 06 18:32:43 ip-10-0-145-107 bootkube.sh[4925]: [#1717] failed to create some manifests:
Sep 06 18:32:43 ip-10-0-145-107 bootkube.sh[4925]: "clusterbotpp_kubeletconfig.yaml": failed to update status for kubeletconfigs.v1.machineconfiguration.openshift.io/performance-clusterbotpp -n : Operation cannot be fulfilled on kubeletconfigs.machineconfiguration.openshift.io "performance-clusterbotpp": StorageError: invalid object, Code: 4, Key: /kubernetes.io/machineconfiguration.openshift.io/kubeletconfigs/performance-clusterbotpp, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 11f98d74-af1b-4a4c-9692-6dce56ee5cd9, UID in object meta:
Sep 06 18:32:43 ip-10-0-145-107 bootkube.sh[4925]: Created "clusterbotpp_kubeletconfig.yaml" kubeletconfigs.v1.machineconfiguration.openshift.io/performance-clusterbotpp -n
Sep 06 18:32:43 ip-10-0-145-107 bootkube.sh[4925]: Failed to update status for the "clusterbotpp_kubeletconfig.yaml" kubeletconfigs.v1.machineconfiguration.openshift.io/performance-clusterbotpp -n : Operation cannot be fulfilled on kubeletconfigs.machineconfiguration.openshift.io "performance-clusterbotpp": StorageError: invalid object, Code: 4, Key: /kubernetes.io/machineconfiguration.openshift.io/kubeletconfigs/performance-clusterbotpp, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 597dfcf3-012d-4730-912a-78efabb920ba, UID in object meta:
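For triage, it can help to pull the precondition details out of the bootkube output rather than reading the long lines by eye. The following is a minimal sketch, not part of any OpenShift tooling: the function name `find_uid_mismatches` and the regular expression are illustrative assumptions, and the sample text is two of the log lines quoted above. In every failure shown, the expected UID is set while the "UID in object meta" field is empty and the ResourceVersion is 0, which is one way to read the error as a status update aimed at an object that no longer matches the UID bootkube created.

```python
import re

# Kubernetes object UIDs in the log above are standard 8-4-4-4-12 hex strings.
UID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"

# Hypothetical pattern for the "Precondition failed" fragment printed by bootkube.
# The UID after "UID in object meta:" is optional because it is empty in the report.
PRECONDITION_RE = re.compile(
    r"Key:\s*(?P<key>[^,\s]+), ResourceVersion: (?P<rv>\d+), "
    r"AdditionalErrorMsg: Precondition failed: "
    rf"UID in precondition: (?P<expected>{UID}), "
    rf"UID in object meta:[ \t]*(?P<actual>{UID})?"
)


def find_uid_mismatches(log_text):
    """Return one dict per failed precondition whose expected and actual UIDs differ."""
    mismatches = []
    for match in PRECONDITION_RE.finditer(log_text):
        expected = match.group("expected")
        actual = match.group("actual") or ""  # empty when no UID was printed
        if expected != actual:
            mismatches.append(
                {
                    "key": match.group("key"),
                    "resource_version": int(match.group("rv")),
                    "expected_uid": expected,
                    "actual_uid": actual,
                }
            )
    return mismatches


# Two of the failing lines from the report, copied verbatim.
_PREFIX = (
    'Sep 06 18:32:43 ip-10-0-145-107 bootkube.sh[4925]: Failed to update status for the '
    '"clusterbotpp_kubeletconfig.yaml" '
    "kubeletconfigs.v1.machineconfiguration.openshift.io/performance-clusterbotpp -n : "
    "Operation cannot be fulfilled on kubeletconfigs.machineconfiguration.openshift.io "
    '"performance-clusterbotpp": StorageError: invalid object, Code: 4, '
    "Key: /kubernetes.io/machineconfiguration.openshift.io/kubeletconfigs/performance-clusterbotpp, "
    "ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: "
)
SAMPLE_LOG = (
    _PREFIX + "11f98d74-af1b-4a4c-9692-6dce56ee5cd9, UID in object meta:\n"
    + _PREFIX + "597dfcf3-012d-4730-912a-78efabb920ba, UID in object meta:\n"
)

for entry in find_uid_mismatches(SAMPLE_LOG):
    print(entry["key"], entry["expected_uid"], repr(entry["actual_uid"]))
```

Running the same function over the full attached installation log would show how many distinct UIDs bootkube cycled through while retrying.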
As a result, the worker nodes do not become Ready in time, and the installer marks the cluster installation as failed. Even after the installer returns with a failure, I have sometimes observed that, given enough time, the cluster eventually reconciles and the worker nodes get provisioned.
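To check whether the workers do eventually become Ready after the installer has given up, a small polling loop is enough. This is a rough sketch under stated assumptions: `oc` is on the PATH and logged in to the cluster, and the helper names (`ready_state`, `fetch_worker_nodes`, `wait_for_workers`) and the timeout values are hypothetical, chosen for illustration only. It reads the standard `Ready` condition from each node's status.

```python
import json
import subprocess
import time


def ready_state(nodes):
    """Map each node name to True if its Ready condition has status "True"."""
    state = {}
    for item in nodes.get("items", []):
        name = item["metadata"]["name"]
        conditions = item.get("status", {}).get("conditions", [])
        state[name] = any(
            c.get("type") == "Ready" and c.get("status") == "True" for c in conditions
        )
    return state


def fetch_worker_nodes():
    """Fetch worker nodes as JSON. Assumes `oc` is installed and authenticated."""
    out = subprocess.run(
        ["oc", "get", "nodes", "-l", "node-role.kubernetes.io/worker", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(out)


def wait_for_workers(fetch=fetch_worker_nodes, timeout_s=3600, interval_s=30,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll until at least one worker exists and all workers are Ready, or time out."""
    deadline = clock() + timeout_s
    while True:
        state = ready_state(fetch())
        if state and all(state.values()):
            return state
        if clock() >= deadline:
            raise TimeoutError(f"workers not Ready after {timeout_s}s: {state}")
        sleep(interval_s)
```

Calling `wait_for_workers()` after a failed install would block until the workers report Ready or the (arbitrary) one-hour timeout expires. The `fetch`, `sleep`, and `clock` parameters exist only so the loop can be exercised without a live cluster.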
I am attaching the installation logs from one such run with this issue.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Often
Steps to Reproduce:
1. Try to install a new cluster by placing a PerformanceProfile in the manifests folder.
Actual results:
Cluster installation failed.
Expected results:
Cluster installation should succeed.
Additional info:
Also, I didn't observe this occurring in 4.13.9.
- blocks
-
OCPBUGS-17859 Avoid extra reboot with cgroups v2 at day-0 for PerformanceProfile
- Closed
- depends on
-
OCPBUGS-29752 day-0 with PerformanceProfile manifest renderer uses invalid uid
- Closed
- is cloned by
-
OCPBUGS-29751 day-0 with PerformanceProfile manifest renderer uses invalid uid
- Closed
-
OCPBUGS-25116 Cluster fails to install at day-0 with PerformanceProfile
- Closed
- is depended on by
-
OCPBUGS-25115 [4.14] Cluster fails to install at day-0 with PerformanceProfile
- Closed
-
OCPBUGS-25116 Cluster fails to install at day-0 with PerformanceProfile
- Closed
- is related to
-
OCPBUGS-19352 Node in NotReady state as unified_cgroup_hierarchy=1 are set
- Closed
- links to
-
RHBA-2024:1458 OpenShift Container Platform 4.14.z bug fix update