-
Bug
-
Resolution: Done-Errata
-
Major
-
4.14
-
Important
-
No
-
OCPNODE Sprint 242 (Blue)
-
1
-
Rejected
-
False
-
-
-
Known Issue
-
Done
-
Release Notes
-
-
11/20: tbd, best case 4.14.4 now
-
Description of problem:
In a baremetal multinode OCP cluster, a node ends up in NotReady state. On the node there are a couple of failed services:

● cpuset-configure.service loaded failed failed Move services to reserved cpuset
● on-prem-resolv-prepender.service loaded failed failed Populates resolv.conf according to on-prem IPI needs

journalctl --boot --no-pager -u cpuset-configure.service
Sep 18 16:57:37 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com systemd[1]: Starting Move services to reserved cpuset...
Sep 18 16:57:37 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com cpuset-configure.sh[3014]: /usr/local/bin/cpuset-configure.sh: line 17: /sys/fs/cgroup/cpuset/cpuset.sched_load_balance: Read-only file system
Sep 18 16:57:38 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com systemd[1]: cpuset-configure.service: Main process exited, code=exited, status=1/FAILURE
Sep 18 16:57:38 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com systemd[1]: cpuset-configure.service: Failed with result 'exit-code'.
Sep 18 16:57:38 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com systemd[1]: Failed to start Move services to reserved cpuset.
Sep 18 16:57:52 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com systemd[1]: Failed to start Populates resolv.conf according to on-prem IPI needs.
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com systemd[1]: Starting Populates resolv.conf according to on-prem IPI needs...
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4852]: nameserver 10.47.242.10
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4851]: NM resolv-prepender: Starting download of baremetal runtime cfg image
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4854]: Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:23012b3380ffce706aa8f204cdc26745d8a69b0218150ec3bcb495202694fdab...
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4854]: Getting image source signatures
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4854]: Copying blob sha256:916ead524b9e54b9d5534b65534253c02ce66f1d784e683389aa3c4cb4d12389
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4854]: Copying blob sha256:d8190195889efb5333eeec18af9b6c82313edd4db62989bd3a357caca4f13f0e
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4854]: Copying blob sha256:c71d2589fba7989ecd29ea120fe7add01fab70126fc653a863d5844e35ee5403
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4854]: Copying blob sha256:97da74cc6d8fa5d1634eb1760fd1da5c6048619c264c23e62d75f3bf6b8ef5c4
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4854]: Copying blob sha256:d4dc6e74b6ce09e24dc284cc1967451f3dda2d485bc92fc95d24d91f939e4849
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4854]: Copying config sha256:ba2c86ef11c4e341cd0870b6d5b7ad39aa39724389d9d2dfead4ea3d75582071
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4854]: Writing manifest to image destination
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4854]: Storing signatures
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4854]: ba2c86ef11c4e341cd0870b6d5b7ad39aa39724389d9d2dfead4ea3d75582071
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4851]: NM resolv-prepender: Download of baremetal runtime cfg image completed
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4863]: Your kernel does not support pids limit capabilities or the cgroup is not mounted. PIDs limit discarded.
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com resolv-prepender.sh[4863]: Error: OCI runtime error: runc: runc create failed: mountpoint for devices not found
Sep 18 16:57:53 openshift-worker-3.ecore.lab.eng.tlv2.redhat.com systemd[1]: on-prem-resolv-prepender.service: Main process exited, code=exited, status=127/n/a

When checking CGroup config:
oc describe node.config
Name:         cluster
Namespace:
Labels:       <none>
Annotations:  include.release.openshift.io/ibm-cloud-managed: true
              include.release.openshift.io/self-managed-high-availability: true
              include.release.openshift.io/single-node-developer: true
              release.openshift.io/create-only: true
API Version:  config.openshift.io/v1
Kind:         Node
Metadata:
  Creation Timestamp:  2023-09-18T15:27:44Z
  Generation:          3
  Owner References:
    API Version:     config.openshift.io/v1
    Kind:            ClusterVersion
    Name:            version
    UID:             c62da215-6526-4306-8fc6-035612c8605e
  Resource Version:  91518
  UID:               cf2189ba-cd69-45e9-868c-7c2589decb25
Spec:
  Cgroup Mode:  v1
Events:         <none>
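The cluster config above asks for cgroup v1, while the failing writes to /sys/fs/cgroup/cpuset and the linked issue title (unified_cgroup_hierarchy=1) suggest the node may have booted with cgroup v2. A minimal sketch for comparing the two on an affected node follows. The function names are hypothetical helpers, not part of any OpenShift tooling, and the sketch assumes the mode is requested through the systemd.unified_cgroup_hierarchy kernel argument and that `stat -fc %T /sys/fs/cgroup` reports cgroup2fs for v2 and tmpfs for v1.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for checking which cgroup mode a node uses.

# Report the cgroup mode requested on a kernel command line string.
# Assumption: the mode is set via systemd.unified_cgroup_hierarchy=<0|1>.
# If the argument appears more than once, this sketch reports the last one.
cgroup_mode_from_cmdline() {
    local cmdline="$1" arg mode="unset"
    for arg in $cmdline; do
        case "$arg" in
            systemd.unified_cgroup_hierarchy=1) mode="v2" ;;
            systemd.unified_cgroup_hierarchy=0) mode="v1" ;;
        esac
    done
    printf '%s\n' "$mode"
}

# Map the filesystem type mounted at /sys/fs/cgroup to a cgroup mode.
cgroup_mode_from_fstype() {
    case "$1" in
        cgroup2fs) echo "v2" ;;
        tmpfs)     echo "v1" ;;
        *)         echo "unknown" ;;
    esac
}

# On the node (for example from an `oc debug node/<name>` shell after `chroot /host`):
#   cgroup_mode_from_cmdline "$(cat /proc/cmdline)"
#   cgroup_mode_from_fstype "$(stat -fc %T /sys/fs/cgroup)"
```

If either helper prints v2 while `Cgroup Mode` in node.config is v1, the node and the cluster config disagree, which would be consistent with the read-only cpuset error in the journal.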
Version-Release number of selected component (if applicable):
4.14.0-rc.1
How reproducible:
So far 100%
Steps to Reproduce:
1. Deploy a baremetal multinode cluster with the GitOps-ZTP workflow
Actual results:
While all policies report Compliant state, some configs are still being applied:

oc get mcp
NAME       CONFIG                                               UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
ht100gb    rendered-ht100gb-572f5aef443a21b21a8c5cfe816708e2    False     True       False      2              0                   0                     0                      77m
master     rendered-master-3c44ec28c389693028ad2cc6b74741ca     True      False      False      3              3                   3                     0                      103m
standard   rendered-standard-1942568110455a377b735e15f18c7ba8   True      False      False      2              2                   2                     0                      77m
worker     rendered-worker-033d4f0a2568efce241d02a2c54ab88e     True      False      False      0              0                   0                     0                      103m
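To spot pools that are still rolling out without reading the whole table, the `oc get mcp` output can be filtered on its UPDATED column. This is a small sketch, not an official check: `pools_not_updated` is a hypothetical helper name, and it assumes the default column layout shown in this report, with UPDATED as the third column.

```shell
#!/usr/bin/env bash
# Sketch only: list MachineConfigPools whose UPDATED column is not "True".
# Assumption: input is the default `oc get mcp` table (header row first,
# NAME in column 1, UPDATED in column 3).
pools_not_updated() {
    awk 'NR > 1 && $3 != "True" { print $1 }'
}

# Usage on a live cluster:
#   oc get mcp | pools_not_updated
```

Fed the table from this report, the helper prints only ht100gb, the pool that holds the NotReady node.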
Expected results:
All nodes are in Ready state
Additional info:
- blocks
-
OCPBUGS-26072 Node in NotReady state as unified_cgroup_hierarchy=1 are set
- Closed
- is cloned by
-
OCPBUGS-26072 Node in NotReady state as unified_cgroup_hierarchy=1 are set
- Closed
- relates to
-
OCPBUGS-18640 Cluster fails to install at day-0 with PerformanceProfile
- Closed
-
OCPBUGS-25300 OCP SNO RAN DU deployment has additional reboot
- Closed
- links to
-
RHSA-2023:7198 OpenShift Container Platform 4.15 security update