Details
-
Bug
-
Resolution: Not a Bug
-
Undefined
-
None
-
4.12
-
Moderate
-
OCPNODE Sprint 229 (Blue)
-
1
-
False
-
-
Description
Description of problem:
One step during backup/restore procedure is to restart the kubelet service. This step failed on baremetal SNO node [core@sno-0 ~]$ sudo systemctl restart kubelet.service Job for kubelet.service failed because a timeout was exceeded. See "systemctl status kubelet.service" and "journalctl -xe" for details. [core@sno-0 ~]$ systemctl status kubelet ● kubelet.service - Kubernetes Kubelet Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled) Drop-In: /etc/systemd/system/kubelet.service.d └─01-kubens.conf, 10-mco-default-madv.conf, 20-logging.conf, 20-nodenet.conf, 30-kubelet-interval-tuning.conf, 90-container-mount-namespace.conf Active: activating (start) since Thu 2022-12-15 10:02:13 UTC; 56s ago Process: 77444 ExecStartPre=/usr/local/bin/extractExecStart kubelet.service //run/kubelet-execstart.env ORIG_EXECSTART (code=exited, status=0/SUCCESS) Process: 77442 ExecStartPre=/bin/rm -f /var/lib/kubelet/memory_manager_state (code=exited, status=0/SUCCESS) Process: 77440 ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state (code=exited, status=0/SUCCESS) Process: 77438 ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests (code=exited, status=0/SUCCESS) Main PID: 77450 (kubelet) Tasks: 13 (limit: 1317948) Memory: 23.3M CPU: 105ms CGroup: /system.slice/kubelet.service └─77450 /usr/bin/kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime-endpoint=>Dec 15 10:02:13 sno-0.kni-qe-34.lab.eng.rdu2.redhat.com bash[77450]: I1215 10:02:13.713861 77450 certificate_manager.go:270] kubernetes.io/kube-apiserver-client-kubelet: Certificate expiration is 2023-01-12 04:38:37 +00> Dec 15 10:02:13 sno-0.kni-qe-34.lab.eng.rdu2.redhat.com bash[77450]: I1215 10:02:13.713889 77450 certificate_manager.go:270] kubernetes.io/kube-apiserver-client-kubelet: Waiting 561h57m57.357021476s for next certificate> Dec 15 10:02:13 sno-0.kni-qe-34.lab.eng.rdu2.redhat.com bash[77450]: I1215 10:02:13.714100 77450 dynamic_cafile_content.go:119] "Loaded a new CA Bundle and Verifier" name="client-ca-bundle::/etc/kubernetes/kubelet-ca.cr> Dec 15 10:02:13 sno-0.kni-qe-34.lab.eng.rdu2.redhat.com bash[77450]: I1215 10:02:13.714203 77450 dynamic_cafile_content.go:157] "Starting controller" name="client-ca-bundle::/etc/kubernetes/kubelet-ca.crt" Dec 15 10:02:13 sno-0.kni-qe-34.lab.eng.rdu2.redhat.com bash[77450]: I1215 10:02:13.714216 77450 manager.go:163] cAdvisor running in container: "/system.slice/kubelet.service" Dec 15 10:02:13 sno-0.kni-qe-34.lab.eng.rdu2.redhat.com bash[77450]: I1215 10:02:13.721436 77450 fs.go:133] Filesystem UUIDs: map[191E-6008:/dev/sda2 2022-09-30-23-59-12-00:/dev/sr0 899b5a36-ac2e-46be-8cf1-c02d00855e2e:> Dec 15 10:02:13 sno-0.kni-qe-34.lab.eng.rdu2.redhat.com bash[77450]: I1215 10:02:13.721462 77450 fs.go:134] Filesystem partitions: map[/dev/sda3:{mountpoint:/boot major:8 minor:3 fsType:ext4 blockSize:0} /dev/sda4:{moun> Dec 15 10:02:13 sno-0.kni-qe-34.lab.eng.rdu2.redhat.com bash[77450]: -ae06-4449-9c80-35feeecadae6/volumes/kubernetes.io~secret/webhook-serving-cert:{mountpoint:/var/lib/kubelet/pods/60ff4e2d-ae06-4449-9c80-35feeecadae6/vo> Dec 15 10:02:13 sno-0.kni-qe-34.lab.eng.rdu2.redhat.com bash[77450]: 6d5989dd9938304677c92664236c8825dfa718a6c3c12ec57d8a88/merged major:0 minor:1116 fsType:overlay blockSize:0} overlay_0-1118:{mountpoint:/var/lib/contain> Dec 15 10:02:13 sno-0.kni-qe-34.lab.eng.rdu2.redhat.com bash[77450]: I1215 10:02:13.722567 77450 nvidia.go:54] NVIDIA GPU metrics disabled
Version-Release number of selected component (if applicable):
4.12-rc.4
How reproducible:
so far 1st attempt to perform backup-restore
Steps to Reproduce:
1. Install disconnected SNO cluster 2. Configure 4 reserved CPUs with performance profile: cpu: isolated: 2-31,34-127 reserved: 0-1,32-33 3. Install operators - LVMO, SR-IOV, SR-IOV-FEC 4. Create ETCD backup 5. Start restore procedure - https://access.redhat.com/documentation/en-us/openshift_container_platform/4.11/html-single/backup_and_restore/index#dr-restoring-cluster-state
Actual results:
kubelet restart failed(timed out)
Expected results:
kubelet is successfully restarted
Additional info: