- Spike
- Resolution: Done
- Critical
Which 4.y.z to 4.y'.z' updates increase vulnerability?
- All clusters upgrading to any version from 4.18.0 through 4.18.12
- All clusters already running 4.18.0 through 4.18.12
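As a quick way to check a given cluster version against the range above, here is a minimal sketch. The helper names `parse_version` and `is_affected` are hypothetical, and it assumes the range is inclusive at both ends and that any pre-release suffix (such as `-rc.1`) can be ignored.

```python
# Hypothetical helpers, not part of any OpenShift tooling.
AFFECTED_FIRST = (4, 18, 0)   # first affected release, per the statement above
AFFECTED_LAST = (4, 18, 12)   # last affected release, assumed inclusive


def parse_version(version: str) -> tuple:
    """Turn '4.18.5' (or '4.18.5-rc.1') into (4, 18, 5)."""
    core = version.split("-")[0]          # drop any pre-release suffix
    return tuple(int(part) for part in core.split(".")[:3])


def is_affected(version: str) -> bool:
    """True if the version falls inside the affected 4.18.z range."""
    return AFFECTED_FIRST <= parse_version(version) <= AFFECTED_LAST
```

For example, `is_affected("4.18.5")` is true, while `is_affected("4.17.20")` and `is_affected("4.18.13")` are false.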
Which types of clusters?
- The root cause is that restorecon of /var/lib/kubelet takes longer than the 90s timeout. It is not yet clear which conditions make relabeling that path exceed 90s. The likely candidates are a high number of mounted volumes or slow storage backing the filesystem where /var/lib/kubelet resides.
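Since a high number of mounted volumes is one suspected trigger, a rough way to gauge exposure on a node is to count the mount points under /var/lib/kubelet. The sketch below is only a proxy and not a confirmed predictor of the timeout. The function name `count_mounts_under` is hypothetical, and it assumes the standard Linux `/proc/self/mounts` layout, where the second whitespace-separated field is the mount point.

```python
def count_mounts_under(mounts_text: str, prefix: str = "/var/lib/kubelet") -> int:
    """Count mount points at or below `prefix` in /proc/self/mounts-style text."""
    prefix = prefix.rstrip("/")
    count = 0
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue                      # skip blank or malformed lines
        mount_point = fields[1]           # second field is the mount point
        if mount_point == prefix or mount_point.startswith(prefix + "/"):
            count += 1
    return count

# On a node, one could run:
#   with open("/proc/self/mounts") as f:
#       print(count_mounts_under(f.read()))
```

No threshold is implied here. The source does not state how many mounts are needed to push relabeling past 90s.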
What is the impact? Is it serious enough to warrant removing update recommendations?
- If a node hits this bug, it becomes not ready and stays that way until it is remediated.
How involved is remediation?
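- Try to manually relabel /var/lib/kubelet by running `/usr/sbin/restorecon -rv /var/lib/kubelet/ /usr/local/bin/kubenswrapper /usr/bin/kubensenter` interactively to avoid the timeout. However, even if the path is relabeled successfully, it may still present a problem.
- Create a systemd drop-in at /etc/systemd/system/kubelet.service.d/99-narrow-restorecon.conf with the content below. The empty `ExecStartPre=` line clears the previously defined ExecStartPre commands, so the recursive relabel is narrowed to /var/lib/kubelet/pod-resources. The leading `-` makes systemd ignore a failure of that command: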
[Service]
ExecStartPre=
ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests
ExecStartPre=-/usr/sbin/restorecon -ri /var/lib/kubelet/pod-resources /usr/local/bin/kubenswrapper /usr/bin/kubensenter
- The drop-in could also be rolled out via a MachineConfig before upgrading to 4.18, for example for the worker pool:
---
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-worker-narrow-restorecon
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8;base64,W1NlcnZpY2VdCkV4ZWNTdGFydFByZT0KRXhlY1N0YXJ0UHJlPS9iaW4vbWtkaXIgLS1wYXJlbnRzIC9ldGMva3ViZXJuZXRlcy9tYW5pZmVzdHMKRXhlY1N0YXJ0UHJlPS0vdXNyL3NiaW4vcmVzdG9yZWNvbiAtcmkgL3Zhci9saWIva3ViZWxldC9wb2QtcmVzb3VyY2VzIC91c3IvbG9jYWwvYmluL2t1YmVuc3dyYXBwZXIgL3Vzci9iaW4va3ViZW5zZW50ZXIK
        mode: 0640
        overwrite: true
        path: /etc/systemd/system/kubelet.service.d/99-narrow-restorecon.conf
  osImageURL: ""
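The `source` field in the MachineConfig carries the drop-in as a base64 data URL. To inspect or regenerate that value, something like the following sketch may help. The names `DROPIN`, `to_data_url`, and `from_data_url` are hypothetical, and the drop-in text is copied from the remediation step above.

```python
import base64

DATA_URL_PREFIX = "data:text/plain;charset=utf-8;base64,"

# Drop-in content as given in the remediation step (trailing newline included).
DROPIN = (
    "[Service]\n"
    "ExecStartPre=\n"
    "ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests\n"
    "ExecStartPre=-/usr/sbin/restorecon -ri /var/lib/kubelet/pod-resources "
    "/usr/local/bin/kubenswrapper /usr/bin/kubensenter\n"
)


def to_data_url(text: str) -> str:
    """Encode file content into the data URL form used in the `source` field."""
    return DATA_URL_PREFIX + base64.b64encode(text.encode("utf-8")).decode("ascii")


def from_data_url(url: str) -> str:
    """Decode a `source` data URL back into plain file content."""
    if not url.startswith(DATA_URL_PREFIX):
        raise ValueError("unexpected data URL prefix")
    return base64.b64decode(url[len(DATA_URL_PREFIX):]).decode("utf-8")
```

Calling `from_data_url` on the `source` value before applying the MachineConfig is a cheap way to confirm that the encoded content matches the intended drop-in.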
Is this a regression?
- Yes
- blocks: OCPBUGS-54384 Restorecon failure in OCP 4.18, causing kubelet to not start (Closed)