- Bug
- Resolution: Not a Bug
- Normal
- None
- 4.13.0
- None
- Critical
- Yes
- Proposed
- False
Description of problem:
The device sda should be the OS disk and sdb should be the additional disk for the /var partition. However, the problematic master node is apparently not using sda; instead it put everything on sdb, which leads to "The root filesystem is too small".
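This behavior is consistent with sdX names being assigned by kernel probe order, which can differ between boots and between otherwise identical nodes, so a partition config that targets /dev/sdb may land on what was meant to be the OS disk. A minimal sketch of a stability check (the by-id listing and the fallback message are illustrative, not taken from the report):

```shell
# sdX names are not stable identifiers; /dev/disk/by-id/ paths are tied to
# the hardware and survive probe-order changes. Listing them shows which
# stable path currently maps to sda vs. sdb on a given node.
out=$(ls -l /dev/disk/by-id/ 2>/dev/null || echo "no /dev/disk/by-id on this system")
printf '%s\n' "$out"
```

Comparing this listing across masters would show whether the same physical disk is enumerated as sda on one node and sdb on another.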
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-2023-03-23-204038
How reproducible:
Always
Steps to Reproduce:
1. Perform a normal UPI installation, but configure an additional disk for the /var partition.
Actual results:
One master node hit the "The root filesystem is too small" issue, so the installation failed.
Expected results:
Installation should succeed.
Additional info:
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          39m     Unable to apply 4.13.0-0.nightly-2023-03-23-204038: some cluster operators are not available

$ oc get nodes
NAME                                                 STATUS   ROLES                  AGE   VERSION
jiwei-0328a-fg99t-master-0.c.openshift-qe.internal   Ready    control-plane,master   38m   v1.26.2+dc93b13
jiwei-0328a-fg99t-master-1.c.openshift-qe.internal   Ready    control-plane,master   37m   v1.26.2+dc93b13

$ oc get machines -A
No resources found

$ oc get co | grep -v 'True False False'
NAME                      VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication            4.13.0-0.nightly-2023-03-23-204038   False       False         True       39m     OAuthServerServiceEndpointAccessibleControllerAvailable: Get "https://172.30.234.161:443/healthz": dial tcp 172.30.234.161:443: connect: connection refused...
console                   4.13.0-0.nightly-2023-03-23-204038   False       False         True       32m     RouteHealthAvailable: console route is not admitted
image-registry                                                 False       True          True       32m     Available: The deployment does not have available replicas...
ingress                                                        False       True          True       32m     The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.)
kube-controller-manager   4.13.0-0.nightly-2023-03-23-204038   True        False         True       35m     GarbageCollectorDegraded: error fetching rules: Get "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules": dial tcp: lookup thanos-querier.openshift-monitoring.svc on 172.30.0.10:53: no such host
monitoring                                                     False       True          True       28m     reconciling Prometheus Operator Admission Webhook Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator-admission-webhook: got 2 unavailable replicas
network                   4.13.0-0.nightly-2023-03-23-204038   True        True          False      40m     Deployment "/openshift-network-diagnostics/network-check-source" is waiting for other operators to become ready

[core@jiwei-0328a-fg99t-int-svc ~]$ ssh -i .ssh/openshift-qe.pem core@10.0.0.5
Red Hat Enterprise Linux CoreOS 413.92.202303190222-0
  Part of OpenShift 4.13, RHCOS is a Kubernetes native operating system
  managed by the Machine Config Operator (`clusteroperator/machine-config`).

WARNING: Direct SSH access to machines is not recommended; instead,
make configuration changes via `machineconfig` objects:
  https://docs.openshift.com/container-platform/4.13/architecture/architecture-rhcos.html
---
############################################################################
WARNING: The root filesystem is too small. It is strongly recommended
to allocate at least 8 GiB of space to allow for upgrades. From June
2021, this condition will trigger a failure in some cases.

For more information, see:
  https://docs.fedoraproject.org/en-US/fedora-coreos/storage/

You may delete this warning using:
  sudo rm /etc/motd.d/60-coreos-rootfs-size.motd
############################################################################
Last login: Tue Mar 28 01:50:48 2023 from 10.0.0.2

[core@jiwei-0328a-fg99t-master-2 ~]$ df -h
Filesystem      Size  Used  Avail  Use%  Mounted on
devtmpfs        4.0M     0   4.0M    0%  /dev
tmpfs           7.4G   84K   7.4G    1%  /dev/shm
tmpfs           3.0G   46M   2.9G    2%  /run
/dev/sdb4       3.0G  2.8G   112M   97%  /sysroot
tmpfs           7.4G  4.0K   7.4G    1%  /tmp
/dev/sdb5       125G  1.5G   124G    2%  /var
/dev/sdb3       350M  103M   225M   32%  /boot
tmpfs           1.5G     0   1.5G    0%  /run/user/1000

[core@jiwei-0328a-fg99t-master-2 ~]$ lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda      8:0    0   128G  0 disk
sdb      8:16   0   128G  0 disk
├─sdb1   8:17   0     1M  0 part
├─sdb2   8:18   0   127M  0 part
├─sdb3   8:19   0   384M  0 part /boot
├─sdb4   8:20   0     3G  0 part /sysroot/ostree/deploy/rhcos/var
│                                /usr
│                                /etc
│                                /
│                                /sysroot
└─sdb5   8:21   0 124.5G  0 part /var

[core@jiwei-0328a-fg99t-master-2 ~]$ sudo crictl ps
FATA[0000] unable to determine runtime API version: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /var/run/crio/crio.sock: connect: no such file or directory"
[core@jiwei-0328a-fg99t-master-2 ~]$ sudo crictl img
FATA[0000] unable to determine image API version: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /var/run/crio/crio.sock: connect: no such file or directory"

[core@jiwei-0328a-fg99t-master-0 ~]$ df -h
Filesystem      Size  Used  Avail  Use%  Mounted on
devtmpfs        4.0M     0   4.0M    0%  /dev
tmpfs           7.4G     0   7.4G    0%  /dev/shm
tmpfs           3.0G   62M   2.9G    3%  /run
tmpfs           4.0M     0   4.0M    0%  /sys/fs/cgroup
/dev/sda4       128G  3.1G   125G    3%  /sysroot
tmpfs           7.4G   40K   7.4G    1%  /tmp
/dev/sdb1       128G   12G   117G    9%  /var
/dev/sda3       350M  103M   225M   32%  /boot
tmpfs           1.5G     0   1.5G    0%  /run/user/1000

[core@jiwei-0328a-fg99t-master-0 ~]$ lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda      8:0    0   128G  0 disk
├─sda1   8:1    0     1M  0 part
├─sda2   8:2    0   127M  0 part
├─sda3   8:3    0   384M  0 part /boot
└─sda4   8:4    0 127.5G  0 part /var/lib/kubelet/pods/d4b39dc4-6f59-46f4-9382-ba5fd230a1e8/volume-subpaths/etc/tuned/5
                                 /var/lib/kubelet/pods/d4b39dc4-6f59-46f4-9382-ba5fd230a1e8/volume-subpaths/etc/tuned/4
                                 /var/lib/kubelet/pods/d4b39dc4-6f59-46f4-9382-ba5fd230a1e8/volume-subpaths/etc/tuned/3
                                 /var/lib/kubelet/pods/d4b39dc4-6f59-46f4-9382-ba5fd230a1e8/volume-subpaths/etc/tuned/2
                                 /var/lib/kubelet/pods/d4b39dc4-6f59-46f4-9382-ba5fd230a1e8/volume-subpaths/etc/tuned/1
                                 /sysroot/ostree/deploy/rhcos/var
                                 /usr
                                 /etc
                                 /
                                 /sysroot
sdb      8:16   0   128G  0 disk
└─sdb1   8:17   0   128G  0 part /var/lib/containers/storage/overlay
                                 /var

$ gcloud compute instances list --filter='name~jiwei-0328a'
NAME                         ZONE           MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP     STATUS
jiwei-0328a-fg99t-bootstrap  us-central1-a  n1-standard-4               10.0.0.4     34.170.88.5     RUNNING
jiwei-0328a-fg99t-master-0   us-central1-a  n1-standard-4               10.0.0.6                     RUNNING
jiwei-0328a-fg99t-int-svc    us-central1-b  n2-standard-2               10.0.0.2     104.198.16.228  RUNNING
jiwei-0328a-fg99t-master-1   us-central1-b  n1-standard-4               10.0.0.7                     RUNNING
jiwei-0328a-fg99t-master-2   us-central1-c  n1-standard-4               10.0.0.5                     RUNNING

$ gcloud compute disks list --filter='name~jiwei-0328a'
NAME                          LOCATION       LOCATION_SCOPE  SIZE_GB  TYPE         STATUS
jiwei-0328a-fg99t-bootstrap   us-central1-a  zone            128      pd-standard  READY
jiwei-0328a-fg99t-master-0    us-central1-a  zone            128      pd-ssd       READY
jiwei-0328a-fg99t-master-0-1  us-central1-a  zone            128      pd-ssd       READY
jiwei-0328a-fg99t-int-svc     us-central1-b  zone            200      pd-standard  READY
jiwei-0328a-fg99t-master-1    us-central1-b  zone            128      pd-ssd       READY
jiwei-0328a-fg99t-master-1-1  us-central1-b  zone            128      pd-ssd       READY
jiwei-0328a-fg99t-master-2    us-central1-c  zone            128      pd-ssd       READY
jiwei-0328a-fg99t-master-2-1  us-central1-c  zone            128      pd-ssd       READY
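The lsblk comparison above (master-0 healthy with root on sda4, master-2 broken with root on sdb4) can be reduced to a one-line check of which disk carries the root filesystem. A minimal sketch; the sample data is copied from the broken master-2 node, and the helper assumes the list form `lsblk -ln -o NAME,MOUNTPOINT` (no tree glyphs):

```shell
# Sample NAME/MOUNTPOINT pairs taken from the broken master-2 lsblk output.
lsblk_sample='sda
sdb
sdb4 /sysroot
sdb5 /var'

# Illustrative helper: find the partition mounted at /sysroot and strip the
# partition number to recover the parent disk name.
root_disk=$(printf '%s\n' "$lsblk_sample" | awk '$2 == "/sysroot" {print substr($1, 1, 3)}')
echo "root filesystem is on: $root_disk"   # → root filesystem is on: sdb
```

On a live node the same check could run against `lsblk -ln -o NAME,MOUNTPOINT` directly; a result other than the intended OS disk reproduces the mismatch reported here.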
- relates to OCPBUGS-11978 "Add note about device name that should not be sda or sdb" (ON_QA)