OpenShift Bugs / OCPBUGS-10942

[gcp] UPI installation with a separate /var partition leads to one master node mis-using the disk

    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Normal
    • Affects Version: 4.13.0
    • Component: RHCOS
    • Severity: Critical

      Description of problem:

      The device sda should be the OS disk and sdb should be the additional disk that holds the /var partition. However, the problem master node does not appear to use sda at all; it put everything on sdb, which triggers the "The root filesystem is too small" warning.
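      For reference, a separate /var partition on a secondary disk is normally configured with a Butane/MachineConfig manifest roughly like the sketch below (file name, role label, and device path here are illustrative; the exact manifest used in this run is not attached):

      variant: openshift
      version: 4.13.0
      metadata:
        name: 98-var-partition
        labels:
          machineconfiguration.openshift.io/role: master
      storage:
        disks:
          - device: /dev/sdb            # the additional disk; assumes sdb is the second attached disk
            partitions:
              - label: var
                start_mib: 0            # use the whole disk for the partition
                size_mib: 0
        filesystems:
          - device: /dev/disk/by-partlabel/var
            path: /var
            format: xfs
            mount_options: [defaults, prjquota]
            with_mount_unit: true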

      Version-Release number of selected component (if applicable):

      4.13.0-0.nightly-2023-03-23-204038

      How reproducible:

      Always

      Steps to Reproduce:

      1. Perform a normal UPI installation on GCP, but attach an additional disk to each master and configure a separate /var partition on it (a rough gcloud sketch follows below).
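      For illustration, the extra per-master disks could be created and attached with commands along these lines (disk names and zones taken from the gcloud listing in "Additional info"; in the actual UPI flow the disks may instead have been declared in the Deployment Manager templates):

      $ gcloud compute disks create jiwei-0328a-fg99t-master-0-1 \
          --zone us-central1-a --size 128GB --type pd-ssd
      $ gcloud compute instances attach-disk jiwei-0328a-fg99t-master-0 \
          --disk jiwei-0328a-fg99t-master-0-1 --zone us-central1-a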

      Actual results:

      One master node hit the "The root filesystem is too small" condition, so the installation failed.

      Expected results:

      Installation should succeed.

      Additional info:

      $ oc get clusterversion
      NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
      version             False       True          39m     Unable to apply 4.13.0-0.nightly-2023-03-23-204038: some cluster operators are not available
      $ oc get nodes
      NAME                                                 STATUS   ROLES                  AGE   VERSION
      jiwei-0328a-fg99t-master-0.c.openshift-qe.internal   Ready    control-plane,master   38m   v1.26.2+dc93b13
      jiwei-0328a-fg99t-master-1.c.openshift-qe.internal   Ready    control-plane,master   37m   v1.26.2+dc93b13
      $ oc get machines -A
      No resources found
      $ oc get co | grep -v 'True        False         False'
      NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.13.0-0.nightly-2023-03-23-204038   False       False         True       39m     OAuthServerServiceEndpointAccessibleControllerAvailable: Get "https://172.30.234.161:443/healthz": dial tcp 172.30.234.161:443: connect: connection refused...
      console                                    4.13.0-0.nightly-2023-03-23-204038   False       False         True       32m     RouteHealthAvailable: console route is not admitted
      image-registry                                                                  False       True          True       32m     Available: The deployment does not have available replicas...
      ingress                                                                         False       True          True       32m     The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.)
      kube-controller-manager                    4.13.0-0.nightly-2023-03-23-204038   True        False         True       35m     GarbageCollectorDegraded: error fetching rules: Get "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules": dial tcp: lookup thanos-querier.openshift-monitoring.svc on 172.30.0.10:53: no such host
      monitoring                                                                      False       True          True       28m     reconciling Prometheus Operator Admission Webhook Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator-admission-webhook: got 2 unavailable replicas
      network                                    4.13.0-0.nightly-2023-03-23-204038   True        True          False      40m     Deployment "/openshift-network-diagnostics/network-check-source" is waiting for other operators to become ready
      $ 
      
      [core@jiwei-0328a-fg99t-int-svc ~]$ ssh -i .ssh/openshift-qe.pem core@10.0.0.5
      Red Hat Enterprise Linux CoreOS 413.92.202303190222-0
        Part of OpenShift 4.13, RHCOS is a Kubernetes native operating system
        managed by the Machine Config Operator (`clusteroperator/machine-config`).

      WARNING: Direct SSH access to machines is not recommended; instead,
      make configuration changes via `machineconfig` objects:
        https://docs.openshift.com/container-platform/4.13/architecture/architecture-rhcos.html

      ---
      ############################################################################
      WARNING: The root filesystem is too small. It is strongly recommended to
      allocate at least 8 GiB of space to allow for upgrades. From June 2021, this
      condition will trigger a failure in some cases. For more information, see:
      https://docs.fedoraproject.org/en-US/fedora-coreos/storage/

      You may delete this warning using:
      sudo rm /etc/motd.d/60-coreos-rootfs-size.motd
      ############################################################################

      Last login: Tue Mar 28 01:50:48 2023 from 10.0.0.2
      [core@jiwei-0328a-fg99t-master-2 ~]$ df -h
      Filesystem      Size  Used Avail Use% Mounted on
      devtmpfs        4.0M     0  4.0M   0% /dev
      tmpfs           7.4G   84K  7.4G   1% /dev/shm
      tmpfs           3.0G   46M  2.9G   2% /run
      /dev/sdb4       3.0G  2.8G  112M  97% /sysroot
      tmpfs           7.4G  4.0K  7.4G   1% /tmp
      /dev/sdb5       125G  1.5G  124G   2% /var
      /dev/sdb3       350M  103M  225M  32% /boot
      tmpfs           1.5G     0  1.5G   0% /run/user/1000
      [core@jiwei-0328a-fg99t-master-2 ~]$ lsblk
      NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
      sda      8:0    0   128G  0 disk 
      sdb      8:16   0   128G  0 disk 
      ├─sdb1   8:17   0     1M  0 part 
      ├─sdb2   8:18   0   127M  0 part 
      ├─sdb3   8:19   0   384M  0 part /boot
      ├─sdb4   8:20   0     3G  0 part /sysroot/ostree/deploy/rhcos/var
      │                                /usr
      │                                /etc
      │                                /
      │                                /sysroot
      └─sdb5   8:21   0 124.5G  0 part /var
      [core@jiwei-0328a-fg99t-master-2 ~]$ sudo crictl ps
      FATA[0000] unable to determine runtime API version: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /var/run/crio/crio.sock: connect: no such file or directory" 
      [core@jiwei-0328a-fg99t-master-2 ~]$ sudo crictl img
      FATA[0000] unable to determine image API version: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /var/run/crio/crio.sock: connect: no such file or directory" 
      [core@jiwei-0328a-fg99t-master-2 ~]$ 
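      Since the affected node laid everything out on sdb while sda was left blank, it may be worth checking how the kernel enumerated the two disks on that boot. On GCP the /dev/sdX ordering is not guaranteed to be stable, while the /dev/disk/by-id/google-* symlinks map to the names the disks were attached with. A quick check from the affected node (sketch; output not captured here):

      $ ls -l /dev/disk/by-id/ | grep google
      $ lsblk -o NAME,SIZE,SERIAL,MOUNTPOINTS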
      
      [core@jiwei-0328a-fg99t-master-0 ~]$ df -h
      Filesystem      Size  Used Avail Use% Mounted on
      devtmpfs        4.0M     0  4.0M   0% /dev
      tmpfs           7.4G     0  7.4G   0% /dev/shm
      tmpfs           3.0G   62M  2.9G   3% /run
      tmpfs           4.0M     0  4.0M   0% /sys/fs/cgroup
      /dev/sda4       128G  3.1G  125G   3% /sysroot
      tmpfs           7.4G   40K  7.4G   1% /tmp
      /dev/sdb1       128G   12G  117G   9% /var
      /dev/sda3       350M  103M  225M  32% /boot
      tmpfs           1.5G     0  1.5G   0% /run/user/1000
      [core@jiwei-0328a-fg99t-master-0 ~]$ lsblk
      NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
      sda      8:0    0   128G  0 disk 
      ├─sda1   8:1    0     1M  0 part 
      ├─sda2   8:2    0   127M  0 part 
      ├─sda3   8:3    0   384M  0 part /boot
      └─sda4   8:4    0 127.5G  0 part /var/lib/kubelet/pods/d4b39dc4-6f59-46f4-9382-ba5fd230a1e8/volume-subpaths/etc/tuned/5
                                       /var/lib/kubelet/pods/d4b39dc4-6f59-46f4-9382-ba5fd230a1e8/volume-subpaths/etc/tuned/4
                                       /var/lib/kubelet/pods/d4b39dc4-6f59-46f4-9382-ba5fd230a1e8/volume-subpaths/etc/tuned/3
                                       /var/lib/kubelet/pods/d4b39dc4-6f59-46f4-9382-ba5fd230a1e8/volume-subpaths/etc/tuned/2
                                       /var/lib/kubelet/pods/d4b39dc4-6f59-46f4-9382-ba5fd230a1e8/volume-subpaths/etc/tuned/1
                                       /sysroot/ostree/deploy/rhcos/var
                                       /usr
                                       /etc
                                       /
                                       /sysroot
      sdb      8:16   0   128G  0 disk 
      └─sdb1   8:17   0   128G  0 part /var/lib/containers/storage/overlay
                                       /var
      [core@jiwei-0328a-fg99t-master-0 ~]$ 
      
      $ gcloud compute instances list --filter='name~jiwei-0328a'
      NAME                         ZONE           MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP     STATUS
      jiwei-0328a-fg99t-bootstrap  us-central1-a  n1-standard-4               10.0.0.4     34.170.88.5     RUNNING
      jiwei-0328a-fg99t-master-0   us-central1-a  n1-standard-4               10.0.0.6                     RUNNING
      jiwei-0328a-fg99t-int-svc    us-central1-b  n2-standard-2               10.0.0.2     104.198.16.228  RUNNING
      jiwei-0328a-fg99t-master-1   us-central1-b  n1-standard-4               10.0.0.7                     RUNNING
      jiwei-0328a-fg99t-master-2   us-central1-c  n1-standard-4               10.0.0.5                     RUNNING
      $ gcloud compute disks list --filter='name~jiwei-0328a'
      NAME                          LOCATION       LOCATION_SCOPE  SIZE_GB  TYPE         STATUS
      jiwei-0328a-fg99t-bootstrap   us-central1-a  zone            128      pd-standard  READY
      jiwei-0328a-fg99t-master-0    us-central1-a  zone            128      pd-ssd       READY
      jiwei-0328a-fg99t-master-0-1  us-central1-a  zone            128      pd-ssd       READY
      jiwei-0328a-fg99t-int-svc     us-central1-b  zone            200      pd-standard  READY
      jiwei-0328a-fg99t-master-1    us-central1-b  zone            128      pd-ssd       READY
      jiwei-0328a-fg99t-master-1-1  us-central1-b  zone            128      pd-ssd       READY
      jiwei-0328a-fg99t-master-2    us-central1-c  zone            128      pd-ssd       READY
      jiwei-0328a-fg99t-master-2-1  us-central1-c  zone            128      pd-ssd       READY
      $ 
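      To confirm which disk each instance considers the boot disk and in what order the disks are attached, something like the following could be used (sketch; output not captured here):

      $ gcloud compute instances describe jiwei-0328a-fg99t-master-2 \
          --zone us-central1-c --format='yaml(disks)'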
      

        Assignee: Unassigned
        Reporter: Jianli Wei (rhn-support-jiwei)