OCPBUGS-63309 (project: OpenShift Bugs)

The /boot filesystem fills up during upgrade of a cluster deployed with CAPI+M3


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Affects version: 4.19
    • Component: RHCOS
    • Quality / Stability / Reliability
    • Architecture: x86_64

      Description of problem:

      We have a SNO hub cluster deployed with the Assisted Installer. On this cluster, we installed the MCE operator and enabled CAPI/M3. As part of this, we were instructed that the image used to deploy spoke clusters must include cloud-init to support a config drive, so we used the RHCOS OpenStack image as the base image. The deployment worked, and we successfully deployed a 4.19.10 spoke cluster. After deployment, we attempted to upgrade the spoke cluster from 4.19.10 to 4.19.13. The upgrade failed, and the machine-config operator reported the following error:
      
      Unable to apply 4.19.13: error during syncRequiredMachineConfigPools: [context deadline exceeded, error MachineConfigPool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 1, reason: Node node1.example.com is reporting: "unexpected on-disk state validating against rendered-master-7342b6e6f4da65354b75fb4695d9c0e0: expected target osImageURL \"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cdc2f0b00851e31b0f34ea7601b8550ad143c1f21aab6fd70b6082cf5f4076fe\", have \"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:deb731c3ffed587df534e53a83420ac00540677259d79d84d66c2b1c4422041b\"; possible root cause: error: Installing kernel: regfile copy: No space left on device")]
      
      Inspecting the node showed that /boot/ostree contained three directories: two install-* directories and one rhcos-* directory.
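The inspection described above can be sketched as a small read-only check in Python. This is a hypothetical helper, not part of any OpenShift or RHCOS tooling: it assumes the directory prefixes `install-` and `rhcos-` seen in this report, groups the subdirectories of /boot/ostree by those prefixes, and reports free space on the filesystem that holds them. It deletes nothing.

```python
import shutil
from pathlib import Path

# Prefixes observed on the affected nodes in this report; other
# deployments may use different directory names (assumption).
KNOWN_PREFIXES = ("install-", "rhcos-")


def summarize_boot_ostree(boot_ostree: Path = Path("/boot/ostree")) -> dict:
    """Hypothetical read-only helper: group the subdirectories of
    boot_ostree by prefix and report free space on that filesystem.
    Nothing is modified or deleted."""
    names = sorted(p.name for p in boot_ostree.iterdir() if p.is_dir())
    return {
        "install": [n for n in names if n.startswith("install-")],
        "rhcos": [n for n in names if n.startswith("rhcos-")],
        "other": [n for n in names if not n.startswith(KNOWN_PREFIXES)],
        # shutil.disk_usage reports on the filesystem containing the path,
        # which is /boot when called on /boot/ostree.
        "free_bytes": shutil.disk_usage(boot_ostree).free,
    }
```

On an affected node, `summarize_boot_ostree()` would be expected to show two `install-*` entries and one `rhcos-*` entry, matching the state reported here, together with little free space on /boot.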
      
      As a workaround, we remounted /boot read-write, deleted /boot/ostree/rhcos-*, and touched /run/machine-config-daemon-force; the node then rebooted and the upgrade proceeded. We had to repeat this on every bare-metal node in the spoke cluster.
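The manual workaround above can be expressed as a dry-run sketch in Python. The helper names are hypothetical; the commands are the ones the reporter ran (remount /boot read-write, delete /boot/ostree/rhcos-*, touch /run/machine-config-daemon-force). By default the sketch only prints them; with `dry_run=False` it would execute them, which requires root on the affected node. It does not reboot anything: per the report, the node rebooted after the force file was touched.

```python
import glob
import os
import subprocess
from typing import List

# Force file touched in the report's workaround.
FORCE_FILE = "/run/machine-config-daemon-force"


def workaround_commands(boot_ostree: str = "/boot/ostree") -> List[List[str]]:
    """Hypothetical helper: turn the manual workaround from this report
    into argv lists. It only builds the commands; it runs nothing."""
    # Resolve the rhcos-* glob in Python instead of relying on a shell.
    pattern = os.path.join(glob.escape(boot_ostree), "rhcos-*")
    stale = sorted(glob.glob(pattern))
    # The report remounted /boot read-write before deleting anything.
    cmds = [["mount", "-o", "remount,rw", "/boot"]]
    cmds += [["rm", "-rf", path] for path in stale]
    cmds.append(["touch", FORCE_FILE])
    return cmds


def apply_workaround(boot_ostree: str = "/boot/ostree", dry_run: bool = True) -> None:
    """Print the commands (default) or run them in order, stopping at the
    first failure because check=True raises CalledProcessError."""
    for argv in workaround_commands(boot_ostree):
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)
```

Building argv lists rather than one shell string avoids accidental shell expansion and makes it easy to review exactly which directories would be removed before anything runs.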

      Version-Release number of selected component (if applicable):

          4.19.10, upgrading to 4.19.13

      How reproducible:

          

      Steps to Reproduce:

          1. Deploy a spoke cluster, using the RHCOS OpenStack image as the base image, by following our documentation:
      https://docs.redhat.com/en/documentation/openshift_container_platform/4.19/html/machine_management/managing-machines-with-the-cluster-api
          2. Upgrade the spoke cluster (for example, from 4.19.10 to 4.19.13).
          

      Actual results:

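          The upgrade fails: the master MachineConfigPool is degraded because installing the new kernel fails with "No space left on device". Each bare-metal node needs manual cleanup of /boot/ostree before the upgrade can proceed.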

      Expected results:

          The upgrade completes without manual cleanup of /boot.

      Additional info:

          

              Assignee: Unassigned
              Reporter: Darin Sorrentino
              Also listed: Jad Haj Yahya
              Votes: 1
              Watchers: 7