OCPBUGS-7605 (OpenShift Bugs)

Node reports OutOfDisk=Unknown after layered OS update

Details

    • MCO Sprint 232

    Description

      Description of problem:

      While performing a layered OS upgrade, either via the OpenShift e2e upgrade tests or by overriding the osImageURL field in a MachineConfig, numerous entries resembling the following appear in the Machine Config Controller logs:
      
      I0216 15:11:38.328052       1 node_controller.go:446] Pool infra[zone=us-east-1a]: node ip-10-0-134-120.ec2.internal: Reporting unready: node ip-10-0-134-120.ec2.internal is reporting OutOfDisk=Unknown
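
      When reproducing, it can help to reduce the controller log to just the node and reason of each "Reporting unready" entry. The helper below is a hypothetical sketch (summarize_unready is not an MCO tool); its sed pattern is derived only from the message format shown in this report and may not match other MCO versions.

```shell
# Hypothetical helper: reads Machine Config Controller log lines on stdin and
# prints "<node> <reason>" for every "Reporting unready" entry.
# Lines that do not match (e.g. "changed taints") are dropped.
summarize_unready() {
  sed -n 's/.*node \([^ :]*\): Reporting unready: node [^ ]* is reporting \(.*\)$/\1 \2/p'
}
```

      Usage, assuming a default installation where the controller runs as the machine-config-controller deployment in the openshift-machine-config-operator namespace: oc logs -n openshift-machine-config-operator deployment/machine-config-controller -c machine-config-controller | summarize_unready | sort | uniq -c. Fed the log line above, it prints ip-10-0-134-120.ec2.internal OutOfDisk=Unknown.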

      Version-Release number of selected component (if applicable):

       

      How reproducible:

      Always.

      Steps to Reproduce:

      1. Create an OpenShift 4.12 or 4.13 cluster.
      2. Either run the OpenShift e2e upgrade tests -or- create a MachineConfig that overrides osImageURL with a custom OS image.
      3. Watch the Machine Config Controller logs while the node updates.
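
      For step 2, a MachineConfig that overrides osImageURL can be generated with a small helper such as the following. This is a minimal sketch, not tooling from this report: the function name render_os_override_mc and the MachineConfig name are hypothetical, and it assumes the target pool selects MachineConfigs through the machineconfiguration.openshift.io/role label (infra here, matching the pool in the logs). The image is the one from the OSImageURLOverridden event in this report.

```shell
# Hypothetical helper: prints a MachineConfig that overrides osImageURL for
# one MachineConfigPool role. Nothing here contacts the cluster.
render_os_override_mc() {
  mc_name="$1"   # illustrative name, e.g. 99-infra-os-image-override
  mc_role="$2"   # pool role label value, e.g. infra or worker
  os_image="$3"  # pullspec of the custom layered OS image
  if [ -z "$mc_name" ] || [ -z "$mc_role" ] || [ -z "$os_image" ]; then
    echo "usage: render_os_override_mc NAME ROLE IMAGE" >&2
    return 1
  fi
  cat <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: ${mc_name}
  labels:
    machineconfiguration.openshift.io/role: ${mc_role}
spec:
  osImageURL: ${os_image}
EOF
}
```

      Usage, assuming oc is logged in to the test cluster: render_os_override_mc 99-infra-os-image-override infra quay.io/zzlotnik/testing:4.12-8.6 | oc apply -f -. Printing the manifest first keeps it easy to inspect before applying.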
      

      Actual results:

      Eventually, you'll see log entries that resemble:
      
      I0216 15:07:08.852067       1 event.go:285] Event(v1.ObjectReference{Kind:"MachineConfig", Namespace:"", Name:"rendered-infra-3c7916178f0d7ae4209b1aba41a33b79", UID:"", APIVersion:"machineconfiguration.openshift.io/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OSImageURLOverridden' OSImageURL was overridden via machineconfig in rendered-infra-3c7916178f0d7ae4209b1aba41a33b79 (was: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:fc9c9ccd5d76269ffff5672f4751ac5b390d759c9ad02dcf141ef5c6cce4a713 is: quay.io/zzlotnik/testing:4.12-8.6)
      I0216 15:11:38.328052       1 node_controller.go:446] Pool infra[zone=us-east-1a]: node ip-10-0-134-120.ec2.internal: Reporting unready: node ip-10-0-134-120.ec2.internal is reporting OutOfDisk=Unknown
      I0216 15:11:38.356883       1 node_controller.go:446] Pool infra[zone=us-east-1a]: node ip-10-0-134-120.ec2.internal: changed taints
      I0216 15:11:42.082709       1 node_controller.go:446] Pool infra[zone=us-east-1a]: node ip-10-0-134-120.ec2.internal: Reporting unready: node ip-10-0-134-120.ec2.internal is reporting Unschedulable
      I0216 15:11:42.104540       1 node_controller.go:446] Pool infra[zone=us-east-1a]: node ip-10-0-134-120.ec2.internal: changed taints
      
      However, the condition eventually clears and the node returns to service. Disk usage on the node also looks fine:
      
      sh-4.4# df -h | grep -v "container" | grep -v "kubelet"
      Filesystem      Size  Used Avail Use% Mounted on
      /dev/nvme0n1p4  120G  8.9G  111G   8% /
      tmpfs           7.7G     0  7.7G   0% /sys/fs/cgroup
      devtmpfs        7.7G     0  7.7G   0% /dev
      tmpfs           7.7G     0  7.7G   0% /dev/shm
      tmpfs           7.7G   48M  7.7G   1% /run
      tmpfs           7.7G   12K  7.7G   1% /tmp
      /dev/nvme0n1p3  350M  104M  224M  32% /boot
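
      The disk-usage check can also be scripted so it is repeatable across nodes. A minimal sketch: report_full_filesystems is a hypothetical helper, and its default 90% threshold is an arbitrary illustrative value, not the kubelet's disk-pressure eviction threshold. It expects one filesystem per line, so df -hP (POSIX output format) is safer than plain df -h when device names are long.

```shell
# Hypothetical helper: reads df-style output on stdin and prints every
# filesystem whose Use% is at or above a threshold (default 90).
report_full_filesystems() {
  threshold="${1:-90}"
  awk -v limit="$threshold" '
    NR == 1 { next }                 # skip the header row
    {
      use = $5
      sub(/%$/, "", use)             # strip the trailing percent sign
      if (use + 0 >= limit + 0)
        printf "%s mounted on %s is %s%% full\n", $1, $6, use
    }'
}
```

      Run against the output above, df -hP | report_full_filesystems prints nothing; with a threshold of 30 it reports only /dev/nvme0n1p3 mounted on /boot at 32%.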

      Expected results:

      I would not expect the node to report OutOfDisk=Unknown during the update.

      Additional info:

            People

              David Joshy (djoshy)
              Zack Zlotnik (zzlotnik@redhat.com)
              Rio Liu
              Votes: 0
              Watchers: 4