OpenShift Bugs / OCPBUGS-631

machineconfig service fails to start because Podman storage gets corrupted

Details

    • Bug
    • Resolution: Done
    • Critical
    • 4.13.0
    • 4.11.z
    • Containers
    • Important
    • False

      Our 4.11.2 OpenShift on OpenStack CI jobs failed due to this issue. It affects different installation types (IPI, UPI, IPI proxy...) and NetworkTypes (OpenShiftSDN, OVNKubernetes, Kuryr). In some cases the failure we observe is a bootstrap control plane failure; in others, not all the workers are deployed. The chance of a successful installation is low (2 successful installations out of 15 attempts).

      Furthermore, there is no clear workaround to fix this issue.

    Description

      Description of problem:

       

      During OCP multinode spoke cluster creation, agent provisioning is stuck on "configuring" because the machineConfig service is crashing on the node.
      After a restart, the service still fails with:

      Can't read link "/var/lib/containers/storage/overlay/l/V2OP2CCVMKSOHK2XICC546DUCG" because it does not exist. A storage corruption might have occurred, attempting to recreate the missing symlinks. It might be best wipe the storage to avoid further errors due to storage corruption. 
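
      To confirm which links are affected on a node, something like the following sketch may help. It assumes the overlay driver layout in which each layer directory under the overlay root holds a `link` file naming its short ID, and `l/` holds a symlink of that name. The function name `find_missing_links` is hypothetical and is not part of Podman or containers/storage.

```python
import os


def find_missing_links(overlay_root):
    """Return (layer_dir, link_id) pairs whose symlink under <overlay_root>/l is absent.

    Assumption: each layer directory contains a `link` file whose content is
    the short ID that should exist as a symlink in <overlay_root>/l.
    """
    link_dir = os.path.join(overlay_root, "l")
    missing = []
    for name in sorted(os.listdir(overlay_root)):
        layer_dir = os.path.join(overlay_root, name)
        link_file = os.path.join(layer_dir, "link")
        # Skip the link directory itself and anything without a `link` file.
        if name == "l" or not os.path.isfile(link_file):
            continue
        with open(link_file) as f:
            link_id = f.read().strip()
        # lexists() is True even for a dangling symlink, so this flags only
        # entries that are absent altogether, as in the error above.
        if link_id and not os.path.lexists(os.path.join(link_dir, link_id)):
            missing.append((name, link_id))
    return missing
```

      Run as root on the affected node, `find_missing_links("/var/lib/containers/storage/overlay")` would list the layers whose short-ID symlink is gone. It only reports; it does not attempt any repair.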

      Version-Release number of selected component (if applicable):

      Podman 4.0.2+

      How reproducible:

      sometimes

      Steps to Reproduce:

      1. Deploy multinode spoke (iPXE + boot order)
      

      Actual results:

      4 agents are in the "done" state and 1 is stuck in "configuring"

       

      Expected results:

      all agents are in "done" state

      Additional info:

      Issue mentioned in https://github.com/containers/podman/issues/14003

       

      Fix: https://github.com/containers/storage/issues/1136

       

       

       

            People

              pehunt@redhat.com Peter Hunt
              vkolodny@redhat.com Vladislav Kolodny
              Sunil Choudhary Sunil Choudhary