OpenShift Bugs / OCPBUGS-631

machineconfig service fails to start because Podman storage gets corrupted


Details

    • Bug
    • Resolution: Unresolved
    • Critical
    • 4.13.0
    • 4.11.z
    • Containers
    • Important

      Our 4.11.2 OpenShift on OpenStack CI jobs failed due to this issue. It affects different installation types (IPI, UPI, IPI proxy...) and NetworkTypes (OpenShiftSDN, OVNKubernetes, Kuryr). In some cases the failure we observe is a bootstrap control plane failure; in other cases, not all the workers are deployed. The chance of a successful installation is low (2 successful installations out of 15 attempts).

      Furthermore, there is no clear workaround to fix this issue.


    Description

      Description of problem:

       

      During OCP multinode spoke cluster creation, agent provisioning is stuck on "configuring" because the machineConfig service is crashing on the node.
      After a restart, the service still fails with:

      Can't read link "/var/lib/containers/storage/overlay/l/V2OP2CCVMKSOHK2XICC546DUCG" because it does not exist. A storage corruption might have occurred, attempting to recreate the missing symlinks. It might be best wipe the storage to avoid further errors due to storage corruption. 
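The error above refers to the short-name symlinks that the overlay storage driver keeps under `overlay/l/`. As a rough, read-only diagnostic, the sketch below lists layers whose symlink is missing or dangling. It assumes the usual layout, in which each layer directory contains a `link` file holding the short name and `l/<short name>` points to `../<layer>/diff`. The function name `find_missing_links` is hypothetical and not part of Podman or containers/storage.

```python
import os
from pathlib import Path


def find_missing_links(overlay_root):
    """Return (layer_id, link_name) pairs whose l/<link_name> symlink is missing or dangling.

    Assumed layout (not verified against every containers/storage version):
      <overlay_root>/<layer_id>/link  -> text file containing the short name
      <overlay_root>/l/<short name>   -> symlink to ../<layer_id>/diff
    """
    root = Path(overlay_root)
    links_dir = root / "l"
    missing = []
    for layer_dir in sorted(root.iterdir()):
        link_file = layer_dir / "link"
        # Skip the l/ directory itself and any entry without a link file.
        if layer_dir.name == "l" or not link_file.is_file():
            continue
        link_name = link_file.read_text().strip()
        symlink = links_dir / link_name
        # os.path.exists follows symlinks, so a dangling link is reported too.
        # An empty link file yields link_name == "" and is also reported.
        if not (symlink.is_symlink() and os.path.exists(symlink)):
            missing.append((layer_dir.name, link_name))
    return missing
```

On an affected node this would typically be called as root with `find_missing_links("/var/lib/containers/storage/overlay")`. It only reports problems and does not attempt to recreate symlinks or wipe the storage.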

      Version-Release number of selected component (if applicable):

      Podman 4.0.2+

      How reproducible:

      Sometimes

      Steps to Reproduce:

      1. Deploy a multinode spoke cluster (iPXE + boot order).
      

      Actual results:

      4 agents are in the "done" state and 1 is stuck in "configuring".

       

      Expected results:

      All agents are in the "done" state.

      Additional info:

      This issue is mentioned in https://github.com/containers/podman/issues/14003

       

      Fix: https://github.com/containers/storage/issues/1136

       

       

       

              People

                pehunt@redhat.com Peter Hunt
                vkolodny@redhat.com Vladislav Kolodny
                Sunil Choudhary