OpenShift Bugs / OCPBUGS-737

machineconfig service fails to start because Podman storage gets corrupted


    • Important
    • None
    • Rejected
    • False

      Our OpenShift 4.11.2 on OpenStack CI jobs failed due to this issue. It affects different installation types (IPI, UPI, IPI proxy...) and NetworkTypes (OpenShiftSDN, OVNKubernetes, Kuryr). In some cases the failure we observe is a bootstrap control plane failure. In other cases, not all the workers are deployed. The chance of a successful installation is low (2 successful installations out of 15 attempts).

      Furthermore, there is no clear workaround for this issue.

      Description of problem:

       

      During OCP multinode spoke cluster creation, agent provisioning is stuck on "configuring" because the machineConfig service is crashing on the node.
      After a restart, the service still fails with:

      Can't read link "/var/lib/containers/storage/overlay/l/V2OP2CCVMKSOHK2XICC546DUCG" because it does not exist. A storage corruption might have occurred, attempting to recreate the missing symlinks. It might be best wipe the storage to avoid further errors due to storage corruption. 
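
      The error above refers to a missing entry under the overlay "l" directory. As a rough diagnostic sketch (not a fix), the following Python function lists layers whose short-name symlink is absent. It assumes the usual overlay layout, where each layer directory holds a "link" file naming its symlink under "l"; the function name find_missing_links is hypothetical and not part of Podman or containers/storage.

```python
import os
from pathlib import Path


def find_missing_links(overlay_root):
    """Return (layer_id, short_name) pairs whose l/<short_name> symlink is absent.

    Assumed layout (an assumption, verify on the affected node):
      <overlay_root>/<layer_id>/link   text file holding the short name
      <overlay_root>/l/<short_name>    symlink to ../<layer_id>/diff
    """
    root = Path(overlay_root)
    if not root.is_dir():
        return []
    links_dir = root / "l"
    missing = []
    for layer in sorted(root.iterdir()):
        link_file = layer / "link"
        # Skip the symlink directory itself and anything without a "link" file.
        if layer.name == "l" or not link_file.is_file():
            continue
        short_name = link_file.read_text().strip()
        if not short_name:
            continue
        # islink() is true even for a dangling symlink, so this only
        # reports entries that are missing altogether.
        if not os.path.islink(links_dir / short_name):
            missing.append((layer.name, short_name))
    return missing
```

      On an affected node this could be called as find_missing_links("/var/lib/containers/storage/overlay"), typically as root. It only reads the directory tree and does not modify the storage.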

      Version-Release number of selected component (if applicable):

      Podman 4.0.2 + 

      How reproducible:

      sometimes

      Steps to Reproduce:

      1. Deploy a multinode spoke (iPXE + boot order)
      

      Actual results:

      4 agents are in the "done" state and 1 is in "configuring"

       

      Expected results:

      All agents are in the "done" state

      Additional info:

      The issue is mentioned in https://github.com/containers/podman/issues/14003

       

      Fix: https://github.com/containers/storage/issues/1136

       

       

       

            pehunt@redhat.com Peter Hunt
            vkolodny@redhat.com Vladislav Kolodny
            Sunil Choudhary Sunil Choudhary
            Votes: 0
            Watchers: 7

              Created:
              Updated:
              Resolved: