OpenShift Bugs / OCPBUGS-17162

corrupted image preventing pod from running


      Description of problem:

      We've seen pods failing to start with errors like:
      
      exec /usr/local/bin/cephcsi: no such file or directory
      
      

      Version-Release number of selected component (if applicable):

      
      

      How reproducible:

      This seems to be happening on about 20% of new nodes on this cluster (with autoscaling)
      

      Steps to Reproduce:

      1. Create a new node
      2. Get unlucky :(
      

      Actual results:

      The pod fails to start
      

      Expected results:

      The pod starts
      

      Additional info:

      This seems to be related to https://access.redhat.com/solutions/5972661
      
      Our suspicion is that, since cephfsplugin runs as a daemonset, it starts on new nodes very early. While its image is still being pulled, the machine config operator restarts the node, which corrupts the image layers that were in the middle of being pulled down.
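
For triage, here is a minimal sketch of how the error signature quoted in the description could be matched against container status messages to list affected pods. It is not part of the original report. The function names and the sample pod names are hypothetical, and collecting the messages from the cluster is left out on purpose.

```python
import re
from typing import Dict, Optional

# Signature copied from the error quoted in this report. Capturing the path
# so that other binaries would match too is an assumption, not something
# the report confirms.
EXEC_MISSING = re.compile(r"exec (?P<path>\S+): no such file or directory")


def find_missing_exec(message: str) -> Optional[str]:
    """Return the binary path if the message matches the signature, else None."""
    match = EXEC_MISSING.search(message)
    return match.group("path") if match else None


def affected_pods(messages: Dict[str, str]) -> Dict[str, str]:
    """Map pod name -> missing binary path for every pod whose message matches."""
    hits = {}
    for pod, message in messages.items():
        path = find_missing_exec(message)
        if path is not None:
            hits[pod] = path
    return hits


if __name__ == "__main__":
    # Hypothetical input: pod name -> last container status message.
    sample = {
        "pod-a": "exec /usr/local/bin/cephcsi: no such file or directory",
        "pod-b": "container started",
    }
    print(affected_pods(sample))
```

A match only shows that the entrypoint binary was not found when the container started. It does not by itself confirm that interrupted image pulls are the cause.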
      

            pehunt@redhat.com Peter Hunt
            achvatal.openshift Alex Chvatal
            Sunil Choudhary Sunil Choudhary