OpenShift Bugs / OCPBUGS-1761

osImages that cannot be pulled do not set the node as Degraded properly


      Description of problem:

      When we configure a MachineConfig (MC) that uses an osImage that cannot be pulled, the machine config daemon pod spams its logs saying that the node is being set to the "Degraded" state, but the node is not actually set to "Degraded".
      
      Only after a long time, roughly 20 to 30 minutes, does the node eventually become degraded.

      Version-Release number of selected component (if applicable):

      4.12.0-0.nightly-2022-09-26-111919

      How reproducible:

      Always

      Steps to Reproduce:

      1. Create a MC that uses an osImage that cannot be pulled:
      
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        creationTimestamp: "2022-09-27T12:48:13Z"
        generation: 1
        labels:
          machineconfiguration.openshift.io/role: worker
        name: not-pullable-image-tc54054-w75j1k67
        resourceVersion: "374500"
        uid: 7f828fbc-8da3-4f16-89e2-34e39ff830b3
      spec:
        config:
          ignition:
            version: 3.2.0
          storage:
            files: []
          systemd:
            units: []
        osImageURL: quay.io/openshifttest/tc54054fakeimage:latest
      
      
      2. Check the logs of the machine config daemon pod. The following message is spammed repeatedly, claiming that the daemon is marking the node with "Degraded" status:
      
      E0927 14:31:22.858546    1697 writer.go:200] Marking Degraded due to: Error checking type of update image: failed to run command podman (6 tries): [timed out waiting for the condition, running podman pull -q --authfile /var/lib/kubelet/config.json quay.io/openshifttest/tc54054fakeimage:latest failed: Error: initializing source docker://quay.io/openshifttest/tc54054fakeimage:latest: reading manifest latest in quay.io/openshifttest/tc54054fakeimage: name unknown: repository not found
      E0927 14:34:10.698564    1697 writer.go:200] Marking Degraded due to: Error checking type of update image: failed to run command podman (6 tries): [timed out waiting for the condition, running podman pull -q --authfile /var/lib/kubelet/config.json quay.io/openshifttest/tc54054fakeimage:latest failed: Error: initializing source docker://quay.io/openshifttest/tc54054fakeimage:latest: reading manifest latest in quay.io/openshifttest/tc54054fakeimage: name unknown: repository not found
      E0927 14:36:58.557340    1697 writer.go:200] Marking Degraded due to: Error checking type of update image: failed to run command podman (6 tries): [timed out waiting for the condition, running podman pull -q --authfile /var/lib/kubelet/config.json quay.io/openshifttest/tc54054fakeimage:latest failed: Error: initializing source docker://quay.io/openshifttest/tc54054fakeimage:latest: reading manifest latest in quay.io/openshifttest/tc54054fakeimage: name unknown: repository not found
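      While the daemon logs the message above, the node's reported state can be checked directly to confirm the mismatch. A rough verification sketch (the label selector and jsonpath expressions are illustrative, not taken from this report; requires a logged-in `oc` session):

      ```shell
      # Pick one worker node (first match on the standard worker role label).
      NODE=$(oc get nodes -l node-role.kubernetes.io/worker \
        -o jsonpath='{.items[0].metadata.name}')

      # The machine config daemon records its view of the node in this
      # annotation. During the log spam it is expected to still read "Done"
      # or "Working" rather than "Degraded" for the first ~20-30 minutes.
      oc get node "$NODE" \
        -o jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/state}{"\n"}'

      # The worker pool's NodeDegraded condition likewise stays "False"
      # during that window.
      oc get mcp worker \
        -o jsonpath='{.status.conditions[?(@.type=="NodeDegraded")].status}{"\n"}'
      ```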
      
      
      

      Actual results:

      The node is not marked as degraded as it should be. Only after a long time, roughly 20 minutes, does the node become degraded.

      Expected results:

      When the podman pull command fails and the machine config daemon reports the node state as "Degraded", the node should actually be marked as "Degraded" without the long delay.
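      One way to express the expected behavior as a check (a sketch, assuming the pool surfaces the failure via its NodeDegraded condition; the timeout value is arbitrary):

      ```shell
      # Expected: this succeeds shortly after the podman pull retries are
      # exhausted, rather than 20-30 minutes later.
      oc wait mcp/worker --for=condition=NodeDegraded=True --timeout=10m \
        && echo "worker pool reports NodeDegraded"
      ```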

      Additional info:

       

       

       

      People

        jkyros@redhat.com John Kyros
        sregidor@redhat.com Sergio Regidor de la Rosa