OpenShift Bugs / OCPBUGS-3001

osImages that cannot be pulled do not set the node as Degraded properly


    • Moderate
    • None
    • False

      This is a clone of issue OCPBUGS-1761. The following is the description of the original issue:

      Description of problem:

      When we configure a MachineConfig (MC) with an osImageURL that cannot be pulled, the machine config daemon pod repeatedly logs that it is marking the node "Degraded", but the node is not actually set to the "Degraded" state.
      
      The node becomes Degraded only after a long delay, roughly 20 to 30 minutes.
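Whether the node has really been marked Degraded, rather than only logged as such, can be checked on the Node object itself. The Go sketch below assumes that the daemon records this in the `machineconfiguration.openshift.io/state` annotation (value `Degraded`, with the cause in `machineconfiguration.openshift.io/reason`); those keys are assumptions, and the helper `degradedStatus` is hypothetical. It works on a plain annotation map instead of calling the Kubernetes API, so it runs without a cluster.

```go
package main

import "fmt"

// Annotation keys and value assumed for illustration; verify them against
// the Machine Config Operator version in use.
const (
	stateAnnotation  = "machineconfiguration.openshift.io/state"
	reasonAnnotation = "machineconfiguration.openshift.io/reason"
	stateDegraded    = "Degraded"
)

// degradedStatus (hypothetical helper) reports whether a node's annotations
// mark it Degraded and, if so, the reason recorded alongside it.
func degradedStatus(annotations map[string]string) (bool, string) {
	if annotations[stateAnnotation] != stateDegraded {
		return false, ""
	}
	return true, annotations[reasonAnnotation]
}

func main() {
	// The situation in this report: "Marking Degraded" is logged,
	// but the node still reports an in-progress state.
	observed := map[string]string{stateAnnotation: "Working"}
	degraded, _ := degradedStatus(observed)
	fmt.Println("degraded:", degraded)

	// The state the report expects once the image pull fails.
	expected := map[string]string{
		stateAnnotation:  stateDegraded,
		reasonAnnotation: "Error checking type of update image",
	}
	degraded, reason := degradedStatus(expected)
	fmt.Println("degraded:", degraded, "reason:", reason)
}
```

In a live cluster, the map would come from the Node's `metadata.annotations`, for example as shown by `oc get node <name> -o yaml`.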

      Version-Release number of selected component (if applicable):

      4.12.0-0.nightly-2022-09-26-111919

      How reproducible:

      Always

      Steps to Reproduce:

      1. Create an MC whose osImageURL points to an image that cannot be pulled:
      
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        labels:
          machineconfiguration.openshift.io/role: worker
        name: not-pullable-image-tc54054-w75j1k67
      spec:
        config:
          ignition:
            version: 3.2.0
          storage:
            files: []
          systemd:
            units: []
        osImageURL: quay.io/openshifttest/tc54054fakeimage:latest
      
      
      2. Check the logs of the machine config daemon pod on an affected worker node. The following message is logged repeatedly, stating that the daemon is marking the node "Degraded":
      
      E0927 14:31:22.858546    1697 writer.go:200] Marking Degraded due to: Error checking type of update image: failed to run command podman (6 tries): [timed out waiting for the condition, running podman pull -q --authfile /var/lib/kubelet/config.json quay.io/openshifttest/tc54054fakeimage:latest failed: Error: initializing source docker://quay.io/openshifttest/tc54054fakeimage:latest: reading manifest latest in quay.io/openshifttest/tc54054fakeimage: name unknown: repository not found
      E0927 14:34:10.698564    1697 writer.go:200] Marking Degraded due to: Error checking type of update image: failed to run command podman (6 tries): [timed out waiting for the condition, running podman pull -q --authfile /var/lib/kubelet/config.json quay.io/openshifttest/tc54054fakeimage:latest failed: Error: initializing source docker://quay.io/openshifttest/tc54054fakeimage:latest: reading manifest latest in quay.io/openshifttest/tc54054fakeimage: name unknown: repository not found
      E0927 14:36:58.557340    1697 writer.go:200] Marking Degraded due to: Error checking type of update image: failed to run command podman (6 tries): [timed out waiting for the condition, running podman pull -q --authfile /var/lib/kubelet/config.json quay.io/openshifttest/tc54054fakeimage:latest failed: Error: initializing source docker://quay.io/openshifttest/tc54054fakeimage:latest: reading manifest latest in quay.io/openshifttest/tc54054fakeimage: name unknown: repository not found
      
      
      

      Actual results:

      The node is not marked Degraded as it should be. It becomes Degraded only after a long delay, about 20 minutes.

      Expected results:

      When the podman pull command fails and the machine config daemon logs that it is marking the node "Degraded", the node should actually be marked "Degraded" at that point.

      Additional info:


            John Kyros (jkyros@redhat.com)
            OpenShift Prow Bot (openshift-crt-jira-prow)
            Sergio Regidor de la Rosa
            Votes: 0
            Watchers: 7
