- Bug
- Resolution: Done
- Minor
- None
- 4.12
- +
- Moderate
- None
- Sprint 226, Sprint 227, Sprint 228
- 3
- False
Description of problem:
When we configure a MachineConfig (MC) using an osImage that cannot be pulled, the machine config daemon pod spams its logs with messages saying that the node is being set to the "Degraded" state, but the node is not actually set to "Degraded". Only after a long time, roughly 20 to 30 minutes, does the node eventually become degraded.
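A quick way to watch for the spammed messages is to tail the machine-config-daemon pod that runs on the affected node (the pod name below is a placeholder); for example:

  # Tail the MCD logs on the affected node and filter for the degraded messages
  oc -n openshift-machine-config-operator logs -f machine-config-daemon-<pod-id> -c machine-config-daemon | grep "Marking Degraded"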
Version-Release number of selected component (if applicable):
4.12.0-0.nightly-2022-09-26-111919
How reproducible:
Always
Steps to Reproduce:
1. Create a MC using an osImage that cannot be pulled:

   apiVersion: machineconfiguration.openshift.io/v1
   kind: MachineConfig
   metadata:
     creationTimestamp: "2022-09-27T12:48:13Z"
     generation: 1
     labels:
       machineconfiguration.openshift.io/role: worker
     name: not-pullable-image-tc54054-w75j1k67
     resourceVersion: "374500"
     uid: 7f828fbc-8da3-4f16-89e2-34e39ff830b3
   spec:
     config:
       ignition:
         version: 3.2.0
       storage:
         files: []
       systemd:
         units: []
     osImageURL: quay.io/openshifttest/tc54054fakeimage:latest

2. Check the logs in the machine config daemon pod. You can see this message being spammed, saying that the daemon is marking the node with "Degraded" status:

   E0927 14:31:22.858546 1697 writer.go:200] Marking Degraded due to: Error checking type of update image: failed to run command podman (6 tries): [timed out waiting for the condition, running podman pull -q --authfile /var/lib/kubelet/config.json quay.io/openshifttest/tc54054fakeimage:latest failed: Error: initializing source docker://quay.io/openshifttest/tc54054fakeimage:latest: reading manifest latest in quay.io/openshifttest/tc54054fakeimage: name unknown: repository not found
   E0927 14:34:10.698564 1697 writer.go:200] Marking Degraded due to: Error checking type of update image: failed to run command podman (6 tries): [timed out waiting for the condition, running podman pull -q --authfile /var/lib/kubelet/config.json quay.io/openshifttest/tc54054fakeimage:latest failed: Error: initializing source docker://quay.io/openshifttest/tc54054fakeimage:latest: reading manifest latest in quay.io/openshifttest/tc54054fakeimage: name unknown: repository not found
   E0927 14:36:58.557340 1697 writer.go:200] Marking Degraded due to: Error checking type of update image: failed to run command podman (6 tries): [timed out waiting for the condition, running podman pull -q --authfile /var/lib/kubelet/config.json quay.io/openshifttest/tc54054fakeimage:latest failed: Error: initializing source docker://quay.io/openshifttest/tc54054fakeimage:latest: reading manifest latest in quay.io/openshifttest/tc54054fakeimage: name unknown: repository not found
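While those messages are being logged, the node's MCO state annotation can be checked to confirm whether it has really been moved to "Degraded" (the node name below is a placeholder):

  # Inspect the MCO state annotation on the node that is applying the MC
  oc get node <node-name> -o jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/state}'
  # With this bug, the value does not report "Degraded" until roughly 20-30 minutes have passed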
Actual results:
The node is not marked as degraded as it should be. Only after a long time, 20 minutes or so, does the node become degraded.
Expected results:
When the podman pull command fails and the machine config daemon reports that it is setting the node state to "Degraded", the node should actually be marked as "Degraded" instead of only becoming degraded after a long delay.
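The degraded condition is also expected to surface on the affected MachineConfigPool; a minimal check, assuming the MC targets the worker pool as in the reproducer above, could be:

  # Check the pool-level conditions that should flip to "True" once the node degrades
  oc get mcp worker -o jsonpath='{.status.conditions[?(@.type=="NodeDegraded")].status}'
  oc get mcp worker -o jsonpath='{.status.conditions[?(@.type=="Degraded")].status}'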
Additional info:
- blocks: OCPBUGS-3001 osImages that cannot be pulled do not set the node as Degraded properly (Closed)
- is cloned by: OCPBUGS-3001 osImages that cannot be pulled do not set the node as Degraded properly (Closed)
- is depended on by: OCPBUGS-14071 Pools are not degraded when we configure an OSimage that cannot be pulled (Closed)
- links to: (1 link)