OpenShift Bugs / OCPBUGS-1761

osImages that cannot be pulled do not set the node as Degraded properly


      Description of problem:

      When we configure a MachineConfig (MC) that uses an osImage that cannot be pulled, the machine config daemon pod spams its logs saying that the node is being set to the "Degraded" state, but the node is not actually set to "Degraded".
      
      Only after a long time, roughly 20 to 30 minutes, does the node eventually become degraded.

      Version-Release number of selected component (if applicable):

      4.12.0-0.nightly-2022-09-26-111919

      How reproducible:

      Always

      Steps to Reproduce:

      1. Create a MC that uses an osImage that cannot be pulled:
      
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        creationTimestamp: "2022-09-27T12:48:13Z"
        generation: 1
        labels:
          machineconfiguration.openshift.io/role: worker
        name: not-pullable-image-tc54054-w75j1k67
        resourceVersion: "374500"
        uid: 7f828fbc-8da3-4f16-89e2-34e39ff830b3
      spec:
        config:
          ignition:
            version: 3.2.0
          storage:
            files: []
          systemd:
            units: []
        osImageURL: quay.io/openshifttest/tc54054fakeimage:latest
      
      
      2. Check the logs of the machine config daemon pod. The following message is spammed repeatedly, claiming that the daemon is marking the node with "Degraded" status:
      
      E0927 14:31:22.858546    1697 writer.go:200] Marking Degraded due to: Error checking type of update image: failed to run command podman (6 tries): [timed out waiting for the condition, running podman pull -q --authfile /var/lib/kubelet/config.json quay.io/openshifttest/tc54054fakeimage:latest failed: Error: initializing source docker://quay.io/openshifttest/tc54054fakeimage:latest: reading manifest latest in quay.io/openshifttest/tc54054fakeimage: name unknown: repository not found
      E0927 14:34:10.698564    1697 writer.go:200] Marking Degraded due to: Error checking type of update image: failed to run command podman (6 tries): [timed out waiting for the condition, running podman pull -q --authfile /var/lib/kubelet/config.json quay.io/openshifttest/tc54054fakeimage:latest failed: Error: initializing source docker://quay.io/openshifttest/tc54054fakeimage:latest: reading manifest latest in quay.io/openshifttest/tc54054fakeimage: name unknown: repository not found
      E0927 14:36:58.557340    1697 writer.go:200] Marking Degraded due to: Error checking type of update image: failed to run command podman (6 tries): [timed out waiting for the condition, running podman pull -q --authfile /var/lib/kubelet/config.json quay.io/openshifttest/tc54054fakeimage:latest failed: Error: initializing source docker://quay.io/openshifttest/tc54054fakeimage:latest: reading manifest latest in quay.io/openshifttest/tc54054fakeimage: name unknown: repository not found
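      While the daemon logs the message above, the node's reported state can be checked directly to confirm the mismatch. A rough verification sketch (the label selector and jsonpath expressions are illustrative, not taken from this report; requires a logged-in `oc` session):

      ```shell
      # Pick one worker node (first match on the standard worker role label).
      NODE=$(oc get nodes -l node-role.kubernetes.io/worker \
        -o jsonpath='{.items[0].metadata.name}')

      # The machine config daemon records its view of the node in this
      # annotation. During the log spam it is expected to still read "Done"
      # or "Working" rather than "Degraded" for the first ~20-30 minutes.
      oc get node "$NODE" \
        -o jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/state}{"\n"}'

      # The worker pool's NodeDegraded condition likewise stays "False"
      # during that window.
      oc get mcp worker \
        -o jsonpath='{.status.conditions[?(@.type=="NodeDegraded")].status}{"\n"}'
      ```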
      
      
      

      Actual results:

      The node is not marked as degraded as it should be. Only after a long time, roughly 20 minutes, does the node become degraded.

      Expected results:

      When the podman pull command fails and the machine config daemon reports the node state as "Degraded", the node should actually be marked as "Degraded" without the long delay.
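      One way to express the expected behavior as a check (a sketch, assuming the pool surfaces the failure via its NodeDegraded condition; the timeout value is arbitrary):

      ```shell
      # Expected: this succeeds shortly after the podman pull retries are
      # exhausted, rather than 20-30 minutes later.
      oc wait mcp/worker --for=condition=NodeDegraded=True --timeout=10m \
        && echo "worker pool reports NodeDegraded"
      ```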

      Additional info:

       

       

       

      People

        jkyros@redhat.com John Kyros
        sregidor@redhat.com Sergio Regidor de la Rosa