- Bug
- Resolution: Won't Do
- Normal
- None
- 4.16
- Important
- No
- MCO Sprint 254
- 1
- Rejected
- False
Description of problem:
Observing an intermittent issue with SNO install (with Assisted Installer) with DU profile where the cluster ends up in degraded mcp state post install. Reproduced with both 4.16.0-rc.0 and 4.16.0-rc.1.

    $ oc get mcp master
    NAME     CONFIG   UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
    master            False     True       True       1              0                   0                     1                      15h

mcp master status:

    Conditions:
      Last Transition Time:  2024-05-14T21:21:30Z
      Message:
      Reason:
      Status:                False
      Type:                  Updated
      Last Transition Time:  2024-05-14T21:21:30Z
      Message:               All nodes are updating to MachineConfig rendered-master-3ebeee8538946014a3f107ea0603d260
      Reason:
      Status:                True
      Type:                  Updating
      Last Transition Time:  2024-05-14T21:21:30Z
      Message:               Node e32-h22-r750 is reporting: "missing MachineConfig rendered-master-49651500230308839606552505f7f484\nmachineconfig.machineconfiguration.openshift.io \"rendered-master-49651500230308839606552505f7f484\" not found"
      Reason:                1 nodes are reporting degraded status on sync
      Status:                True
      Type:                  NodeDegraded
      Last Transition Time:  2024-05-14T21:21:30Z
      Message:
      Reason:
      Status:                True
      Type:                  Degraded
      Last Transition Time:  2024-05-14T21:21:35Z
      Message:
      Reason:
      Status:                False
      Type:                  RenderDegraded
    Configuration:
    Degraded Machine Count:     1
    Machine Count:              1
    Observed Generation:        2
    Ready Machine Count:        0
    Unavailable Machine Count:  1
    Updated Machine Count:      0
    Events:                     <none>

machine-config-daemon pod logs:

    [2024-05-14T21:38:13Z INFO nmstatectl] Nmstate version: 2.2.27
    [2024-05-14T21:38:13Z INFO nmstatectl::persist_nic] /etc/systemd/network does not exist, no need to clean up
    I0514 21:38:13.197788 50688 daemon.go:1624] In bootstrap mode
    E0514 21:38:13.197828 50688 writer.go:226] Marking Degraded due to: missing MachineConfig rendered-master-49651500230308839606552505f7f484
    machineconfig.machineconfiguration.openshift.io "rendered-master-49651500230308839606552505f7f484" not found
    I0514 21:38:42.173501 50688 certificate_writer.go:340] Certificate was synced from controllerconfig resourceVersion 12044
    I0514 21:38:45.205661 50688 daemon.go:1898] Running: /run/machine-config-daemon-bin/nmstatectl persist-nic-names --root / --kargs-out /tmp/nmstate-kargs1344634730 --cleanup

machine-config-daemon pod events:

    Events:
      Type     Reason      Age                    From               Message
      ----     ------      ----                   ----               -------
      Normal   Scheduled   28m                    default-scheduler  Successfully assigned openshift-machine-config-operator/machine-config-daemon-29r45 to e32-h22-r750
      Normal   Pulled      28m                    kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:714a42e9eb52ef1bae8a2575ca1a2bfdf733d5a6786f08ceb3b6ff61d59931cf" already present on machine
      Normal   Created     28m                    kubelet            Created container machine-config-daemon
      Normal   Started     28m                    kubelet            Started container machine-config-daemon
      Normal   Pulled      28m                    kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:91bb4f8991ea4b597c9404cec89a984cc3ad3f76a6099d868bc3388dbbd36346" already present on machine
      Normal   Created     28m                    kubelet            Created container kube-rbac-proxy
      Normal   Started     28m                    kubelet            Started container kube-rbac-proxy
      Normal   Created     26m                    kubelet            Created container machine-config-daemon
      Normal   Pulled      26m                    kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:714a42e9eb52ef1bae8a2575ca1a2bfdf733d5a6786f08ceb3b6ff61d59931cf" already present on machine
      Normal   Started     26m                    kubelet            Started container machine-config-daemon
      Normal   Killing     19m (x2 over 22m)      kubelet            Container machine-config-daemon failed liveness probe, will be restarted
      Normal   Created     19m (x2 over 22m)      kubelet            Created container machine-config-daemon
      Normal   Started     19m (x2 over 22m)      kubelet            Started container machine-config-daemon
      Warning  Unhealthy   16m (x9 over 23m)      kubelet            Liveness probe failed: Get "http://127.0.0.1:8798/health": dial tcp 127.0.0.1:8798: connect: connection refused
      Normal   Pulled      13m (x4 over 22m)      kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:714a42e9eb52ef1bae8a2575ca1a2bfdf733d5a6786f08ceb3b6ff61d59931cf" already present on machine
      Warning  ProbeError  8m5s (x17 over 23m)    kubelet            Liveness probe error: Get "http://127.0.0.1:8798/health": dial tcp 127.0.0.1:8798: connect: connection refused body:
      Warning  BackOff     3m28s (x7 over 4m35s)  kubelet            Back-off restarting failed container machine-config-daemon in pod machine-config-daemon-29r45_openshift-machine-config-operator(9953f60a-c482-4ec5-9f3c-d6ac5a874791)

oc describe node has the following annotation:

    machineconfiguration.openshift.io/reason: missing MachineConfig rendered-master-49651500230308839606552505f7f484
    machineconfig.machineconfiguration.openshift.io "rendered-master-49651500230308839606552505f7f484" not found
Version-Release number of selected component (if applicable):
OCP 4.16.0-rc.0, 4.16.0-rc.1
How reproducible:
Intermittent
Steps to Reproduce:
1. Install SNO with DU profile
2. Check mcp status after install
Actual results:
Master mcp is degraded post install
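For triage, the degraded state can be pulled out of the pool object programmatically. The following is a minimal Python sketch, not part of the MCO or of this report's tooling. It assumes the pool has been exported as JSON (for example with `oc get mcp master -o json`) and that conditions sit under `status.conditions` with `type`, `status`, and `message` fields. The function name `degraded_report`, the regular expression, and the trimmed `SAMPLE_POOL` dictionary (built from the condition values quoted in the description) are illustrative assumptions.

```python
import re

# Assumed message shape, taken from the report:
#   missing MachineConfig rendered-<pool>-<hash>
MISSING_RE = re.compile(r"missing MachineConfig (rendered-[A-Za-z0-9-]+)")


def degraded_report(pool: dict) -> dict:
    """Summarize the degraded-type conditions that are True on a pool object."""
    conditions = pool.get("status", {}).get("conditions", [])
    degraded = [
        c for c in conditions
        if c.get("type", "").endswith("Degraded") and c.get("status") == "True"
    ]
    missing = set()
    for c in degraded:
        # Collect every rendered config the condition message reports as missing.
        missing.update(MISSING_RE.findall(c.get("message", "")))
    return {
        "degraded_types": [c["type"] for c in degraded],
        "missing_configs": sorted(missing),
    }


# Hypothetical, trimmed pool object mirroring the conditions in this report.
SAMPLE_POOL = {
    "status": {
        "conditions": [
            {"type": "Updated", "status": "False", "message": ""},
            {"type": "Updating", "status": "True", "message": ""},
            {
                "type": "NodeDegraded",
                "status": "True",
                "message": 'Node e32-h22-r750 is reporting: "missing MachineConfig '
                           "rendered-master-49651500230308839606552505f7f484\n"
                           "machineconfig.machineconfiguration.openshift.io "
                           '"rendered-master-49651500230308839606552505f7f484" not found"',
            },
            {"type": "Degraded", "status": "True", "message": ""},
            {"type": "RenderDegraded", "status": "False", "message": ""},
        ]
    }
}

print(degraded_report(SAMPLE_POOL))
```

The missing config name it reports can then be compared against the output of `oc get mc` to confirm that the rendered config the node expects does not exist on the cluster.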
Expected results:
Master mcp should not be degraded post install
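One way to make the expected state checkable after install is a small predicate over the pool status. This is a sketch under assumptions, not an official health check: it assumes the JSON form of the pool (for example from `oc get mcp master -o json`) with `status.conditions` plus the `machineCount`, `readyMachineCount`, `updatedMachineCount`, and `degradedMachineCount` fields. The function name `pool_is_healthy` and both sample dictionaries are hypothetical. `REPORTED_POOL` uses the values from this report and `HEALTHY_POOL` is an invented counter-example.

```python
def pool_is_healthy(pool: dict) -> bool:
    """Return True when the pool is updated, not updating, and not degraded."""
    status = pool.get("status", {})
    cond = {c.get("type"): c.get("status") for c in status.get("conditions", [])}
    machine_count = status.get("machineCount", 0)
    return (
        cond.get("Updated") == "True"
        and cond.get("Updating") == "False"
        and cond.get("Degraded") == "False"
        and status.get("degradedMachineCount", 0) == 0
        and status.get("readyMachineCount", 0) == machine_count
        and status.get("updatedMachineCount", 0) == machine_count
    )


# Values as reported in this bug (conditions trimmed to type and status).
REPORTED_POOL = {
    "status": {
        "conditions": [
            {"type": "Updated", "status": "False"},
            {"type": "Updating", "status": "True"},
            {"type": "Degraded", "status": "True"},
        ],
        "machineCount": 1,
        "readyMachineCount": 0,
        "updatedMachineCount": 0,
        "degradedMachineCount": 1,
    }
}

# Hypothetical healthy single-node pool, for comparison only.
HEALTHY_POOL = {
    "status": {
        "conditions": [
            {"type": "Updated", "status": "True"},
            {"type": "Updating", "status": "False"},
            {"type": "Degraded", "status": "False"},
        ],
        "machineCount": 1,
        "readyMachineCount": 1,
        "updatedMachineCount": 1,
        "degradedMachineCount": 0,
    }
}

print(pool_is_healthy(REPORTED_POOL), pool_is_healthy(HEALTHY_POOL))
```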
Additional info:
- relates to: OCPBUGS-33229 OCP 4.16 install fails with MCO error "error during syncRequiredMachineConfigPools" (Closed)