-
Bug
-
Resolution: Done
-
Critical
-
4.12
-
Critical
-
None
-
Approved
-
False
-
-
NA
-
Rejected
Description of problem:
Some upgrade CI jobs from 4.11.z to a 4.12 nightly build failed because the systemd unit machine-config-daemon-update-rpmostree-via-container failed.
omg get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-6e18de1272fad7a5ca1529941e3ceaed   False     True       True       3              0                   0                     1                      3h53m
master   rendered-master-60f4ff5893c94f53acd9ebb7a6bf53d4   False     True       True       3              0                   0                     1                      3h53m
Check the affected node:
omg get node/ip-10-0-57-74.us-east-2.compute.internal -o yaml|yq -y '.metadata.annotations'
cloud.network.openshift.io/egress-ipconfig: '[{"interface":"eni-0f6de21569b5b65c8","ifaddr":{"ipv4":"10.0.48.0/20"},"capacity":{"ipv4":14,"ipv6":15}}]'
csi.volume.kubernetes.io/nodeid: '{"ebs.csi.aws.com":"i-01a34f6b5f2cd1e41"}'
machine.openshift.io/machine: openshift-machine-api/ci-op-kb95kxx9-2a438-r6z94-master-2
machineconfiguration.openshift.io/controlPlaneTopology: HighlyAvailable
machineconfiguration.openshift.io/currentConfig: rendered-master-065664319cfbaee64277097d49a8a5a6
machineconfiguration.openshift.io/desiredConfig: rendered-master-60f4ff5893c94f53acd9ebb7a6bf53d4
machineconfiguration.openshift.io/desiredDrain: drain-rendered-master-60f4ff5893c94f53acd9ebb7a6bf53d4
machineconfiguration.openshift.io/lastAppliedDrain: drain-rendered-master-60f4ff5893c94f53acd9ebb7a6bf53d4
machineconfiguration.openshift.io/reason: 'error running systemd-run --unit machine-config-daemon-update-rpmostree-via-container --collect --wait -- podman run --authfile /var/lib/kubelet/config.json --privileged --pid=host --net=host --rm -v /:/run/host quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0daf5c4a35424410e88dde102022fc3581302bc8a98e09e2e4748502c59b3661 rpm-ostree ex deploy-from-self /run/host:
  Running as unit: machine-config-daemon-update-rpmostree-via-container.service
  Finished with result: exit-code
  Main processes terminated with: code=exited/status=125
  Service runtime: 2min 52ms
  CPU time consumed: 144ms
  : exit status 125'
machineconfiguration.openshift.io/state: Degraded
volumes.kubernetes.io/controller-managed-attach-detach: 'true'
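The same check can be run across all nodes at once. The following is a minimal sketch, not part of the original report: it assumes the node list is available as parsed JSON (for example from `oc get nodes -o json`), and the function name `degraded_nodes` and the sample data are hypothetical. Only the annotation keys are taken from the output above.

```python
# Hypothetical helper: list nodes that the machine-config-daemon marked Degraded.
# The annotation keys come from the node output in this report; everything else
# (function name, sample data) is illustrative.
MCO_PREFIX = "machineconfiguration.openshift.io/"


def degraded_nodes(node_list):
    """Return one dict per node whose MCO state annotation is 'Degraded'."""
    results = []
    for node in node_list.get("items", []):
        meta = node.get("metadata", {})
        ann = meta.get("annotations", {})
        if ann.get(MCO_PREFIX + "state") != "Degraded":
            continue
        results.append({
            "name": meta.get("name"),
            "current": ann.get(MCO_PREFIX + "currentConfig"),
            "desired": ann.get(MCO_PREFIX + "desiredConfig"),
            "reason": ann.get(MCO_PREFIX + "reason", ""),
        })
    return results


# Sample input shaped like a Kubernetes NodeList, trimmed to the relevant fields.
SAMPLE = {
    "items": [
        {
            "metadata": {
                "name": "ip-10-0-57-74.us-east-2.compute.internal",
                "annotations": {
                    MCO_PREFIX + "currentConfig": "rendered-master-065664319cfbaee64277097d49a8a5a6",
                    MCO_PREFIX + "desiredConfig": "rendered-master-60f4ff5893c94f53acd9ebb7a6bf53d4",
                    MCO_PREFIX + "state": "Degraded",
                    MCO_PREFIX + "reason": "exit status 125",
                },
            }
        },
        {
            # Hypothetical healthy node, included only for contrast.
            "metadata": {
                "name": "healthy-node.example",
                "annotations": {MCO_PREFIX + "state": "Done"},
            }
        },
    ]
}

if __name__ == "__main__":
    for entry in degraded_nodes(SAMPLE):
        print(entry["name"], entry["current"], "->", entry["desired"])
```

A node whose currentConfig differs from its desiredConfig while its state is Degraded, as here, is stuck partway through the update.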
Check the MCD log on the affected node:
omg get pod -n openshift-machine-config-operator -o json | jq -r '.items[]|select(.spec.nodeName=="ip-10-0-57-74.us-east-2.compute.internal")|.metadata.name' | grep daemon
machine-config-daemon-znbvf
2022-10-09T22:12:58.797891917Z I1009 22:12:58.797821 179598 update.go:1917] Updating OS to layered image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0daf5c4a35424410e88dde102022fc3581302bc8a98e09e2e4748502c59b3661
2022-10-09T22:12:58.797891917Z I1009 22:12:58.797846 179598 rpm-ostree.go:447] Running captured: rpm-ostree --version
2022-10-09T22:12:58.815829171Z I1009 22:12:58.815800 179598 update.go:2068] rpm-ostree is not new enough for layering; forcing an update via container
2022-10-09T22:12:58.817577513Z I1009 22:12:58.817555 179598 update.go:2053] Running: systemd-run --unit machine-config-daemon-update-rpmostree-via-container --collect --wait -- podman run --authfile /var/lib/kubelet/config.json --privileged --pid=host --net=host --rm -v /:/run/host quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0daf5c4a35424410e88dde102022fc3581302bc8a98e09e2e4748502c59b3661 rpm-ostree ex deploy-from-self /run/host
...
2022-10-09T22:15:00.831959313Z E1009 22:15:00.831949 179598 writer.go:200] Marking Degraded due to: error running systemd-run --unit machine-config-daemon-update-rpmostree-via-container --collect --wait -- podman run --authfile /var/lib/kubelet/config.json --privileged --pid=host --net=host --rm -v /:/run/host quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0daf5c4a35424410e88dde102022fc3581302bc8a98e09e2e4748502c59b3661 rpm-ostree ex deploy-from-self /run/host: Running as unit: machine-config-daemon-update-rpmostree-via-container.service
2022-10-09T22:15:00.831959313Z Finished with result: exit-code
2022-10-09T22:15:00.831959313Z Main processes terminated with: code=exited/status=125
2022-10-09T22:15:00.831959313Z Service runtime: 2min 52ms
2022-10-09T22:15:00.831959313Z CPU time consumed: 144ms
2022-10-09T22:15:00.831959313Z : exit status 125
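For triage, the unit name and exit status can be pulled out of the error text. The sketch below is illustrative only: the function name `summarize_failure` and the hint table are hypothetical, and it assumes the status reported by systemd is the exit code of `podman run` itself. Under that assumption, 125 is the code podman documents for an error in podman itself (as opposed to 126 or 127, which concern the contained command), which would suggest the container never started the rpm-ostree command.

```python
import re

# Exit codes documented for `podman run`; other values are the contained
# command's own exit status. Assumes systemd reports podman's exit code.
PODMAN_EXIT_HINTS = {
    125: "error is with podman itself",
    126: "contained command could not be invoked",
    127: "contained command could not be found",
}


def summarize_failure(reason):
    """Extract the transient unit name and exit status from an MCD error string."""
    unit = re.search(r"Running as unit: (\S+)", reason)
    status = re.search(r"code=exited/status=(\d+)", reason)
    code = int(status.group(1)) if status else None
    return {
        "unit": unit.group(1) if unit else None,
        "exit_status": code,
        "hint": PODMAN_EXIT_HINTS.get(code, "no hint recorded"),
    }


# Tail of the error text from this report, with the long command line omitted.
REASON = (
    "Running as unit: machine-config-daemon-update-rpmostree-via-container.service "
    "Finished with result: exit-code "
    "Main processes terminated with: code=exited/status=125 "
    "Service runtime: 2min 52ms CPU time consumed: 144ms : exit status 125"
)

if __name__ == "__main__":
    print(summarize_failure(REASON))
```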
Version-Release number of selected component (if applicable):
4.12
Steps to Reproduce:
Upgrade the cluster from 4.11.8 to 4.12.0-0.nightly-2022-10-05-053337.
Actual results:
The upgrade fails because a node is degraded: the rpm-ostree update via container fails.
Expected results:
The upgrade completes successfully.
Additional info:
must-gather: https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.12-nightly-4.12-upgrade-from-stable-4.11-aws-ipi-proxy-p1/1579169944476585984/artifacts/aws-ipi-proxy-p1/gather-must-gather/artifacts/must-gather.tar
Other build logs of failed jobs
- is related to OCPBUGS-2122 machine-config-daemon failed to update the OS for cluster running behind proxy (Closed)