OCPBUGS-2197

[upgrade 4.11.z to 4.12 nightly] rpm-ostree update via container failed


    • Critical
    • None
    • Approved
    • False
    • NA
    • Rejected

      Description of problem:

      Some upgrade CI jobs from 4.11.z to a 4.12 nightly build fail because the systemd unit machine-config-daemon-update-rpmostree-via-container fails.

      e.g. job https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.12-nightly-4.12-upgrade-from-stable-4.11-aws-ipi-proxy-p1/1579169944476585984

      omg get mcp
      NAME    CONFIG                                            UPDATED  UPDATING  DEGRADED  MACHINECOUNT  READYMACHINECOUNT  UPDATEDMACHINECOUNT  DEGRADEDMACHINECOUNT  AGE
      worker  rendered-worker-6e18de1272fad7a5ca1529941e3ceaed  False    True      True      3             0                  0                    1                     3h53m
      master  rendered-master-60f4ff5893c94f53acd9ebb7a6bf53d4  False    True      True      3             0                  0                    1                     3h53m 
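      The table above can be checked programmatically. The following is a minimal sketch, assuming the `omg get mcp` output has been captured as plain text; `parse_table` and `degraded_pools` are hypothetical helper names, not part of omg or the MCO.

```python
# Sketch: find degraded MachineConfigPools in captured `omg get mcp` output.
# Assumes whitespace-separated columns with no spaces inside any value.
MCP_OUTPUT = """\
NAME    CONFIG                                            UPDATED  UPDATING  DEGRADED  MACHINECOUNT  READYMACHINECOUNT  UPDATEDMACHINECOUNT  DEGRADEDMACHINECOUNT  AGE
worker  rendered-worker-6e18de1272fad7a5ca1529941e3ceaed  False    True      True      3             0                  0                    1                     3h53m
master  rendered-master-60f4ff5893c94f53acd9ebb7a6bf53d4  False    True      True      3             0                  0                    1                     3h53m
"""


def parse_table(text):
    """Turn the column-aligned table into a list of dicts keyed by header."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def degraded_pools(rows):
    """Return the names of pools whose DEGRADED column is True."""
    return [row["NAME"] for row in rows if row["DEGRADED"] == "True"]


rows = parse_table(MCP_OUTPUT)
for row in rows:
    if row["DEGRADED"] == "True":
        print(f'{row["NAME"]}: {row["DEGRADEDMACHINECOUNT"]} degraded, '
              f'{row["UPDATEDMACHINECOUNT"]}/{row["MACHINECOUNT"]} updated')
```

      Both pools report one degraded machine and no updated machines, so the rollout stopped at the first node in each pool.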

      Check the affected node:

      omg get node/ip-10-0-57-74.us-east-2.compute.internal -o yaml|yq -y '.metadata.annotations'
      cloud.network.openshift.io/egress-ipconfig: '[{"interface":"eni-0f6de21569b5b65c8","ifaddr":{"ipv4":"10.0.48.0/20"},"capacity":{"ipv4":14,"ipv6":15}}]'
      csi.volume.kubernetes.io/nodeid: '{"ebs.csi.aws.com":"i-01a34f6b5f2cd1e41"}'
      machine.openshift.io/machine: openshift-machine-api/ci-op-kb95kxx9-2a438-r6z94-master-2
      machineconfiguration.openshift.io/controlPlaneTopology: HighlyAvailable
      machineconfiguration.openshift.io/currentConfig: rendered-master-065664319cfbaee64277097d49a8a5a6
      machineconfiguration.openshift.io/desiredConfig: rendered-master-60f4ff5893c94f53acd9ebb7a6bf53d4
      machineconfiguration.openshift.io/desiredDrain: drain-rendered-master-60f4ff5893c94f53acd9ebb7a6bf53d4
      machineconfiguration.openshift.io/lastAppliedDrain: drain-rendered-master-60f4ff5893c94f53acd9ebb7a6bf53d4
      machineconfiguration.openshift.io/reason: 'error running systemd-run --unit machine-config-daemon-update-rpmostree-via-container
        --collect --wait -- podman run --authfile /var/lib/kubelet/config.json --privileged
        --pid=host --net=host --rm -v /:/run/host quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0daf5c4a35424410e88dde102022fc3581302bc8a98e09e2e4748502c59b3661
        rpm-ostree ex deploy-from-self /run/host: Running as unit: machine-config-daemon-update-rpmostree-via-container.service
      
      
        Finished with result: exit-code
      
      
        Main processes terminated with: code=exited/status=125
      
      
        Service runtime: 2min 52ms
      
      
        CPU time consumed: 144ms
      
      
        : exit status 125'
      machineconfiguration.openshift.io/state: Degraded
      volumes.kubernetes.io/controller-managed-attach-detach: 'true' 
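      The annotations above carry three signals: the node state, whether the current config differs from the desired one, and the exit status at the end of the reason string. A minimal sketch of extracting them follows, assuming the annotations have been loaded into a dict; `summarize` is a hypothetical helper, and the reason text is abbreviated here (the full podman command line is replaced by a bracketed note).

```python
import re

PREFIX = "machineconfiguration.openshift.io/"

# Subset of the node annotations shown above; the reason is abbreviated.
ANNOTATIONS = {
    PREFIX + "currentConfig": "rendered-master-065664319cfbaee64277097d49a8a5a6",
    PREFIX + "desiredConfig": "rendered-master-60f4ff5893c94f53acd9ebb7a6bf53d4",
    PREFIX + "state": "Degraded",
    PREFIX + "reason": (
        "error running systemd-run --unit "
        "machine-config-daemon-update-rpmostree-via-container "
        "[command abbreviated]: Running as unit: "
        "machine-config-daemon-update-rpmostree-via-container.service\n"
        "Finished with result: exit-code\n"
        "Main processes terminated with: code=exited/status=125\n"
        ": exit status 125"
    ),
}


def summarize(annotations):
    """Report MCD state, pending update, and the trailing exit status."""
    def get(key):
        return annotations.get(PREFIX + key, "")

    # The reason ends with "exit status N" when a command failed.
    match = re.search(r"exit status (\d+)\s*$", get("reason"))
    return {
        "state": get("state"),
        "update_pending": get("currentConfig") != get("desiredConfig"),
        "exit_status": int(match.group(1)) if match else None,
    }


print(summarize(ANNOTATIONS))
```

      For this node the summary shows a Degraded state, a pending update, and exit status 125.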

      Check the machine-config-daemon (MCD) log on the affected node:

      omg get pod -n openshift-machine-config-operator  -o json | jq -r '.items[]|select(.spec.nodeName=="ip-10-0-57-74.us-east-2.compute.internal")|.metadata.name' | grep daemon
      machine-config-daemon-znbvf
      
      2022-10-09T22:12:58.797891917Z I1009 22:12:58.797821  179598 update.go:1917] Updating OS to layered image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0daf5c4a35424410e88dde102022fc3581302bc8a98e09e2e4748502c59b3661
      2022-10-09T22:12:58.797891917Z I1009 22:12:58.797846  179598 rpm-ostree.go:447] Running captured: rpm-ostree --version
      2022-10-09T22:12:58.815829171Z I1009 22:12:58.815800  179598 update.go:2068] rpm-ostree is not new enough for layering; forcing an update via container
      2022-10-09T22:12:58.817577513Z I1009 22:12:58.817555  179598 update.go:2053] Running: systemd-run --unit machine-config-daemon-update-rpmostree-via-container --collect --wait -- podman run --authfile /var/lib/kubelet/config.json --privileged --pid=host --net=host --rm -v /:/run/host quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0daf5c4a35424410e88dde102022fc3581302bc8a98e09e2e4748502c59b3661 rpm-ostree ex deploy-from-self /run/host 
      ...
      2022-10-09T22:15:00.831959313Z E1009 22:15:00.831949  179598 writer.go:200] Marking Degraded due to: error running systemd-run --unit machine-config-daemon-update-rpmostree-via-container --collect --wait -- podman run --authfile /var/lib/kubelet/config.json --privileged --pid=host --net=host --rm -v /:/run/host quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0daf5c4a35424410e88dde102022fc3581302bc8a98e09e2e4748502c59b3661 rpm-ostree ex deploy-from-self /run/host: Running as unit: machine-config-daemon-update-rpmostree-via-container.service
      2022-10-09T22:15:00.831959313Z Finished with result: exit-code
      2022-10-09T22:15:00.831959313Z Main processes terminated with: code=exited/status=125
      2022-10-09T22:15:00.831959313Z Service runtime: 2min 52ms
      2022-10-09T22:15:00.831959313Z CPU time consumed: 144ms
      2022-10-09T22:15:00.831959313Z : exit status 125
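      Two details in the log can be cross-checked: the wall-clock gap between the "Running:" line and the "Marking Degraded" line, and the meaning of exit status 125. The sketch below is illustrative only. It assumes the exit code meanings documented in the podman-run man page (125 for an error in podman itself, 126 and 127 for problems invoking the contained command); `parse_ts` and `describe_exit` are hypothetical helper names.

```python
from datetime import datetime

# Exit codes that podman-run reserves for its own failures, per its man page.
PODMAN_EXIT_MEANINGS = {
    125: "error in podman itself, not in the contained command",
    126: "contained command could not be invoked",
    127: "contained command could not be found",
}


def parse_ts(ts):
    """Parse an RFC 3339 timestamp with nanoseconds, trimmed to microseconds."""
    date_part, fraction = ts.rstrip("Z").split(".")
    return datetime.strptime(f"{date_part}.{fraction[:6]}",
                             "%Y-%m-%dT%H:%M:%S.%f")


def describe_exit(code):
    """Map an exit status to the podman meaning, if it is a reserved code."""
    return PODMAN_EXIT_MEANINGS.get(code, "exit status of the contained command")


started = parse_ts("2022-10-09T22:12:58.817577513Z")   # "Running: systemd-run ..."
degraded = parse_ts("2022-10-09T22:15:00.831959313Z")  # "Marking Degraded due to ..."
elapsed = (degraded - started).total_seconds()

print(f"elapsed: {elapsed:.3f}s")
print(f"status 125: {describe_exit(125)}")
```

      The gap of about 122 seconds is consistent with the reported service runtime of 2min 52ms. Together with only 144ms of CPU time, this suggests podman spent most of that time waiting before failing on its own, rather than rpm-ostree failing inside the container. The log alone does not establish what it was waiting on.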

      Version-Release number of selected component (if applicable):

      4.12

      Steps to Reproduce:

      Upgrade a cluster from 4.11.8 to 4.12.0-0.nightly-2022-10-05-053337.

      Actual results:

      The upgrade fails because a node becomes degraded: the rpm-ostree update via container fails.

      Expected results:

      The upgrade completes successfully.

      Additional info:

      must-gather: https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.12-nightly-4.12-upgrade-from-stable-4.11-aws-ipi-proxy-p1/1579169944476585984/artifacts/aws-ipi-proxy-p1/gather-must-gather/artifacts/must-gather.tar

      Build logs of other failed jobs:

      https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.12-nightly-4.12-upgrade-from-stable-4.11-aws-ipi-proxy-cco-manual-security-token-service-p1/1579200140067999744/build-log.txt

      https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.12-nightly-4.12-upgrade-from-stable-4.11-azure-ipi-proxy-p1/1579094436883730432/build-log.txt

      https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.12-nightly-4.12-upgrade-from-stable-4.11-azure-ipi-proxy-workers-rhcos-rhel8-p2/1578747158293647360/build-log.txt

            walters@redhat.com Colin Walters
            rhn-support-rioliu Rio Liu
            Rio Liu Rio Liu