Resolution: Duplicate
We have detected a noticeable drop in GCP upgrade success rates (90% -> 85% over the last week).
periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-gcp-ovn-rt-upgrade seems to be permafailing now.
The pattern appears to be:
: Operator upgrade machine-config 0s {Failed to upgrade machine-config, operator was degraded (RequiredPoolsFailed): Unable to apply 4.12.0-0.ci-2022-10-10-233202: error during syncRequiredMachineConfigPools: [timed out waiting for the condition, error pool worker is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 1)] Failed to upgrade machine-config, operator was degraded (RequiredPoolsFailed): Unable to apply 4.12.0-0.ci-2022-10-10-233202: error during syncRequiredMachineConfigPools: [timed out waiting for the condition, error pool worker is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 1)]}
disruption_tests: [sig-arch] Check if alerts are firing during or after upgrade success alert MCDPivotError fired for 1021 seconds with labels: {container="oauth-proxy", endpoint="metrics", err="error running systemd-run --unit machine-config-daemon-update-rpmostree-via-container --collect --wait -- podman run --authfile /var/lib/kubelet/config.json --privileged --pid=host --net=host --rm -v /:/run/host registry.ci.openshift.org/ocp/4.12-2022-10-10-233202@sha256:53afd52bd7920b4a1e9dd805a16e643937b406abe092508c37e7e089fa9a806f rpm-ostree ex deploy-from-self /run/host: Running as unit: machine-config-daemon-update-rpmostree-via-container.service\nFinished with result: exit-code\nMain processes terminated with: code=exited/status=1\nService runtime: 38.597s\nCPU time consumed: 1min 6.453s\n: exit status 1", exported_node="ci-op-rxh51fhz-6bb16-2qspl-worker-a-lswln", instance="", job="machine-config-daemon", namespace="openshift-machine-config-operator", node="ci-op-rxh51fhz-6bb16-2qspl-worker-a-lswln", pivot_target="registry.ci.openshift.org/ocp/4.12-2022-10-10-233202@sha256:53afd52bd7920b4a1e9dd805a16e643937b406abe092508c37e7e089fa9a806f", pod="machine-config-daemon-bv7bp", service="machine-config-daemon", severity="warning"}
This MCDPivotError is probably very telling. Search.ci shows it firing primarily on GCP rt jobs, and for much longer durations than the occasional hits elsewhere.
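For comparing firing durations across jobs, a failure line like the one above can be broken into its alert name, duration, and labels. This is only a rough sketch: `parse_alert_failure` is a hypothetical helper (not part of any CI tooling), and the regexes assume the "alert NAME fired for N seconds with labels: {...}" shape seen in this one message.

```python
import re

# Assumed message shape, taken from the single failure line quoted in this bug.
ALERT_RE = re.compile(
    r"alert (?P<name>\w+) fired for (?P<seconds>\d+) seconds "
    r"with labels: \{(?P<labels>.*)\}",
    re.DOTALL,
)
# key="value" pairs; the value may contain backslash escapes such as \n.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_alert_failure(line):
    """Return alert name, firing duration in seconds, and labels, or None."""
    match = ALERT_RE.search(line)
    if match is None:
        return None
    return {
        "alert": match.group("name"),
        "seconds": int(match.group("seconds")),
        "labels": dict(LABEL_RE.findall(match.group("labels"))),
    }


# Shortened copy of the failure line above (most labels omitted for brevity).
sample = (
    "alert MCDPivotError fired for 1021 seconds with labels: "
    '{container="oauth-proxy", '
    'node="ci-op-rxh51fhz-6bb16-2qspl-worker-a-lswln", severity="warning"}'
)
result = parse_alert_failure(sample)
print(result["alert"], result["seconds"], result["labels"]["node"])
```

Grouping the parsed `seconds` values by job name would make the "much longer durations on GCP rt" observation easy to check against the search results.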
- duplicates
OCPBUGS-2269 "error: No enabled repositories" on upgrade with kernelType: realtime enabled
- Closed