Type: Bug
Resolution: Duplicate
Priority: Undefined
Fix Version/s: None
Affects Version/s: 4.12
Component/s: Machine Config Operator
Labels:
- trt

Severity: Important
Regression: None
Release Blocker: Proposed
Blocked: False
Blocked Reason: None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

We have detected a noticeable drop in GCP upgrade success rates (90% -> 85% over the last week).

periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-gcp-ovn-rt-upgrade seems to be permafailing now.

The pattern appears to be:

Operator upgrade machine-config (0s)
{Failed to upgrade machine-config, operator was degraded (RequiredPoolsFailed): Unable to apply 4.12.0-0.ci-2022-10-10-233202: error during syncRequiredMachineConfigPools: [timed out waiting for the condition, error pool worker is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 1)]  Failed to upgrade machine-config, operator was degraded (RequiredPoolsFailed): Unable to apply 4.12.0-0.ci-2022-10-10-233202: error during syncRequiredMachineConfigPools: [timed out waiting for the condition, error pool worker is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 1)]}

and

disruption_tests: [sig-arch] Check if alerts are firing during or after upgrade success

alert MCDPivotError fired for 1021 seconds with labels: {container="oauth-proxy", endpoint="metrics", err="error running systemd-run --unit machine-config-daemon-update-rpmostree-via-container --collect --wait -- podman run --authfile /var/lib/kubelet/config.json --privileged --pid=host --net=host --rm -v /:/run/host registry.ci.openshift.org/ocp/4.12-2022-10-10-233202@sha256:53afd52bd7920b4a1e9dd805a16e643937b406abe092508c37e7e089fa9a806f rpm-ostree ex deploy-from-self /run/host: Running as unit: machine-config-daemon-update-rpmostree-via-container.service\nFinished with result: exit-code\nMain processes terminated with: code=exited/status=1\nService runtime: 38.597s\nCPU time consumed: 1min 6.453s\n: exit status 1", exported_node="ci-op-rxh51fhz-6bb16-2qspl-worker-a-lswln", instance="10.0.128.3:9001", job="machine-config-daemon", namespace="openshift-machine-config-operator", node="ci-op-rxh51fhz-6bb16-2qspl-worker-a-lswln", pivot_target="registry.ci.openshift.org/ocp/4.12-2022-10-10-233202@sha256:53afd52bd7920b4a1e9dd805a16e643937b406abe092508c37e7e089fa9a806f", pod="machine-config-daemon-bv7bp", service="machine-config-daemon", severity="warning"}
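When comparing many runs, it can help to pull the alert name, duration, and labels (such as `node` and `pivot_target`) out of alert lines like the one above. The following is a minimal sketch, assuming the line format shown in this report. The function name `parse_alert` and the shortened sample line are illustrative only and are not part of any CI tooling.

```python
import re

# Shortened copy of the alert line quoted above. The long "err" and
# "pivot_target" labels are left out here purely to keep the sample readable.
ALERT = (
    'alert MCDPivotError fired for 1021 seconds with labels: '
    '{container="oauth-proxy", endpoint="metrics", '
    'node="ci-op-rxh51fhz-6bb16-2qspl-worker-a-lswln", '
    'pod="machine-config-daemon-bv7bp", severity="warning"}'
)

# Assumed shape: alert <name> fired for <N> seconds with labels: {k="v", ...}
HEADER_RE = re.compile(
    r"alert (?P<name>\S+) fired for (?P<seconds>\d+) seconds "
    r"with labels: \{(?P<labels>.*)\}",
    re.S,
)
# A label is key="value"; the value may contain backslash escapes such as \n.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_alert(line):
    """Return the alert name, firing duration, and labels, or None if no match."""
    match = HEADER_RE.search(line)
    if match is None:
        return None
    return {
        "name": match.group("name"),
        "seconds": int(match.group("seconds")),
        "labels": dict(LABEL_RE.findall(match.group("labels"))),
    }


if __name__ == "__main__":
    alert = parse_alert(ALERT)
    print(alert["name"], alert["seconds"], alert["labels"]["node"])
```

Grouping the parsed results by `node` or by job name is one way to check whether the long-firing alerts cluster on the realtime GCP job.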

This MCDPivotError alert is probably very telling. search.ci shows it firing primarily on the GCP rt job, and for much longer durations than the occasional hits elsewhere.

https://search.ci.openshift.org/?search=alert+MCDPivotError+fired&maxAge=48h&context=1&type=bug%2Bissue%2Bjunit&name=gcp&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

job=periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-gcp-ovn-rt-upgrade
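To repeat the search above for a different time window or platform filter, the query link can be rebuilt programmatically. This is a rough sketch: the parameter names and values are copied from the link in this report, their meaning is only inferred from their names, and the helper `search_url` is a hypothetical name introduced for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "https://search.ci.openshift.org/"


def search_url(search, name="gcp", max_age="48h", group_by="job"):
    """Build a search.ci link using the same parameters as the link above."""
    params = {
        "search": search,
        "maxAge": max_age,
        "context": 1,
        "type": "bug+issue+junit",  # urlencode turns "+" into %2B, as in the link
        "name": name,
        "excludeName": "",
        "maxMatches": 5,
        "maxBytes": 20971520,
        "groupBy": group_by,
    }
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # With the defaults this reproduces the link quoted in this report.
    print(search_url("alert MCDPivotError fired"))
```

Calling `search_url("alert MCDPivotError fired", max_age="168h")` would, under the same assumptions, widen the window to the week over which the success rate dropped.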

duplicates

OCPBUGS-2269 "error: No enabled repositories" on upgrade with kernelType: realtime enabled

Closed

Assignee: Team MCO

Reporter: Devan Goodwin

QA Contact: Rio Liu

Votes: 0

Watchers: 3

Created: 2022/10/12 12:43 PM

Updated: 2022/10/13 1:09 PM

Resolved: 2022/10/12 2:22 PM
