OpenShift Bugs / OCPBUGS-58178

Node reboots in 4.20 taking twice as long to complete due to promtail needing to be SIGKILL'ed

    • Type: Bug
    • Resolution: Done
    • Priority: Undefined
    • Severity: Critical
    • Affects Version: 4.20
    • Component: Test Framework
    • Category: Quality / Stability / Reliability

      Description of problem:

      TLDR: seems to be https://issues.redhat.com/browse/OCPBUGS-13020 again.

      The MCO's blocking pre-submit job, e2e-gcp-op, is timing out on 4.20. Looking through some example failures, we see a drastic increase in reboot time.
      
      
      Looking at older runs, the maximum time between MCD restarts is about 1.5 minutes:
      
      2025-06-12T15:27:40.782632915+00:00 stderr F I0612 15:27:40.762518    2602 daemon.go:1420] Shutting down MachineConfigDaemon
      ...
      ...
      2025-06-12T15:29:00.179980191+00:00 stderr F I0612 15:29:00.179911    2607 start.go:68] Version: machine-config-daemon-4.6.0-202006240615.p0-3497-gc4676685-dirty (c467668557d7ff4f323e3dfc119de326ed6237db)
      
      
      On the newer runs that time out, the maximum is about 4 minutes:
      2025-06-25T14:20:14.210451103+00:00 stderr F I0625 14:20:14.210429    2595 daemon.go:1420] Shutting down MachineConfigDaemon
      ....
      ....
      2025-06-25T14:24:00.376789443+00:00 stderr F I0625 14:24:00.376725    2620 start.go:68] Version: machine-config-daemon-4.6.0-202006240615.p0-3513-g2499d899-dirty (2499d8996a2ba9f24b55a927a116d68e6ec9680a)
      2025-06-25T14:24:00.384100575+00:00 stderr F I0625 14:24:00.384004    2620 update.go:2594] Running: mount --rbind /run/secrets /rootfs/run/secrets

      So we're losing about 3 minutes somewhere after the MCD shuts down. Looking at the journal on this node in that window:
      Jun 25 14:20:14.087396 ci-op-gwhnctd5-27524-j9qk8-worker-c-k6jsq systemd[1]: Started machine-config-daemon: Node will reboot into config rendered-infra-3d9079bedd38a818d4f3247b72e5b14d.
      Jun 25 14:20:14.092601 ci-op-gwhnctd5-27524-j9qk8-worker-c-k6jsq root[6130]: machine-config-daemon[2595]: "reboot successful"
      Jun 25 14:20:14.104645 ci-op-gwhnctd5-27524-j9qk8-worker-c-k6jsq systemd-logind[952]: The system will reboot now!
      Jun 25 14:20:14.108018 ci-op-gwhnctd5-27524-j9qk8-worker-c-k6jsq systemd-logind[952]: System is rebooting.
      ...
      ...
      Jun 25 14:20:34.173082 ci-op-gwhnctd5-27524-j9qk8-worker-c-k6jsq systemd[1]: crio-conmon-b9bb524471736b4337196b2121d482dbe140c1a8bf388460ddefd9e4b2fe9bca.scope: Deactivated successfully.
      =======[BLIP OF ~3 MINS here]========================
      Jun 25 14:23:14.280408 ci-op-gwhnctd5-27524-j9qk8-worker-c-k6jsq systemd[1]: crio-2de1d0b8f3eeb7a4fb3b0ea6149320fb970366b653057d3f0c426cceedda9cd8.scope: Stopping timed out. Killing.
      Jun 25 14:23:14.281032 ci-op-gwhnctd5-27524-j9qk8-worker-c-k6jsq systemd[1]: crio-2de1d0b8f3eeb7a4fb3b0ea6149320fb970366b653057d3f0c426cceedda9cd8.scope: Killing process 4126 (promtail) with signal SIGKILL.
      Jun 25 14:23:14.281273 ci-op-gwhnctd5-27524-j9qk8-worker-c-k6jsq systemd[1]: crio-2de1d0b8f3eeb7a4fb3b0ea6149320fb970366b653057d3f0c426cceedda9cd8.scope: Killing process 4132 (promtail) with signal SIGKILL.
      Jun 25 14:23:14.281305 ci-op-gwhnctd5-27524-j9qk8-worker-c-k6jsq systemd[1]: crio-2de1d0b8f3eeb7a4fb3b0ea6149320fb970366b653057d3f0c426cceedda9cd8.scope: Killing process 4152 (n/a) with signal SIGKILL.
      Jun 25 14:23:14.281326 ci-op-gwhnctd5-27524-j9qk8-worker-c-k6jsq systemd[1]: crio-2de1d0b8f3eeb7a4fb3b0ea6149320fb970366b653057d3f0c426cceedda9cd8.scope: Killing process 4153 (n/a) with signal SIGKILL.
      Jun 25 14:23:14.281352 ci-op-gwhnctd5-27524-j9qk8-worker-c-k6jsq systemd[1]: crio-2de1d0b8f3eeb7a4fb3b0ea6149320fb970366b653057d3f0c426cceedda9cd8.scope: Killing process 4163 (promtail) with signal SIGKILL.
      Jun 25 14:23:14.281366 ci-op-gwhnctd5-27524-j9qk8-worker-c-k6jsq systemd[1]: crio-2de1d0b8f3eeb7a4fb3b0ea6149320fb970366b653057d3f0c426cceedda9cd8.scope: Killing process 4164 (n/a) with signal SIGKILL.
      Jun 25 14:23:14.281377 ci-op-gwhnctd5-27524-j9qk8-worker-c-k6jsq systemd[1]: crio-2de1d0b8f3eeb7a4fb3b0ea6149320fb970366b653057d3f0c426cceedda9cd8.scope: Killing process 4255 (n/a) with signal SIGKILL.
      Jun 25 14:23:14.289820 ci-op-gwhnctd5-27524-j9qk8-worker-c-k6jsq systemd[1]: crio-2de1d0b8f3eeb7a4fb3b0ea6149320fb970366b653057d3f0c426cceedda9cd8.scope: Failed with result 'timeout'.
      Jun 25 14:23:14.290292 ci-op-gwhnctd5-27524-j9qk8-worker-c-k6jsq systemd[1]: Stopped libcrun container.
      
      
      The referenced scope is loki-promtail:
      
      Jun 25 14:18:56.330012 ci-op-gwhnctd5-27524-j9qk8-worker-c-k6jsq crio[2425]: time="2025-06-25T14:18:56.329927842Z" level=info msg="Created container 2de1d0b8f3eeb7a4fb3b0ea6149320fb970366b653057d3f0c426cceedda9cd8: openshift-e2e-loki/loki-promtail-qppd5/promtail" id=914eebbb-c8e5-41ed-908e-2da5731e9712 name=/runtime.v1.RuntimeService/CreateContainer
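
      To see which units hit the stop timeout across a whole run, the journal can be scanned mechanically for systemd's "Stopping timed out. Killing." message. A small sketch (the helper name is ours, not part of any tooling):

```python
import re

# systemd logs "<unit>: Stopping timed out. Killing." when a unit ignores
# SIGTERM past its stop timeout. Scanning the journal for that message
# identifies every crio scope that had to be SIGKILL'ed.
TIMEOUT_RE = re.compile(r"(crio-[0-9a-f]+\.scope): Stopping timed out\. Killing\.")

def scopes_that_timed_out(journal_lines):
    return {m.group(1) for line in journal_lines
            if (m := TIMEOUT_RE.search(line))}

# One line taken verbatim from the journal excerpt above.
sample = [
    "Jun 25 14:23:14.280408 ci-op-gwhnctd5-27524-j9qk8-worker-c-k6jsq systemd[1]: "
    "crio-2de1d0b8f3eeb7a4fb3b0ea6149320fb970366b653057d3f0c426cceedda9cd8.scope: "
    "Stopping timed out. Killing.",
]
print(scopes_that_timed_out(sample))
```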
      
      Example run journal logs:
      https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_release/66378/rehearse-66378-pull-ci-openshift-machine-config-operator-main-e2e-gcp-op/1937821703648317440/artifacts/e2e-gcp-op/gather-extra/artifacts/nodes/ci-op-gwhnctd5-27524-j9qk8-worker-c-k6jsq/
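
      The two reboot gaps quoted above can be checked directly from the MCD timestamps. A quick sketch (timestamps copied from the log excerpts; the nanosecond truncation is needed because Python's datetime only carries microseconds):

```python
import re
from datetime import datetime

def parse_ts(ts: str) -> datetime:
    # CRI-O logs use nanosecond precision; datetime only supports
    # microseconds, so truncate the extra digits before parsing.
    return datetime.fromisoformat(re.sub(r"(\.\d{6})\d+", r"\1", ts))

def gap_seconds(shutdown: str, restart: str) -> float:
    return (parse_ts(restart) - parse_ts(shutdown)).total_seconds()

# Timestamps copied from the MCD log excerpts above.
old = gap_seconds("2025-06-12T15:27:40.782632915+00:00",
                  "2025-06-12T15:29:00.179980191+00:00")
new = gap_seconds("2025-06-25T14:20:14.210451103+00:00",
                  "2025-06-25T14:24:00.376789443+00:00")
print(f"old run: {old:.0f}s, new run: {new:.0f}s")
# → old run: 79s, new run: 226s (~1m19s vs ~3m46s)
```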

      Version-Release number of selected component (if applicable):

          4.20

      How reproducible:

          Almost always (but some reboots seem to get lucky)

      Steps to Reproduce:

          1. Check CI logs

      Actual results:

          loki-promtail fails to stop on SIGTERM; systemd SIGKILLs it only after a ~3-minute stop timeout, roughly doubling node reboot time

      Expected results:

          loki-promtail should either be shut down cleanly or not be running
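
      For reference, the expected behavior is the standard systemd stop contract: handle SIGTERM and exit before the unit's stop timeout, or systemd escalates to SIGKILL as seen in the journal above. An illustrative sketch in Python (promtail itself is written in Go; the names below are hypothetical, not promtail's actual code):

```python
import signal
import sys
import threading

# Hypothetical log-shipper main loop: exit promptly on SIGTERM so systemd
# never has to escalate to SIGKILL after its stop timeout.
shutdown = threading.Event()

def handle_sigterm(signum, frame):
    shutdown.set()  # request a clean exit from the main loop

signal.signal(signal.SIGTERM, handle_sigterm)

def main_loop():
    while not shutdown.is_set():
        # ... tail files and push log batches here ...
        shutdown.wait(timeout=1.0)  # wake promptly when SIGTERM arrives
    # Clean shutdown path: flush pending batches, close connections.
    sys.exit(0)
```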

      Additional info:

          

              Assignee: Devan Goodwin (rhn-engineering-dgoodwin)
              Reporter: Yu Qi Zhang (jerzhang@redhat.com)