Type: Bug
Resolution: Unresolved
Priority: Critical
Fix Version: None
Affects Versions: 4.18, 4.19
Description of problem:
We're seeing another issue related to the DNS restart that happens during the SNO reboot: the test `[sig-cluster-lifecycle] pathological event should not see excessive Back-off restarting failed containers` is failing.
This payload failed because two of its jobs hit this excessive-restart error.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1.
2.
3.
Actual results:
event [namespace/openshift-catalogd node/ip-10-0-105-161.us-west-1.compute.internal pod/catalogd-controller-manager-59d9788859-n5sf7 hmsg/e0d473c239 - Back-off restarting failed container manager in pod catalogd-controller-manager-59d9788859-n5sf7_openshift-catalogd(78fff808-fe29-48eb-a031-0ba3650a3d84)] happened 89 times
event [namespace/openshift-operator-controller node/ip-10-0-105-161.us-west-1.compute.internal pod/operator-controller-controller-manager-7b46748475-v9d7m hmsg/96a44e679e - Back-off restarting failed container manager in pod operator-controller-controller-manager-7b46748475-v9d7m_openshift-operator-controller(242e6d7e-cedc-42c2-9606-94848608f244)] happened 88 times
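The events above follow a fixed textual pattern ending in "happened N times", so the pathological-event check can be approximated by parsing that count and comparing it against a threshold. This is a minimal sketch, not origin's actual implementation; the threshold value of 20 and the helper names `parse_event_count` / `is_pathological` are assumptions for illustration.

```python
import re

# Assumed threshold standing in for the limit used by origin's
# pathological-events check; not a value taken from this report.
PATHOLOGICAL_THRESHOLD = 20

# Matches the repetition count at the end of an event summary line.
EVENT_COUNT_RE = re.compile(r"happened (\d+) times")

def parse_event_count(event_line: str) -> int:
    """Extract the 'happened N times' count from an event line (0 if absent)."""
    match = EVENT_COUNT_RE.search(event_line)
    return int(match.group(1)) if match else 0

def is_pathological(event_line: str, threshold: int = PATHOLOGICAL_THRESHOLD) -> bool:
    """Flag events that repeated more often than the allowed threshold."""
    return parse_event_count(event_line) > threshold

if __name__ == "__main__":
    events = [
        "event [namespace/openshift-catalogd ...] happened 89 times",
        "event [namespace/openshift-operator-controller ...] happened 88 times",
    ]
    for line in events:
        print(parse_event_count(line), is_pathological(line))
```

Both events in this report (89 and 88 repetitions) are flagged under any plausible threshold, which is why the payload failed.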
Expected results:
Test passes
Additional info:
Unpacking the logs in loki shows a spike of errors during the DNS outage caused by rolling out the new DNS pod during the upgrade. This is the case both for the catalogd-controller-manager-59d9788859-n5sf7_openshift-catalogd pod in ns/openshift-catalogd and for operator-controller-controller-manager-7b46748475-v9d7m_openshift-operator-controller in ns/openshift-operator-controller.
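The loki inspection described above can be reproduced with a LogQL query along these lines. This is a sketch under assumptions: the stream label names (`namespace`, `pod`) and the error substrings filtered for (typical Go DNS failure messages) are guesses about how the cluster logs are indexed, not details taken from this report.

```
{namespace="openshift-catalogd", pod="catalogd-controller-manager-59d9788859-n5sf7"}
  |~ "(?i)no such host|i/o timeout|connection refused"
```

Restricting the query's time range to the upgrade window should show the error spike lining up with the DNS pod rollout.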
Related issues:
- is related to OCPBUGS-42777: SNO Regression for [sig-network-edge] Verify DNS availability during and after upgrade success (ASSIGNED)