Details
Type: Bug
Resolution: Unresolved
Priority: Major
Affects Version/s: 4.15, 4.16
Description
A frequent cause of test-run failures in the e2e-metal-ipi-ovn-ipv6 job is:
[sig-cluster-lifecycle] pathological event should not see excessive Back-off restarting failed containers
The specific "excessive" event is something like:
event [namespace/openshift-e2e-loki node/master-1.ostest.test.metalkube.org pod/loki-promtail-s2ggf hmsg/12a03e9173 - Back-off restarting failed container prod-bearer-token in pod loki-promtail-s2ggf_openshift-e2e-loki(5cfc8a21-bc04-4c34-a68e-7c60e04834ea)] happened 408 times
Poking around in the must-gather reveals that the prod-bearer-token container is exiting because:
level=info name=token-refresher ts=2024-02-06T13:27:36.138331121Z caller=main.go:169 msg=token-refresher
2024/02/06 13:27:36 OIDC provider initialization failed: Get "https://sso.redhat.com/auth/realms/redhat-external/.well-known/openid-configuration": proxyconnect tcp: dial tcp [fd00:1101::1]:8213: connect: connection refused
(Aside: sso.redhat.com has an IPv6 address, so theoretically we could no_proxy it?)
But since this works fine in other runs, it seems like the problem must be that squid sometimes crashes or becomes unreachable partway through the run? (Although I'd expect that to cause more failures than just this one, so maybe not?)
Unfortunately, I can't find any information about the state of the squid proxy in the e2e artifacts: squid is run by hand via podman, so its output isn't captured by must-gather, and it doesn't seem to log anything to the journal either. (Aside: the script that starts squid does "ssh root@${IP}" but then prefixes every command with "sudo"...)
So that's as far as I got with debugging this...
Issue Links
- clones: OCPBUGS-29478 "squid proxy sometimes crashing/unreachable/? in e2e-metal-ipi-ovn-ipv6 jobs" (Closed)