Type: Bug
Resolution: Obsolete
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.15, 4.16, 4.17
Component/s: Test Framework
Labels:
- ibmcloud

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:
None
Story Points:
None
Severity:
Moderate
Regression:
No

Target Backport Versions:

4.15.z, 4.16.z
Target Version:

4.17.z
Release Blocker:
None
Sprint:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

    All e2e-ibmcloud-ovn test runs are failing because liveness and readiness probe failure events repeat too frequently during MonitorTests.

Version-Release number of selected component (if applicable):

    4.16.0-0.ci.test-2024-02-20-184205-ci-op-lghcpt9x-latest

How reproducible:

    Appears to be 100%

Steps to Reproduce:

    1. Set up an IPI cluster on IBM Cloud
    2. Run OCP Conformance w/ MonitorTests (CI does this on IBM Cloud related PRs)

Actual results:

    OCP Conformance tests fail because of MonitorTests failures:

: [sig-arch] events should not repeat pathologically for ns/openshift-cloud-controller-manager 0s{  2 events happened too frequently

event happened 43 times, something is wrong: namespace/openshift-cloud-controller-manager node/ci-op-lghcpt9x-52953-tk4vl-master-2 pod/ibm-cloud-controller-manager-6c5f8594c5-bpnm8 hmsg/d91441a732 - reason/ProbeError Liveness probe error: Get "https://10.241.129.4:10258/healthz": dial tcp 10.241.129.4:10258: connect: connection refused result=reject 
body: 
 From: 20:25:44Z To: 20:25:45Z
event happened 43 times, something is wrong: namespace/openshift-cloud-controller-manager node/ci-op-lghcpt9x-52953-tk4vl-master-1 pod/ibm-cloud-controller-manager-6c5f8594c5-wn4fq hmsg/fda26f2bbf - reason/ProbeError Liveness probe error: Get "https://10.241.64.6:10258/healthz": dial tcp 10.241.64.6:10258: connect: connection refused result=reject 
body: 
 From: 20:25:54Z To: 20:25:55Z}
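For triage, the repeated-event lines above can be tallied per node and reason. This is only a minimal sketch: the regular expression is inferred from the two lines quoted in this report, not taken from the MonitorTests code, and `parse_events` and `totals_by_node` are hypothetical helper names.

```python
import re
from collections import Counter

# Pattern inferred from the lines quoted in this report (an assumption,
# not the format definition used by the test framework itself).
EVENT_RE = re.compile(
    r"event happened (?P<count>\d+) times, something is wrong: "
    r"namespace/(?P<namespace>\S+) node/(?P<node>\S+) pod/(?P<pod>\S+) "
    r"hmsg/\S+ - reason/(?P<reason>\S+) (?P<message>.*)"
)

# The two event lines from the failure output above.
SAMPLE = """
event happened 43 times, something is wrong: namespace/openshift-cloud-controller-manager node/ci-op-lghcpt9x-52953-tk4vl-master-2 pod/ibm-cloud-controller-manager-6c5f8594c5-bpnm8 hmsg/d91441a732 - reason/ProbeError Liveness probe error: Get "https://10.241.129.4:10258/healthz": dial tcp 10.241.129.4:10258: connect: connection refused result=reject
event happened 43 times, something is wrong: namespace/openshift-cloud-controller-manager node/ci-op-lghcpt9x-52953-tk4vl-master-1 pod/ibm-cloud-controller-manager-6c5f8594c5-wn4fq hmsg/fda26f2bbf - reason/ProbeError Liveness probe error: Get "https://10.241.64.6:10258/healthz": dial tcp 10.241.64.6:10258: connect: connection refused result=reject
"""


def parse_events(text):
    """Return one dict per 'event happened N times' line found in text."""
    events = []
    for line in text.splitlines():
        match = EVENT_RE.search(line)
        if match:
            event = match.groupdict()
            event["count"] = int(event["count"])
            events.append(event)
    return events


def totals_by_node(events):
    """Sum the repeat counts for each (node, reason) pair."""
    totals = Counter()
    for event in events:
        totals[(event["node"], event["reason"])] += event["count"]
    return totals


if __name__ == "__main__":
    for (node, reason), count in totals_by_node(parse_events(SAMPLE)).items():
        print(node, reason, count)
```

Grouping by node may help show whether the probe errors cluster on particular masters, which would point toward an infrastructure cause rather than the component itself.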


: [sig-arch] events should not repeat pathologically for ns/openshift-oauth-apiserver 0s{  1 events happened too frequently

event happened 25 times, something is wrong: namespace/openshift-oauth-apiserver node/ci-op-lghcpt9x-52953-tk4vl-master-1 pod/apiserver-c5ff4776b-kqg7c hmsg/c9e932e38d - reason/ProbeError Readiness probe error: HTTP probe failed with statuscode: 500 result=reject 
body: [+]ping ok
[+]log ok
[+]etcd ok
[-]etcd-readiness failed: reason withheld
[+]informer-sync ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/priority-and-fairness-config-consumer ok
[+]poststarthook/priority-and-fairness-filter ok
[+]poststarthook/storage-object-count-tracker-hook ok
[+]poststarthook/openshift.io-StartUserInformer ok
[+]poststarthook/openshift.io-StartOAuthInformer ok
[+]poststarthook/openshift.io-StartTokenTimeoutUpdater ok
[+]shutdown ok
readyz check failed

 From: 20:25:04Z To: 20:25:05Z}
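The readiness probe body above lists one check per line, marked `[+]` for passing and `[-]` for failing. A small sketch for pulling out only the failing checks follows; `parse_checks` is a hypothetical helper, and the line format is assumed from the body quoted in this report.

```python
import re

# One check per line: "[+]name ok" or "[-]name failed: detail".
# The format is assumed from the probe body quoted above.
CHECK_RE = re.compile(r"\[(?P<status>[+-])\](?P<name>\S+)\s+(?P<detail>.*)")

# Probe body copied from the oauth-apiserver event above.
BODY = """body: [+]ping ok
[+]log ok
[+]etcd ok
[-]etcd-readiness failed: reason withheld
[+]informer-sync ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/priority-and-fairness-config-consumer ok
[+]poststarthook/priority-and-fairness-filter ok
[+]poststarthook/storage-object-count-tracker-hook ok
[+]poststarthook/openshift.io-StartUserInformer ok
[+]poststarthook/openshift.io-StartOAuthInformer ok
[+]poststarthook/openshift.io-StartTokenTimeoutUpdater ok
[+]shutdown ok
readyz check failed
"""


def parse_checks(body):
    """Split a probe body into passed check names and failed (name, detail) pairs."""
    passed, failed = [], []
    for line in body.splitlines():
        match = CHECK_RE.search(line)
        if not match:
            continue  # e.g. the trailing "readyz check failed" summary line
        if match["status"] == "+":
            passed.append(match["name"])
        else:
            failed.append((match["name"], match["detail"].strip()))
    return passed, failed


if __name__ == "__main__":
    passed, failed = parse_checks(BODY)
    print("passed:", len(passed))
    print("failed:", failed)
```

For the body in this report, the only failing check is `etcd-readiness`, while the plain `etcd` check passes.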

Expected results:

    Passing OCP Conformance (w/ MonitorTests) test

Additional info:

    The most frequent (perhaps the only) failures appear in these tests:

[sig-arch] events should not repeat pathologically for ns/openshift-cloud-controller-manager

[sig-arch] events should not repeat pathologically for ns/openshift-oauth-apiserver

I am not yet sure what causes the liveness/readiness probe failures, or whether the underlying infrastructure is responsible (and if so, which resource).

links to

openshift/origin#28667: OCPBUGS-30267: Clarify a misleading message in patho event failures

Assignee: Devan Goodwin

Reporter: Christopher Schaefer (Inactive)

Need Info From: None

Votes: 0

Watchers: 5

Created: 2024/03/05 4:50 PM

Updated: 2026/01/26 7:43 PM

Resolved: 2026/01/26 7:43 PM
