-
Bug
-
Resolution: Unresolved
-
Normal
-
None
-
4.15, 4.16, 4.17
Description of problem:
All e2e-ibmcloud-ovn testing is failing due to repeated events of liveness or readiness probes failing during MonitorTests.
Version-Release number of selected component (if applicable):
4.16.0-0.ci.test-2024-02-20-184205-ci-op-lghcpt9x-latest
How reproducible:
Appears to be 100%
Steps to Reproduce:
1. Setup IPI cluster on IBM Cloud 2. Run OCP Conformance w/ MonitorTests (CI does this on IBM Cloud related PR's)
Actual results:
Failed OCP Conformance tests, due to MonitorTests failure: : [sig-arch] events should not repeat pathologically for ns/openshift-cloud-controller-manager expand_less0s{ 2 events happened too frequently event happened 43 times, something is wrong: namespace/openshift-cloud-controller-manager node/ci-op-lghcpt9x-52953-tk4vl-master-2 pod/ibm-cloud-controller-manager-6c5f8594c5-bpnm8 hmsg/d91441a732 - reason/ProbeError Liveness probe error: Get "https://10.241.129.4:10258/healthz": dial tcp 10.241.129.4:10258: connect: connection refused result=reject body: From: 20:25:44Z To: 20:25:45Z event happened 43 times, something is wrong: namespace/openshift-cloud-controller-manager node/ci-op-lghcpt9x-52953-tk4vl-master-1 pod/ibm-cloud-controller-manager-6c5f8594c5-wn4fq hmsg/fda26f2bbf - reason/ProbeError Liveness probe error: Get "https://10.241.64.6:10258/healthz": dial tcp 10.241.64.6:10258: connect: connection refused result=reject body: From: 20:25:54Z To: 20:25:55Z} : [sig-arch] events should not repeat pathologically for ns/openshift-oauth-apiserver expand_less0s{ 1 events happened too frequently event happened 25 times, something is wrong: namespace/openshift-oauth-apiserver node/ci-op-lghcpt9x-52953-tk4vl-master-1 pod/apiserver-c5ff4776b-kqg7c hmsg/c9e932e38d - reason/ProbeError Readiness probe error: HTTP probe failed with statuscode: 500 result=reject body: [+]ping ok [+]log ok [+]etcd ok [-]etcd-readiness failed: reason withheld [+]informer-sync ok [+]poststarthook/generic-apiserver-start-informers ok [+]poststarthook/priority-and-fairness-config-consumer ok [+]poststarthook/priority-and-fairness-filter ok [+]poststarthook/storage-object-count-tracker-hook ok [+]poststarthook/openshift.io-StartUserInformer ok [+]poststarthook/openshift.io-StartOAuthInformer ok [+]poststarthook/openshift.io-StartTokenTimeoutUpdater ok [+]shutdown ok readyz check failed From: 20:25:04Z To: 20:25:05Z}
Expected results:
Passing OCP Conformance (w/ MonitorTests) test
Additional info:
The frequent (perhaps only) failures appear to occur via: [sig-arch] events should not repeat pathologically for ns/openshift-cloud-controller-manager [sig-arch] events should not repeat pathologically for ns/openshift-oauth-apiserver I am unsure on the cause of the liveness/readiness probe failures as of yet, unsure if the underlying Infrastructure is the cause (and if so, what resource).