-
Bug
-
Resolution: Done-Errata
-
Critical
-
4.15.0
-
Critical
-
No
-
Proposed
-
False
-
-
Release Note Not Required
-
In Progress
This is a clone of issue OCPBUGS-24537. The following is the description of the original issue:
—
Description of problem:
4.15 nightly payloads have been affected by this test multiple times:

[sig-arch] events should not repeat pathologically for ns/openshift-kube-scheduler 0s
{ 1 events happened too frequently event happened 21 times, something is wrong: namespace/openshift-kube-scheduler node/ci-op-2gywzc86-aa265-5skmk-master-1 pod/openshift-kube-scheduler-guard-ci-op-2gywzc86-aa265-5skmk-master-1 hmsg/2652c73da5 - reason/ProbeError Readiness probe error: Get "https://10.0.0.7:10259/healthz": dial tcp 10.0.0.7:10259: connect: connection refused result=reject body: From: 08:41:08Z To: 08:41:09Z}

In each aggregation of 10 jobs, 2 to 3 jobs failed with this test. Historically this test passed 100% of the time. Over the past two days of test data, however, the pass rate has dropped to 97%, and the aggregator has started allowing this failure in the latest payload:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/aggregated-azure-ovn-upgrade-4.15-micro-release-openshift-release-analysis-aggregator/1732295947339173888

The first payload in which this appeared is:
https://amd64.ocp.releases.ci.openshift.org/releasestream/4.15.0-0.nightly/release/4.15.0-0.nightly-2023-12-05-071627

All of the events happened while cluster-operator/kube-scheduler was progressing.

For comparison, here is a passed job:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.15-e2e-azure-ovn-upgrade/1731936539870498816

Here is a failed one:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.15-e2e-azure-ovn-upgrade/1731936538192777216

Both jobs have the same set of probe error events. In the passing job, each event repeated fewer than 20 times, while in the failed job one of those events repeated more than 20 times, which caused the test failure.
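To illustrate why two jobs with the same set of probe error events can land on opposite sides of the test, here is a minimal Python sketch of a repeat-count check. It is not the actual openshift/origin implementation. The function name `find_pathological_events`, the event dictionary keys, and the exact cutoff (fail when an event repeats more than 20 times, inferred from the "21 times" message above) are assumptions made for illustration.

```python
from collections import Counter

# Assumption: an event is flagged once it repeats more than 20 times,
# inferred from the "event happened 21 times" failure message.
PATHOLOGICAL_THRESHOLD = 20


def find_pathological_events(events, threshold=PATHOLOGICAL_THRESHOLD):
    """Return {event_key: count} for events repeated more than `threshold` times.

    `events` is a list of dicts with hypothetical keys
    "namespace", "pod", "reason" and "message".
    """
    counts = Counter(
        (e["namespace"], e["pod"], e["reason"], e["message"]) for e in events
    )
    return {key: n for key, n in counts.items() if n > threshold}


# Hypothetical sample data modelled on the probe error in the description.
probe_error = {
    "namespace": "openshift-kube-scheduler",
    "pod": "openshift-kube-scheduler-guard-master-1",
    "reason": "ProbeError",
    "message": "Readiness probe error: connection refused",
}

passing_job = [probe_error] * 19   # same event, repeated fewer than 20 times
failing_job = [probe_error] * 21   # same event, repeated 21 times

print("passing job flagged:", find_pathological_events(passing_job))
print("failing job flagged:", find_pathological_events(failing_job))
```

Under these assumptions the passing job reports nothing and the failing job reports the single ProbeError event, which matches the observation that only the repeat frequency differs between the two runs.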
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
- clones
-
OCPBUGS-24537 pathological events test failed multiple times for ns/openshift-kube-scheduler
- Closed
- is blocked by
-
OCPBUGS-24537 pathological events test failed multiple times for ns/openshift-kube-scheduler
- Closed
- links to
-
RHSA-2023:7198 OpenShift Container Platform 4.15 security update