Type: Bug
Resolution: Unresolved
Priority: Undefined
Fix Version/s: 4.21.0
Affects Version/s: 4.20
Component/s: Monitoring
Labels:

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

None
Story Points:
None
Severity:
Moderate
Regression:
None

Target Backport Versions:

4.20.0
Target Version:

4.21.0
Release Blocker:
Approved
Sprint:
MON Sprint 276, MON Sprint 277
sprint_count:
2

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Release Note Status:
In Progress
Release Note Type:
Release Note Not Required
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

(Feel free to update this bug's summary to be more specific.)
Component Readiness has found a potential regression in the following test:

[sig-instrumentation][Late] Platform Prometheus targets should not be accessible without auth [Serial] [Suite:openshift/conformance/serial]

Test has a 94.50% pass rate, but 95.00% is required.

Sample (being evaluated) Release: 4.20
Start Time: 2025-08-27T00:00:00Z
End Time: 2025-09-03T08:00:00Z
Success Rate: 94.50%
Successes: 103
Failures: 6
Flakes: 0
Base (historical) Release: 4.19
Start Time: 2025-05-18T00:00:00Z
End Time: 2025-06-17T23:59:59Z
Success Rate: 0.00%
Successes: 0
Failures: 0
Flakes: 0

View the test details report for additional context.

The report linked above is for metal, but this test is also below the required 95% pass rate on vSphere.

The failure always looks similar to:
https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.20-e2e-metal-ipi-ovn-serial-virtualmedia-2of2/1962616253063368704

[sig-instrumentation][Late] Platform Prometheus targets should not be accessible without auth [Serial] [Suite:openshift/conformance/serial] (13m46s)
{  fail [github.com/openshift/origin/test/extended/prometheus/prometheus.go:143]: Expected
    <[]error | len:4, cap:4>: [
        <*fmt.wrapError | 0xc000ef8080>{
            msg: "the scrape url https://192.168.111.26:10250/metrics for pod kube-system/ is accessible without authorization: context deadline exceeded",
            err: <context.deadlineExceededError>{},
        },
        <*fmt.wrapError | 0xc000ef8060>{
            msg: "the scrape url https://192.168.111.26:10250/metrics/cadvisor for pod kube-system/ is accessible without authorization: context deadline exceeded",
            err: <context.deadlineExceededError>{},
        },
        <*fmt.wrapError | 0xc000ce8020>{
            msg: "the scrape url https://192.168.111.26:10250/metrics/probes for pod kube-system/ is accessible without authorization: context deadline exceeded",
            err: <context.deadlineExceededError>{},
        },
        <*fmt.wrapError | 0xc000dd2020>{
            msg: "the scrape url https://192.168.111.26:9637/metrics for pod kube-system/ is accessible without authorization: context deadline exceeded",
            err: <context.deadlineExceededError>{},
        },
    ]
to be empty}

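The fact that the test interprets "context deadline exceeded" as "accessible without auth" may indicate a logic problem in the test: a timeout means the request never completed, so it says nothing about whether the endpoint enforces authorization.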

In this case, the missing pod name after "kube-system/" is also suspicious.

In other runs, the test complains about other pods, for example in https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.20-e2e-vsphere-ovn-serial/1960598585972101120

[sig-instrumentation][Late] Platform Prometheus targets should not be accessible without auth [Serial] [Suite:openshift/conformance/serial] (13m24s)
{  fail [github.com/openshift/origin/test/extended/prometheus/prometheus.go:143]: Expected
    <[]error | len:4, cap:4>: [
        <*fmt.wrapError | 0xc002ce6020>{
            msg: "the scrape url https://10.93.152.111:9001/metrics for pod openshift-machine-config-operator/machine-config-daemon-m4j6c is accessible without authorization: context deadline exceeded",
            err: <context.deadlineExceededError>{},
        },
        <*fmt.wrapError | 0xc002ce6040>{
            msg: "the scrape url https://10.93.152.111:9100/metrics for pod openshift-monitoring/node-exporter-ts9kq is accessible without authorization: context deadline exceeded",
            err: <context.deadlineExceededError>{},
        },
        <*fmt.wrapError | 0xc002ce6080>{
            msg: "the scrape url https://10.93.152.111:9103/metrics for pod openshift-ovn-kubernetes/ovnkube-node-7j7z4 is accessible without authorization: context deadline exceeded",
            err: <context.deadlineExceededError>{},
        },
        <*fmt.wrapError | 0xc00119e040>{
            msg: "the scrape url https://10.93.152.111:9105/metrics for pod openshift-ovn-kubernetes/ovnkube-node-7j7z4 is accessible without authorization: context deadline exceeded",
            err: <context.deadlineExceededError>{},
        },
    ]
to be empty}

The pods referenced here do not appear in the final artifacts collected from the cluster, which makes me wonder whether this test is being influenced by other tests that temporarily add nodes.

Some jobs show a mixture of these missing-pod errors and the kube-system errors.

Filed by: dgoodwin@redhat.com

blocks

OCPBUGS-61540 [Monitoring] New prometheus targets auth test failing too often

Verified

is cloned by

OCPBUGS-61540 [Monitoring] New prometheus targets auth test failing too often

Verified

links to

openshift/origin#30219: OCPBUGS-61193: chore(extended/prometheus): make 'targets auth' test more lenient and more resilient.

openshift/origin#30256: OCPBUGS-61193: chore(extended/prometheus): 2/2: make 'targets auth' test more lenient and more resilient

Assignee: Ayoub Mrini

Reporter: OpenShift Technical Release Team

Need Info From: None

Contributors: None

QA Contact: Tai Gao

Doc Contact: None

Votes: 0

Watchers: 11

Created: 2025/09/03 12:04 PM

Updated: 2025/09/19 6:08 PM
