OpenShift Bugs / OCPBUGS-61193

[Monitoring] New prometheus targets auth test failing too often


    • Quality / Stability / Reliability
    • False
    • None
    • Moderate
    • None
    • Approved
    • MON Sprint 276, MON Sprint 277
    • 2
    • In Progress
    • Release Note Not Required
    • None
    • None
    • None
    • None
    • None

      (Feel free to update this bug's summary to be more specific.)
      Component Readiness has found a potential regression in the following test:

      [sig-instrumentation][Late] Platform Prometheus targets should not be accessible without auth [Serial] [Suite:openshift/conformance/serial]

      Test has a 94.50% pass rate, but 95.00% is required.

      Sample (being evaluated) Release: 4.20
      Start Time: 2025-08-27T00:00:00Z
      End Time: 2025-09-03T08:00:00Z
      Success Rate: 94.50%
      Successes: 103
      Failures: 6
      Flakes: 0
      Base (historical) Release: 4.19
      Start Time: 2025-05-18T00:00:00Z
      End Time: 2025-06-17T23:59:59Z
      Success Rate: 0.00%
      Successes: 0
      Failures: 0
      Flakes: 0
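As a quick check of the sample numbers above (flakes are zero, so they do not affect the ratio), the pass rate and the number of successes that the same 109 runs would have needed to reach the 95% threshold work out as follows. The base release reports no runs at all, which is consistent with this being a new test, so the failing check here is the absolute pass-rate requirement rather than a comparison against 4.19.

```latex
\begin{aligned}
\text{pass rate} &= \frac{103}{103 + 6} = \frac{103}{109} \approx 0.9450 \\
0.95 \times 109 &= 103.55 \;\Rightarrow\; \text{at least } 104 \text{ successes needed}, \qquad \frac{104}{109} \approx 0.9541
\end{aligned}
```

In other words, with the same number of runs, converting one of the six failures into a success would have cleared the threshold.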

      View the test details report for additional context.

      The report linked above is for metal, but this test is also below the required 95% pass rate on vSphere.

      The failure always looks similar to:
      https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.20-e2e-metal-ipi-ovn-serial-virtualmedia-2of2/1962616253063368704

      [sig-instrumentation][Late] Platform Prometheus targets should not be accessible without auth [Serial] [Suite:openshift/conformance/serial] (13m46s)
      {  fail [github.com/openshift/origin/test/extended/prometheus/prometheus.go:143]: Expected
          <[]error | len:4, cap:4>: [
              <*fmt.wrapError | 0xc000ef8080>{
                  msg: "the scrape url https://192.168.111.26:10250/metrics for pod kube-system/ is accessible without authorization: context deadline exceeded",
                  err: <context.deadlineExceededError>{},
              },
              <*fmt.wrapError | 0xc000ef8060>{
                  msg: "the scrape url https://192.168.111.26:10250/metrics/cadvisor for pod kube-system/ is accessible without authorization: context deadline exceeded",
                  err: <context.deadlineExceededError>{},
              },
              <*fmt.wrapError | 0xc000ce8020>{
                  msg: "the scrape url https://192.168.111.26:10250/metrics/probes for pod kube-system/ is accessible without authorization: context deadline exceeded",
                  err: <context.deadlineExceededError>{},
              },
              <*fmt.wrapError | 0xc000dd2020>{
                  msg: "the scrape url https://192.168.111.26:9637/metrics for pod kube-system/ is accessible without authorization: context deadline exceeded",
                  err: <context.deadlineExceededError>{},
              },
          ]
      to be empty}
      

      The fact that the test interprets "context deadline exceeded" as "accessible without authorization" may indicate a logic problem in the test.

      In this case, the missing pod name after "kube-system/" is also suspicious.

      In other runs, the test complains about other pods instead, for example in https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.20-e2e-vsphere-ovn-serial/1960598585972101120

      [sig-instrumentation][Late] Platform Prometheus targets should not be accessible without auth [Serial] [Suite:openshift/conformance/serial] (13m24s)
      {  fail [github.com/openshift/origin/test/extended/prometheus/prometheus.go:143]: Expected
          <[]error | len:4, cap:4>: [
              <*fmt.wrapError | 0xc002ce6020>{
                  msg: "the scrape url https://10.93.152.111:9001/metrics for pod openshift-machine-config-operator/machine-config-daemon-m4j6c is accessible without authorization: context deadline exceeded",
                  err: <context.deadlineExceededError>{},
              },
              <*fmt.wrapError | 0xc002ce6040>{
                  msg: "the scrape url https://10.93.152.111:9100/metrics for pod openshift-monitoring/node-exporter-ts9kq is accessible without authorization: context deadline exceeded",
                  err: <context.deadlineExceededError>{},
              },
              <*fmt.wrapError | 0xc002ce6080>{
                  msg: "the scrape url https://10.93.152.111:9103/metrics for pod openshift-ovn-kubernetes/ovnkube-node-7j7z4 is accessible without authorization: context deadline exceeded",
                  err: <context.deadlineExceededError>{},
              },
              <*fmt.wrapError | 0xc00119e040>{
                  msg: "the scrape url https://10.93.152.111:9105/metrics for pod openshift-ovn-kubernetes/ovnkube-node-7j7z4 is accessible without authorization: context deadline exceeded",
                  err: <context.deadlineExceededError>{},
              },
          ]
      to be empty}
      

      The pods referenced here do not seem to exist in the final artifacts collected from the cluster, which makes me wonder whether the test is being influenced by the tests that temporarily add nodes.
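If the temporary-node hypothesis is right, one way to check it during triage would be to re-list pods once the test finishes and set aside failures whose pod has since disappeared. The Go sketch below assumes such a re-list is available as a set of "namespace/name" keys; scrapeFailure and dropVanishedPods are hypothetical names, not part of openshift/origin. Failures with an empty pod name, like the kube-system/ entries, are always kept because there is no pod to look up. The sample entries are copied from the two failure messages above.

```go
package main

import "fmt"

// scrapeFailure is a hypothetical record of one failed target check.
type scrapeFailure struct {
	Namespace string
	Pod       string // empty for targets with no pod name, e.g. the kube-system/ entries
	URL       string
}

// dropVanishedPods (hypothetical) splits failures into those whose pod still
// exists (or that have no pod name at all) and those whose pod is gone.
// existing is assumed to hold "namespace/name" keys from a fresh pod listing.
func dropVanishedPods(failures []scrapeFailure, existing map[string]bool) (kept, dropped []scrapeFailure) {
	for _, f := range failures {
		if f.Pod == "" || existing[f.Namespace+"/"+f.Pod] {
			kept = append(kept, f)
		} else {
			dropped = append(dropped, f)
		}
	}
	return kept, dropped
}

func main() {
	failures := []scrapeFailure{
		{"kube-system", "", "https://192.168.111.26:10250/metrics"},
		{"openshift-monitoring", "node-exporter-ts9kq", "https://10.93.152.111:9100/metrics"},
	}
	// Assume the re-list no longer finds node-exporter-ts9kq.
	existing := map[string]bool{}
	kept, dropped := dropVanishedPods(failures, existing)
	fmt.Println("kept:", len(kept), "dropped:", len(dropped)) // prints "kept: 1 dropped: 1"
}
```

The trade-off is that a genuine finding on a pod deleted before the re-list would also be set aside, so this fits triage better than the test assertion itself.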

      Some jobs get a mixture of these missing-pod errors and the kube-system errors.

      Filed by: dgoodwin@redhat.com

              rh-ee-amrini (Ayoub Mrini)
              openshift-trt (OpenShift Technical Release Team)
              None
              None
              Tai Gao
              None
              Votes: 0
              Watchers: 11
