Type: Bug
Resolution: Done-Errata
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.17
Component/s: OLM
Labels:
- triaged

Regression:
No
Release Blocker:
Rejected
Blocked:
False
Blocked Reason:
None
Release Note Type:
Release Note Not Required
Release Note Status:
Done
Target Version:

4.17.0

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

This was discovered by a new alert, which was then reverted in https://issues.redhat.com/browse/OCPBUGS-36299 because the underlying issue makes Hypershift Conformance fail.

Platform Prometheus is asked to scrape targets in the namespace "openshift-operator-lifecycle-manager", but it isn't given the RBAC permissions needed to do so.

The alert revealed an RBAC issue on platform Prometheus, as shown in its logs: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-hypershift-release-4.17-periodics-e2e-aws-ovn-conformance/1806305841511403520/artifacts/e2e-aws-ovn-conformance/dump/artifacts/hostedcluster-8a4fd7515fb581e231c4/namespaces/openshift-monitoring/pods/prometheus-k8s-0/prometheus/prometheus/logs/current.log

2024-06-27T14:59:38.968032082Z ts=2024-06-27T14:59:38.967Z caller=klog.go:108 level=warn component=k8s_client_runtime func=Warningf msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:554: failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"endpoints\" in API group \"\" in the namespace \"openshift-operator-lifecycle-manager\""
2024-06-27T14:59:38.968032082Z ts=2024-06-27T14:59:38.968Z caller=klog.go:116 level=error component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:554: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"endpoints\" in API group \"\" in the namespace \"openshift-operator-lifecycle-manager\""
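
When checking many log files for the same problem, the denied user, verb, resource, and namespace can be extracted mechanically. The following is a minimal sketch in Python, assuming the message format seen in the two log lines above; the `Denial` type, the `parse_denial` helper, and the regular expression are illustrative and not part of any existing tool.

```python
import re
from typing import NamedTuple, Optional


class Denial(NamedTuple):
    """One RBAC denial extracted from a Prometheus log line."""
    user: str
    verb: str
    resource: str
    namespace: str


# Matches the Kubernetes "forbidden" message. Quotes may be preceded by a
# backslash, as they are inside the Prometheus msg="..." field.
_FORBIDDEN = re.compile(
    r'User \\?"(?P<user>[^"\\]+)\\?" cannot (?P<verb>\w+) resource '
    r'\\?"(?P<resource>[^"\\]+)\\?" in API group \\?"[^"\\]*\\?" '
    r'in the namespace \\?"(?P<namespace>[^"\\]+)\\?"'
)


def parse_denial(line: str) -> Optional[Denial]:
    """Return the denial described by a log line, or None if there is none."""
    m = _FORBIDDEN.search(line)
    if m is None:
        return None
    return Denial(m["user"], m["verb"], m["resource"], m["namespace"])
```

Applied to either log line above, this would report the `prometheus-k8s` service account being denied `list` on `endpoints` in `openshift-operator-lifecycle-manager`.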

Before adding this alert, such issues went unnoticed.

https://docs.google.com/document/d/1rCKAYTrYMESjJDyJ0KvNap05NNukmNXVVi6MShISnOw/edit#heading=h.13rhihr867kk explains what needs to be done so that Prometheus can discover the targets (see the paragraph starting with "Also, in order for Prometheus to be able to discover...").
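
The referenced document is not quoted here, so the following is only a sketch of what such RBAC commonly looks like. It assumes the usual convention of a namespaced Role granting get/list/watch on services, endpoints, and pods, bound to the `prometheus-k8s` service account named in the log above. The Role and RoleBinding name and the `prometheus_rbac` helper are hypothetical.

```python
import json


def prometheus_rbac(namespace: str) -> list:
    """Build a Role and RoleBinding letting platform Prometheus discover targets.

    Assumption: the object name "prometheus-k8s" and the rule set below follow
    a common convention; they are not taken from the linked document.
    """
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": "prometheus-k8s", "namespace": namespace},
        "rules": [
            {
                "apiGroups": [""],  # core API group, as in the log message
                "resources": ["services", "endpoints", "pods"],
                "verbs": ["get", "list", "watch"],
            }
        ],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": "prometheus-k8s", "namespace": namespace},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": "prometheus-k8s",
        },
        "subjects": [
            {
                # Service account taken from the "forbidden" log message.
                "kind": "ServiceAccount",
                "name": "prometheus-k8s",
                "namespace": "openshift-monitoring",
            }
        ],
    }
    return [role, binding]


if __name__ == "__main__":
    manifest = {
        "apiVersion": "v1",
        "kind": "List",
        "items": prometheus_rbac("openshift-operator-lifecycle-manager"),
    }
    print(json.dumps(manifest, indent=2))
```

The printed JSON is meant for inspection or experimentation on a test cluster; a real fix would ship the equivalent objects in the component's own manifests.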

Because no test was failing before, the metrics from "openshift-operator-lifecycle-manager" may not be needed. In that case, we should stop asking Prometheus to discover targets there by deleting the ServiceMonitor/PodMonitor.

Expected results:

Prometheus shouldn't be asked to discover targets without providing it with the appropriate RBAC.
We'd like to restore the new PrometheusKubernetesListWatchFailures alert: https://github.com/openshift/cluster-monitoring-operator/pull/2403

is triggered by

OCPBUGS-36299 Hypershift Conformance Failing due to PrometheusKubernetesListWatchFailures

Closed

is triggering

OCPBUGS-36689 OLM resources partially deployed

Closed

links to

openshift/operator-framework-olm#831: OCPBUGS-36500: remove cvo hypershift profile annotation from psm-operator manifests

RHEA-2024:3718 OpenShift Container Platform 4.17.z bug fix update

Assignee: Ankita Thomas

Reporter: Ayoub Mrini

QA Contact: Jian Zhang

Doc Contact: Alex Dellapenta (Inactive)

Votes: 0

Watchers: 6

Created: 2024/07/03 2:07 PM

Updated: 2024/10/01 5:40 PM

Resolved: 2024/10/01 5:40 PM
