OpenShift Logging · LOG-1972

Getting message, "Prometheus could not scrape fluentd for more than 10m."


    • Before this change, the cluster-logging-operator used cluster-scoped roles and bindings to grant the prometheus service account permission to scrape metrics. These permissions were only created when the Operator was deployed through the console interface and were missing when it was deployed from the command line. This fix makes the role and binding namespace-scoped.
    • Logging (Core) - Sprint 211, Logging (Core) - Sprint 216, Logging (Core) - Sprint 217

      This was originally opened as a bug against Monitoring: https://bugzilla.redhat.com/show_bug.cgi?id=2021342

      The Monitoring team moved it to the Logging component, but as the issue is on Logging 5.2 I am moving it to JIRA. The initial problem description is reported below, followed by copies of the comments from the Monitoring team.

      ----------

      OpenShift 4.7.34

      OpenShift Logging: cluster-logging.5.2.2-21
       
      Description of problem:
      Getting message, "Prometheus could not scrape fluentd for more than 10m."

      How reproducible:
      Unconfirmed

      Additional info:
      The customer has the label openshift.io/cluster-monitoring: "true" set, but the error is still not clearing.
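      For reference, that label is normally applied to the namespace and can be verified with commands along these lines (illustrative only; the customer reports the label is already in place):

      oc label namespace openshift-logging openshift.io/cluster-monitoring="true" --overwrite
      oc get namespace openshift-logging --show-labels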

      The prometheus pods are noting this error on repeat:

      2021-10-31T03:05:06.385693354Z level=error ts=2021-10-31T03:05:06.385Z caller=klog.go:96 component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:428: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" in the namespace \"openshift-logging\""
      2021-10-31T03:05:08.607296440Z level=error ts=2021-10-31T03:05:08.607Z caller=klog.go:96 component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:427: Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"services\" in API group \"\" in the namespace \"openshift-logging\""
      2021-10-31T03:05:31.197590776Z level=error ts=2021-10-31T03:05:31.197Z caller=klog.go:96 component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:426: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"endpoints\" in API group \"\" in the namespace \"openshift-logging\""
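      The same entries can be pulled from the Prometheus pods directly, for example (the pod and container names below are the usual openshift-monitoring ones and may differ):

      oc logs -n openshift-monitoring prometheus-k8s-0 -c prometheus | grep 'is forbidden'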

      We found a similar bug from an older version:
      https://bugzilla.redhat.com/show_bug.cgi?id=1774907
      Using diagnostic steps from that bug:

      1. token=`oc -n openshift-monitoring sa get-token prometheus-k8s`
      2. oc auth can-i list pods -n openshift-logging --token $token
      3. oc auth can-i list services -n openshift-logging --token $token
      4. oc auth can-i list endpoints -n openshift-logging --token $token

      These all return "no". I suspect something has failed to set up the proper rolebindings for prometheus-k8s. Are there roles that should be added? Can they be added manually?
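      As a stopgap, an equivalent namespace-scoped role and binding could in principle be created by hand. The following is only an illustrative sketch; the object names and verbs are assumptions based on the errors above, not taken from the operator's manifests:

      # Allow reading the scrape-target resources in openshift-logging
      oc create role prometheus-k8s-scrape -n openshift-logging \
        --verb=get,list,watch --resource=pods,services,endpoints

      # Bind that role to the prometheus-k8s service account from openshift-monitoring
      oc create rolebinding prometheus-k8s-scrape -n openshift-logging \
        --role=prometheus-k8s-scrape \
        --serviceaccount=openshift-monitoring:prometheus-k8s

      With those in place, the `oc auth can-i` checks above should start returning "yes", though the proper fix is for the operator to create these objects itself.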

      ----------

      Arunprasad Rajkumar 2021-11-09 06:35:35 UTC

      Other cluster operators (e.g. cluster-etcd-operator) define explicit role[1] bindings[2] for the `prometheus-k8s` service account. You may need to follow the same approach.

      But I'm wondering why this was not done by the cluster-logging operator!

      [1] https://github.com/openshift/cluster-etcd-operator/blob/master/manifests/0000_90_etcd-operator_01_prometheusrole.yaml
      [2] https://github.com/openshift/cluster-etcd-operator/blob/master/manifests/0000_90_etcd-operator_02_prometheusrolebinding.yaml
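      For comparison, the objects those manifests create can be inspected on a running cluster along these lines (the namespace and object names are assumed from the linked files):

      # List the prometheus-related RBAC objects shipped by the etcd operator
      oc get role,rolebinding -n openshift-etcd-operator -o wide | grep -i prometheus

      # Inspect a specific binding to see which subjects it grants (substitute the name from the listing)
      oc describe rolebinding <binding-name> -n openshift-etcd-operator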

      ----------

      Arunprasad Rajkumar 2021-11-09 07:52:24 UTC

      It seems the cluster-logging-operator has the necessary role[1] and binding[2] for the `prometheus-k8s` service account.

      [1] https://github.com/openshift/cluster-logging-operator/blob/release-4.7/manifests/4.7/0100_clusterroles.yaml
      [2] https://github.com/openshift/cluster-logging-operator/blob/release-4.7/manifests/4.7/0110_clusterrolebindings.yaml
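      Whether those cluster-scoped objects actually exist on the affected cluster can be checked roughly as follows (illustrative; the grep pattern assumes the wide output lists service accounts as namespace/name, and impersonation requires sufficient privileges):

      # Look for any binding that grants the prometheus-k8s service account access
      oc get clusterrolebinding -o wide | grep 'openshift-monitoring/prometheus-k8s'
      oc get rolebinding -n openshift-logging -o wide | grep 'openshift-monitoring/prometheus-k8s'

      # Re-run the permission check by impersonating the service account
      oc auth can-i list endpoints -n openshift-logging \
        --as=system:serviceaccount:openshift-monitoring:prometheus-k8s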

        Attachments:
        1. cr (4 kB)
        2. image-2022-03-10-14-15-53-935.png (16 kB)
        3. screenshot-1.png (143 kB)

              jcantril@redhat.com Jeffrey Cantrill
              rhn-support-stwalter Steven Walter
              Giriyamma Karagere Ramaswamy Giriyamma Karagere Ramaswamy (Inactive)
              Votes: 0
              Watchers: 6
