OpenShift Logging / LOG-2286

Prometheus can't watch pods/endpoints/services in openshift-logging namespace when only the CLO is deployed.


    • Release note: Before this change, the cluster-logging-operator used cluster-scoped roles and bindings to establish permissions for the Prometheus service account to scrape metrics. These permissions were only created when deploying the Operator through the console interface, but were missing when deploying from the command line. This fix makes the role and binding namespace-scoped.
    • Logging (Core) - Sprint 216

      Description of problem:

      When only the CLO is deployed, Prometheus keeps reporting the errors below:

      ts=2022-03-01T05:30:18.294Z caller=log.go:168 level=error component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:449: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" in the namespace \"openshift-logging\""
      ts=2022-03-01T05:30:18.296Z caller=log.go:168 level=error component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:447: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"endpoints\" in API group \"\" in the namespace \"openshift-logging\""
      ts=2022-03-01T05:30:18.301Z caller=log.go:168 level=error component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:448: Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"services\" in API group \"\" in the namespace \"openshift-logging\""

      I checked the roles and clusterroles created when deploying the CLO; none of them grants these permissions to system:serviceaccount:openshift-monitoring:prometheus-k8s.
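      The missing permissions can be confirmed from the CLI by impersonating the prometheus-k8s service account (a quick check, assuming cluster-admin access; while the bug is present each command should answer "no"):

      $ oc auth can-i list pods --as=system:serviceaccount:openshift-monitoring:prometheus-k8s -n openshift-logging
      $ oc auth can-i list endpoints --as=system:serviceaccount:openshift-monitoring:prometheus-k8s -n openshift-logging
      $ oc auth can-i list services --as=system:serviceaccount:openshift-monitoring:prometheus-k8s -n openshift-logging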

       

      Version-Release number of selected component (if applicable):

      cluster-logging.5.4.0-75 

      How reproducible:

      Always

      Steps to Reproduce:
      1. Deploy the CLO.
      2. Check the pod logs in openshift-monitoring/prometheus-k8s-0.

      Actual results:

      The prometheus-k8s pod keeps logging the permission errors shown above.

      Expected results:

      The errors above should not appear when the CLO is deployed.

      Additional info:

      I checked the resources created when deploying the EO: a role/prometheus and a rolebinding/prometheus are created in the openshift-operators-redhat project, granting the following permissions to system:serviceaccount:openshift-monitoring:prometheus-k8s:

      $ oc get role -n openshift-operators-redhat prometheus -oyaml
      apiVersion: rbac.authorization.k8s.io/v1
      kind: Role
      metadata:
        annotations:
          include.release.openshift.io/self-managed-high-availability: "true"
          include.release.openshift.io/single-node-developer: "true"
        creationTimestamp: "2022-03-01T05:29:57Z"
        labels:
          name: elasticsearch-operator
        name: prometheus
        namespace: openshift-operators-redhat
        resourceVersion: "89682"
        uid: 5e4521ac-6060-4d58-be7e-74e327f53e07
      rules:
      - apiGroups:
        - ""
        resources:
        - services
        - endpoints
        - pods
        verbs:
        - get
        - list
        - watch 
      
      $ oc get rolebinding -n openshift-operators-redhat prometheus -oyaml
      apiVersion: rbac.authorization.k8s.io/v1
      kind: RoleBinding
      metadata:
        annotations:
          include.release.openshift.io/self-managed-high-availability: "true"
          include.release.openshift.io/single-node-developer: "true"
        creationTimestamp: "2022-03-01T05:29:57Z"
        labels:
          name: elasticsearch-operator
        name: prometheus
        namespace: openshift-operators-redhat
        resourceVersion: "89665"
        uid: 63ea32c5-9bad-40b9-af08-e4acb15f4df9
      roleRef:
        apiGroup: rbac.authorization.k8s.io
        kind: Role
        name: prometheus
      subjects:
      - kind: ServiceAccount
        name: prometheus-k8s
        namespace: openshift-monitoring

       

      Besides, I found that clusterrole/elasticsearch-metrics and clusterrolebinding/elasticsearch-metrics were created after the Elasticsearch pods were deployed; after that, the prometheus-k8s pod stopped reporting the errors above.

      I also tried deploying only the collector pods, but the clusterrole/elasticsearch-metrics wasn't created.
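      Per the release note, the fix makes the role and binding namespace-scoped. A sketch of what the analogous resources in openshift-logging could look like, mirroring the EO manifests above (the resource names here are assumptions; the names the CLO actually creates may differ):

      apiVersion: rbac.authorization.k8s.io/v1
      kind: Role
      metadata:
        name: prometheus
        namespace: openshift-logging
      rules:
      - apiGroups:
        - ""
        resources:
        - services
        - endpoints
        - pods
        verbs:
        - get
        - list
        - watch
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: RoleBinding
      metadata:
        name: prometheus
        namespace: openshift-logging
      roleRef:
        apiGroup: rbac.authorization.k8s.io
        kind: Role
        name: prometheus
      subjects:
      - kind: ServiceAccount
        name: prometheus-k8s
        namespace: openshift-monitoring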

              jcantril@redhat.com Jeffrey Cantrill
              qitang@redhat.com Qiaoling Tang