
Type: Bug
Resolution: Done
Priority: Normal
Fix Version/s: Logging 5.3.6
Affects Version/s: Logging 5.3.5
Component/s: Log Collection
Labels:
- devel_ack+

Blocked:
False
Ready:
False
Docs QE Status:
NEW
QE Status:
VERIFIED
Market:

Sprint:
Logging (Core) - Sprint 216, Logging (Core) - Sprint 217

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

Description of problem:

When only the CLO is deployed, Prometheus keeps reporting the errors below:

ts=2022-03-01T05:30:18.294Z caller=log.go:168 level=error component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:449: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" in the namespace \"openshift-logging\""
ts=2022-03-01T05:30:18.296Z caller=log.go:168 level=error component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:447: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"endpoints\" in API group \"\" in the namespace \"openshift-logging\""
ts=2022-03-01T05:30:18.301Z caller=log.go:168 level=error component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:448: Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"services\" in API group \"\" in the namespace \"openshift-logging\""

I checked the roles and clusterroles created when the CLO is deployed; none of them grants these permissions to system:serviceaccount:openshift-monitoring:prometheus-k8s.

After the EFK pods were deployed, the prometheus-k8s pod stopped reporting the errors, and the targets serviceMonitor/openshift-logging/collector and serviceMonitor/openshift-logging/monitor-elasticsearch-cluster appeared in the prometheus-k8s console. However, the target serviceMonitor/openshift-logging/cluster-logging-operator-metrics-monitor still did not appear.

Version-Release number of selected component (if applicable):

cluster-logging.5.3.5-21

How reproducible:

Always

Steps to Reproduce:
1. Deploy the CLO.
2. Check the pod logs of openshift-monitoring/prometheus-k8s-0.

Actual results:

Expected results:

The errors above should not appear when the CLO is deployed, and the target serviceMonitor/openshift-logging/cluster-logging-operator-metrics-monitor should be listed in the prometheus-k8s console.

Additional info:

I checked the resources created when the EO is deployed: role/prometheus and rolebinding/prometheus are created in the openshift-operators-redhat project, and they grant the permissions below to system:serviceaccount:openshift-monitoring:prometheus-k8s:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  annotations:
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
  creationTimestamp: "2022-03-01T05:29:57Z"
  labels:
    name: elasticsearch-operator
  name: prometheus
  namespace: openshift-operators-redhat
  resourceVersion: "89682"
  uid: 5e4521ac-6060-4d58-be7e-74e327f53e07
rules:
- apiGroups:
  - ""
  resources:
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch 

$ oc get rolebinding -n openshift-operators-redhat prometheus -oyaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  annotations:
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
  creationTimestamp: "2022-03-01T05:29:57Z"
  labels:
    name: elasticsearch-operator
  name: prometheus
  namespace: openshift-operators-redhat
  resourceVersion: "89665"
  uid: 63ea32c5-9bad-40b9-af08-e4acb15f4df9
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: openshift-monitoring

In addition, I found that clusterrole/elasticsearch-metrics and clusterrolebinding/elasticsearch-metrics were created after the elasticsearch pods were deployed, and the prometheus-k8s pod then stopped reporting the errors above.

I also tried deploying only the collector pods, but clusterrole/elasticsearch-metrics was not created in that case.
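
For comparison, the permission that the errors report as missing could be granted by a namespace-scoped Role and RoleBinding in openshift-logging, modeled on the EO resources in openshift-operators-redhat. This is only a minimal sketch: the resource name prometheus-k8s-example is hypothetical, and it is not necessarily what the CLO fix actually creates.

```yaml
# Sketch only: mirrors the rules of role/prometheus in openshift-operators-redhat.
# The name "prometheus-k8s-example" is a hypothetical placeholder.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-k8s-example
  namespace: openshift-logging
rules:
- apiGroups:
  - ""
  resources:
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch
---
# Binds the Role above to the service account named in the error messages.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prometheus-k8s-example
  namespace: openshift-logging
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prometheus-k8s-example
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: openshift-monitoring
```

Because both objects are namespaced, the grant is limited to openshift-logging, unlike the cluster-wide clusterrole/elasticsearch-metrics.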

clones

LOG-2286 Prometheus can't watch pods/endpoints/services in openshift-logging namespace when only the CLO is deployed.

Closed

relates to

LOG-1972 Getting message, "Prometheus could not scrape fluentd for more than 10m."

Closed

links to

openshift/cluster-logging-operator#1412: [release-5.3] LOG-2287: fix prometheus roles to be namespaced scoped
