OpenShift Logging / LOG-2287

Prometheus can't watch pods/endpoints/services in the openshift-logging namespace when only the CLO is deployed.

    Sprint: Logging (Core) - Sprint 216, Logging (Core) - Sprint 217

      Description of problem:

      When only the CLO is deployed, Prometheus keeps reporting the errors below:

      ts=2022-03-01T05:30:18.294Z caller=log.go:168 level=error component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:449: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" in the namespace \"openshift-logging\""
      ts=2022-03-01T05:30:18.296Z caller=log.go:168 level=error component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:447: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"endpoints\" in API group \"\" in the namespace \"openshift-logging\""
      ts=2022-03-01T05:30:18.301Z caller=log.go:168 level=error component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:448: Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"services\" in API group \"\" in the namespace \"openshift-logging\""

      I checked the roles and clusterroles created when deploying the CLO; none of them grants these permissions to system:serviceaccount:openshift-monitoring:prometheus-k8s.
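
      For comparison, here is a minimal sketch of the Role and RoleBinding that would grant these permissions in openshift-logging, modeled on the objects the EO creates in openshift-operators-redhat (shown under Additional info below); the object name "prometheus" is illustrative:

      apiVersion: rbac.authorization.k8s.io/v1
      kind: Role
      metadata:
        name: prometheus # illustrative name, mirroring the EO's role
        namespace: openshift-logging
      rules:
      - apiGroups:
        - ""
        resources:
        - services
        - endpoints
        - pods
        verbs:
        - get
        - list
        - watch
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: RoleBinding
      metadata:
        name: prometheus # illustrative name
        namespace: openshift-logging
      roleRef:
        apiGroup: rbac.authorization.k8s.io
        kind: Role
        name: prometheus
      subjects:
      - kind: ServiceAccount
        name: prometheus-k8s
        namespace: openshift-monitoring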

      After the EFK pods were deployed, the prometheus-k8s pod stopped reporting these errors, and the targets serviceMonitor/openshift-logging/collector and serviceMonitor/openshift-logging/monitor-elasticsearch-cluster appeared in the prometheus-k8s console; however, the target serviceMonitor/openshift-logging/cluster-logging-operator-metrics-monitor still did not appear.

      Version-Release number of selected component (if applicable):

      cluster-logging.5.3.5-21

      How reproducible:

      Always

      Steps to Reproduce:
      1. Deploy the CLO.
      2. Check the pod logs of openshift-monitoring/prometheus-k8s-0 (see the check below).
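
      The missing permission can also be confirmed by querying RBAC as the Prometheus service account (a quick check, assuming cluster-admin access; while the bug is present this prints "no"):

      $ oc auth can-i list pods --as=system:serviceaccount:openshift-monitoring:prometheus-k8s -n openshift-logging
      no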

      Actual results:

      Prometheus keeps reporting the errors above, and the target serviceMonitor/openshift-logging/cluster-logging-operator-metrics-monitor does not appear in the prometheus-k8s console.

      Expected results:

      The errors above should not be seen when only the CLO is deployed, and the target serviceMonitor/openshift-logging/cluster-logging-operator-metrics-monitor should appear in the prometheus-k8s console.
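
      To see which monitoring targets are registered in the namespace (assuming the monitoring.coreos.com CRDs from the cluster monitoring stack are installed), the ServiceMonitor objects can be listed directly:

      $ oc get servicemonitor -n openshift-logging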

      Additional info:

      I checked the resources created when deploying the EO: role/prometheus and rolebinding/prometheus are created in the openshift-operators-redhat project, and they grant the following permissions to system:serviceaccount:openshift-monitoring:prometheus-k8s:

      apiVersion: rbac.authorization.k8s.io/v1
      kind: Role
      metadata:
        annotations:
          include.release.openshift.io/self-managed-high-availability: "true"
          include.release.openshift.io/single-node-developer: "true"
        creationTimestamp: "2022-03-01T05:29:57Z"
        labels:
          name: elasticsearch-operator
        name: prometheus
        namespace: openshift-operators-redhat
        resourceVersion: "89682"
        uid: 5e4521ac-6060-4d58-be7e-74e327f53e07
      rules:
      - apiGroups:
        - ""
        resources:
        - services
        - endpoints
        - pods
        verbs:
        - get
        - list
        - watch 
      
      $ oc get rolebinding -n openshift-operators-redhat prometheus -oyaml
      apiVersion: rbac.authorization.k8s.io/v1
      kind: RoleBinding
      metadata:
        annotations:
          include.release.openshift.io/self-managed-high-availability: "true"
          include.release.openshift.io/single-node-developer: "true"
        creationTimestamp: "2022-03-01T05:29:57Z"
        labels:
          name: elasticsearch-operator
        name: prometheus
        namespace: openshift-operators-redhat
        resourceVersion: "89665"
        uid: 63ea32c5-9bad-40b9-af08-e4acb15f4df9
      roleRef:
        apiGroup: rbac.authorization.k8s.io
        kind: Role
        name: prometheus
      subjects:
      - kind: ServiceAccount
        name: prometheus-k8s
        namespace: openshift-monitoring
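
      By contrast, no equivalent objects exist in openshift-logging when only the CLO is deployed, which can be verified with:

      $ oc get role,rolebinding -n openshift-logging | grep prometheus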

       

      In addition, I found that the clusterrole/elasticsearch-metrics and clusterrolebinding/elasticsearch-metrics were created after the Elasticsearch pods were deployed; after that, the prometheus-k8s pod stopped reporting the errors above.

      I also tried deploying only the collector pods, but the clusterrole/elasticsearch-metrics was not created.
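
      For reference, the objects that do unblock scraping once Elasticsearch is deployed can be inspected with:

      $ oc get clusterrole/elasticsearch-metrics clusterrolebinding/elasticsearch-metrics -oyaml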

              Assignee: Jeffrey Cantrill (jcantril@redhat.com)
              Reporter: Qiaoling Tang (qitang@redhat.com)