  1. OpenShift Logging
  2. LOG-2286

Prometheus can't watch pods/endpoints/services in openshift-logging namespace when only the CLO is deployed.


    • NEW
    • VERIFIED
      Before this change, the cluster-logging-operator used cluster-scoped roles and bindings to grant the prometheus service account permission to scrape metrics. These permissions were created only when the Operator was deployed through the console interface and were missing when it was deployed from the command line. This change fixes the issue by making the role and binding namespace-scoped.
    • Logging (Core) - Sprint 216

      Description of problem:

      When only the CLO is deployed, Prometheus keeps reporting the following errors:

      ts=2022-03-01T05:30:18.294Z caller=log.go:168 level=error component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:449: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" in the namespace \"openshift-logging\""
      ts=2022-03-01T05:30:18.296Z caller=log.go:168 level=error component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:447: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"endpoints\" in API group \"\" in the namespace \"openshift-logging\""
      ts=2022-03-01T05:30:18.301Z caller=log.go:168 level=error component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:448: Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"services\" in API group \"\" in the namespace \"openshift-logging\""

      I checked the roles and clusterroles created when deploying the CLO; none of them grants these permissions to system:serviceaccount:openshift-monitoring:prometheus-k8s.

       

      Version-Release number of selected component (if applicable):

      cluster-logging.5.4.0-75 

      How reproducible:

      Always

      Steps to Reproduce:
      1. Deploy the CLO only.
      2. Check the pod logs of openshift-monitoring/prometheus-k8s-0.

      Actual results:

      The prometheus-k8s pod logs the "forbidden" errors quoted in the description.

      Expected results:

      The errors quoted in the description should not appear when only the CLO is deployed.

      Additional info:

      I checked the resources created when deploying the EO: role/prometheus and rolebinding/prometheus are created in the openshift-operators-redhat project, and together they grant the following permissions to system:serviceaccount:openshift-monitoring:prometheus-k8s:

      $ oc get role -n openshift-operators-redhat prometheus -oyaml
      apiVersion: rbac.authorization.k8s.io/v1
      kind: Role
      metadata:
        annotations:
          include.release.openshift.io/self-managed-high-availability: "true"
          include.release.openshift.io/single-node-developer: "true"
        creationTimestamp: "2022-03-01T05:29:57Z"
        labels:
          name: elasticsearch-operator
        name: prometheus
        namespace: openshift-operators-redhat
        resourceVersion: "89682"
        uid: 5e4521ac-6060-4d58-be7e-74e327f53e07
      rules:
      - apiGroups:
        - ""
        resources:
        - services
        - endpoints
        - pods
        verbs:
        - get
        - list
        - watch 
      
      $ oc get rolebinding -n openshift-operators-redhat prometheus -oyaml
      apiVersion: rbac.authorization.k8s.io/v1
      kind: RoleBinding
      metadata:
        annotations:
          include.release.openshift.io/self-managed-high-availability: "true"
          include.release.openshift.io/single-node-developer: "true"
        creationTimestamp: "2022-03-01T05:29:57Z"
        labels:
          name: elasticsearch-operator
        name: prometheus
        namespace: openshift-operators-redhat
        resourceVersion: "89665"
        uid: 63ea32c5-9bad-40b9-af08-e4acb15f4df9
      roleRef:
        apiGroup: rbac.authorization.k8s.io
        kind: Role
        name: prometheus
      subjects:
      - kind: ServiceAccount
        name: prometheus-k8s
        namespace: openshift-monitoring
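
      For comparison, a namespace-scoped grant of the same permissions in openshift-logging could look roughly like the sketch below. It only mirrors the EO role and rolebinding shown here, retargeted at the openshift-logging namespace; the object name prometheus-k8s-sketch is hypothetical and is not necessarily what the CLO creates.

      ```yaml
      # Sketch only: mirrors the EO role/prometheus rules, scoped to openshift-logging.
      # The name "prometheus-k8s-sketch" is an assumption for illustration.
      apiVersion: rbac.authorization.k8s.io/v1
      kind: Role
      metadata:
        name: prometheus-k8s-sketch
        namespace: openshift-logging
      rules:
      - apiGroups:
        - ""
        resources:
        - services
        - endpoints
        - pods
        verbs:
        - get
        - list
        - watch
      ---
      # Binds the role above to the service account named in the error messages.
      apiVersion: rbac.authorization.k8s.io/v1
      kind: RoleBinding
      metadata:
        name: prometheus-k8s-sketch
        namespace: openshift-logging
      roleRef:
        apiGroup: rbac.authorization.k8s.io
        kind: Role
        name: prometheus-k8s-sketch
      subjects:
      - kind: ServiceAccount
        name: prometheus-k8s
        namespace: openshift-monitoring
      ```

      Because both objects live in openshift-logging, a grant of this shape does not depend on any cluster-scoped role or binding being present.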

       

      In addition, I found that clusterrole/elasticsearch-metrics and clusterrolebinding/elasticsearch-metrics were created after the elasticsearch pods were deployed, and the prometheus-k8s pod then stopped reporting the above errors.

      I also tried deploying only the collector pods, but in that case clusterrole/elasticsearch-metrics wasn't created.

            jcantril@redhat.com Jeffrey Cantrill
            qitang@redhat.com Qiaoling Tang