  1. Cluster Observability Operator
  2. COO-1511

Race condition between creation of the Prometheus pod and the corresponding RoleBinding after applying a new MonitoringStack


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • 1.4.0 RC
    • 1.3.1
    • monitoring-stack
    • False
    • Important

      After a new MonitoringStack is created, a race condition can occur in which the ClusterRoleBinding/RoleBinding is not applied quickly enough, and the Prometheus pod fails to start:

      create Pod prometheus-devops-monitoring-0 in StatefulSet prometheus-devops-monitoring failed error: pods "prometheus-devops-monitoring-0" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider "pipelines-scc": Forbidden: not usable by user or serviceaccount, provider "splunk-otel-collector": Forbidden: not usable by user or serviceaccount, provider restricted-v2: .spec.securityContext.fsGroup: Invalid value: []int64{65534}: 65534 is not an allowed group, provider restricted-v2: .initContainers[0].runAsUser: Invalid value: 65534: must be in the ranges: [1004790000, 1004799999], provider restricted-v2: .containers[0].runAsUser: Invalid value: 65534: must be in the ranges: [1004790000, 1004799999], provider restricted-v2: .containers[1].runAsUser: Invalid value: 65534: must be in the ranges: [1004790000, 1004799999], provider restricted-v2: .containers[2].runAsUser: Invalid value: 65534: must be in the ranges: [1004790000, 1004799999], provider restricted-v2: .containers[3].runAsUser: Invalid value: 65534: must be in the ranges: [1004790000, 1004799999], provider "restricted": Forbidden: not usable by user or serviceaccount, provider "nonroot-v2": Forbidden: not usable by user or serviceaccount, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid-extimg-importer": Forbidden: not usable by user or serviceaccount, provider "elasticsearch-scc": Forbidden: not usable by user or serviceaccount, provider "logging-scc": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork-v2": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "node-exporter": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount]
      

      In the source code, the Prometheus CR is created at line 63, but the ClusterRoleBinding/RoleBinding that Prometheus requires is only created afterwards, at lines 74/75, together with the Alertmanager deployment.
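      One possible way to avoid the ordering problem described above is to apply all RBAC objects before any workload that depends on them. The following is a minimal, language-neutral sketch in Python, not the operator's actual code or its fix. The `Resource` type, the `order_for_apply` helper, and the resource names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass

# Kinds that grant permissions. In this sketch they must be applied
# before any workload that relies on them.
RBAC_KINDS = {
    "ServiceAccount",
    "Role",
    "ClusterRole",
    "RoleBinding",
    "ClusterRoleBinding",
}


@dataclass(frozen=True)
class Resource:
    """Hypothetical stand-in for an object the operator reconciles."""
    kind: str
    name: str


def order_for_apply(resources):
    """Return the resources with RBAC objects first.

    sorted() is stable, so the relative order inside each group
    (RBAC, then everything else) is preserved.
    """
    return sorted(resources, key=lambda r: r.kind not in RBAC_KINDS)


# Hypothetical reconcile list in the order the report describes:
# the Prometheus CR first, the bindings only after Alertmanager.
desired = [
    Resource("Prometheus", "devops-monitoring"),
    Resource("Alertmanager", "devops-monitoring"),
    Resource("ClusterRoleBinding", "devops-monitoring-prometheus"),
    Resource("RoleBinding", "devops-monitoring-prometheus"),
]

ordered = order_for_apply(desired)
for res in ordered:
    print(res.kind, res.name)
```

      Reordering alone may not close the window completely, since the bindings still have to take effect in the API server before the StatefulSet controller attempts to create the pod, but it removes the case in which the Prometheus CR is guaranteed to be created first.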

      A similar issue was reported in COO-1266. The condition resolves itself within a few seconds, but the resulting events trigger alerts for customers.

              jfajersk@redhat.com Jan Fajerski
              rhn-support-anowak Andreas Nowak