OpenShift Bugs / OCPBUGS-23252

UWM prometheus pods not starting due to variable not being defined.

    • Severity: Moderate
    • Sprint: MON Sprint 245
    • Release Note Text:
      * Previously, the `config-reloader` for Prometheus for user-defined projects would fail if unset environment variables were used in the `ServiceMonitor` configuration, which resulted in Prometheus pods failing. With this release, the reloader no longer fails when an unset environment variable is encountered. Instead, unset environment variables are left as they are, while set environment variables are expanded as usual. Any expansion errors, suppressed or otherwise, can be tracked through the `reloader_config_environment_variable_expansion_errors` metric. (link:https://issues.redhat.com/browse/OCPBUGS-23252[*OCPBUGS-23252*])
    • Release Note Type: Bug Fix
    • Status: Done
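
      With the fix in place, the counter can be read straight from the reloader's metrics endpoint. A minimal sketch, assuming the config-reloader container still serves metrics on its default :8080 and the image ships curl (neither is confirmed in this ticket):

      # container name, port, and curl availability are assumptions
      $ oc exec -n openshift-user-workload-monitoring prometheus-user-workload-0 -c config-reloader -- \
          curl -s http://localhost:8080/metrics | grep reloader_config_environment_variable_expansion_errors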

      Description of problem:

      UWM prometheus pods not starting due to variable not being defined
      

      Version-Release number of selected component (if applicable):

      All RHOCP 4 versions
      

      How reproducible:

      Steps to Reproduce:

      1. Install a cluster 
      2. Enable UserWorkload monitoring on the cluster
      3. Define a ServiceMonitor as mentioned in the attached file (a sketch of the reported shape follows)
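
      The attached file isn't preserved in this export; a minimal ServiceMonitor of the same shape as the one quoted in the comments below would look roughly like this (all names here are illustrative):

      apiVersion: monitoring.coreos.com/v1
      kind: ServiceMonitor
      metadata:
        name: example-app
        namespace: example-ns
      spec:
        selector:
          matchLabels:
            app: example-app
        endpoints:
        - port: metrics
          scheme: https
          tlsConfig:
            ca:
              configMap:
                key: service-ca.crt
                name: example-ca-bundle
            # the unexpanded placeholders below are what triggers the bug
            serverName: $(SERVICE_NAME).$(SERVICE_NAMESPACE).svc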
      

      Actual results:

      Prometheus pods under UWM fail to start due to an undefined variable
      

      Expected results:

      Pods to start correctly
      

      Additional info:

            [OCPBUGS-23252] UWM prometheus pods not starting due to variable not being defined.

            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (Important: OpenShift Container Platform 4.18.1 bug fix and security update), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHSA-2024:6122


            Leonardo Amaral (Inactive) added a comment - edited

            Hello!

            My OpenShift clusters are affected in the same way as maschulz.openshift's, with the same log and the same ServiceMonitor configuration (generated by default; dumped with oc -n openshift-user-workload-monitoring get secret prometheus-user-workload -o jsonpath='{.data.prometheus\.yaml\.gz}' | base64 -d | gunzip > config_out_template.yaml):

             

            $ oc logs -n openshift-user-workload-monitoring prometheus-user-workload-0 init-config-reloader
            level=info ts=2024-09-11T13:26:56.480004423Z caller=main.go:115 msg="Starting prometheus-config-reloader" version="(version=0.67.1, branch=rhaos-4.14-rhel-8, revision=6d398a6)"
            level=info ts=2024-09-11T13:26:56.480066869Z caller=main.go:116 build_context="(go=go1.20.12 X:strictfipsruntime, platform=linux/amd64, user=root, date=20240523-02:21:08, tags=strictfipsruntime)"
            expand environment variables: found reference to unset environment variable "SERVICE_NAME"

            $ grep SERVICE_NAME -B3  config_out_template.yaml
              tls_config:
                insecure_skip_verify: false
                ca_file: /etc/prometheus/certs/configmap_cp4aiops_aiopsedge-openshift-ca-cert_service-ca.crt
                server_name: $(SERVICE_NAME).$(SERVICE_NAMESPACE).svc
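
            A variant of the dump command above can also scan the rendered config for any remaining unexpanded placeholders (the regex is only a sketch, not from this ticket):

            # prints any $(VAR)-style references left in the rendered Prometheus config
            $ oc -n openshift-user-workload-monitoring get secret prometheus-user-workload \
                -o jsonpath='{.data.prometheus\.yaml\.gz}' | base64 -d | gunzip | grep -En '\$\([A-Za-z_]+\)'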

            $ oc version
            Client Version: 4.16.5
            Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
            Server Version: 4.14.30
            Kubernetes Version: v1.27.14+7852426

             

            Thanks!


            Marius Schulz added a comment -

            Just ran into this issue with a 4.14.20 ROSA cluster. The customer specified a ServiceMonitor containing the following section:

            tlsConfig:
              ca:
                configMap:
                  key: service-ca.crt
                  name: [redacted]
                  optional: false
              serverName: $(SERVICE_NAME).$(SERVICE_NAMESPACE).svc
            

            This now causes prometheus-user-workload-0 to run into an error during its initialization phase:

            level=info ts=2024-09-04T11:06:45.541115904Z caller=main.go:115 msg="Starting prometheus-config-reloader" version="(version=0.67.1, branch=rhaos-4.14-rhel-8, revision=5ccbcfa)"
            level=info ts=2024-09-04T11:06:45.541158656Z caller=main.go:116 build_context="(go=go1.20.12 X:strictfipsruntime, platform=linux/amd64, user=root, date=20240323-02:55:27, tags=strictfipsruntime)"
            expand environment variables: found reference to unset environment variable "SERVICE_NAME"
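
            Until a fixed version is rolled out, one possible workaround (my assumption, not something verified in this ticket) is to drop the $(...) placeholders and hardcode the serverName, since the reloader's environment evidently doesn't define those variables (per the log above):

            tlsConfig:
              ca:
                configMap:
                  key: service-ca.crt
                  name: [redacted]
                  optional: false
              # hardcoded instead of $(SERVICE_NAME).$(SERVICE_NAMESPACE).svc; values are illustrative
              serverName: my-service.my-namespace.svc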
            


            Pranshu Srivastava added a comment -

            Ah, right. I'll move this to the backlog, then. Once the patch has landed downstream, I'll move this to QA.

            Simon Pasquier added a comment -

            Moving back to assigned since the fix isn't in our downstream yet.

            OpenShift Jira Bot added a comment -

            Hi prasriva@redhat.com,

            Bugs should not be moved to Verified without first providing a Release Note Type ("Bug Fix" or "No Doc Update"), and for type "Bug Fix" the Release Note Text must also be provided. Please populate the necessary fields before moving the bug to Verified.

            Simon Pasquier added a comment -

            FYI I've logged an issue upstream: https://github.com/prometheus-operator/prometheus-operator/issues/6136

            Pranshu Srivastava added a comment -

            Slack thread: https://redhat-internal.slack.com/archives/C0VMT03S5/p1699888602671839
