OpenShift Bugs / OCPBUGS-23252

UWM prometheus pods not starting due to variable not being defined.

    • Severity: Moderate
    • Sprint: MON Sprint 245
    • Release Note Text:
      * Previously, the `config-reloader` for Prometheus for user-defined projects would fail if unset environment variables were used in the `ServiceMonitor` configuration, which resulted in Prometheus pods failing. With this release, the reloader no longer fails when an unset environment variable is encountered. Instead, unset environment variables are left as they are, while set environment variables are expanded as usual. Any expansion errors, suppressed or otherwise, can be tracked through the `reloader_config_environment_variable_expansion_errors` metric. (link:https://issues.redhat.com/browse/OCPBUGS-23252[*OCPBUGS-23252*])
    • Release Note Type: Bug Fix
    • Status: Done
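
      With the fix in place, the counter can be read straight from the reloader's metrics endpoint. A minimal sketch, assuming the config-reloader container still serves metrics on its default :8080 and the image ships curl (neither is confirmed in this ticket):

      # container name, port, and curl availability are assumptions
      $ oc exec -n openshift-user-workload-monitoring prometheus-user-workload-0 -c config-reloader -- \
          curl -s http://localhost:8080/metrics | grep reloader_config_environment_variable_expansion_errors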

      Description of problem:

      UWM prometheus pods not starting due to variable not being defined
      

      Version-Release number of selected component (if applicable):

      All RHOCP 4 versions
      

      How reproducible:

      Steps to Reproduce:

      1. Install a cluster 
      2. Enable UserWorkload monitoring on the cluster
      3. Define a ServiceMonitor as mentioned in the attached file (a sketch of the reported shape follows)
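
      The attached file isn't preserved in this export; a minimal ServiceMonitor of the same shape as the one quoted in the comments below would look roughly like this (all names here are illustrative):

      apiVersion: monitoring.coreos.com/v1
      kind: ServiceMonitor
      metadata:
        name: example-app
        namespace: example-ns
      spec:
        selector:
          matchLabels:
            app: example-app
        endpoints:
        - port: metrics
          scheme: https
          tlsConfig:
            ca:
              configMap:
                key: service-ca.crt
                name: example-ca-bundle
            # the unexpanded placeholders below are what triggers the bug
            serverName: $(SERVICE_NAME).$(SERVICE_NAMESPACE).svc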
      

      Actual results:

      Prometheus pods under UWM fail to start due to an undefined variable
      

      Expected results:

      Pods to start correctly
      

      Additional info:

            [OCPBUGS-23252] UWM prometheus pods not starting due to variable not being defined.

            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (Important: OpenShift Container Platform 4.18.1 bug fix and security update), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHSA-2024:6122


            Leonardo Amaral (Inactive) added a comment - edited

            Hello!

            My OpenShift clusters are affected in the same way as maschulz.openshift's, with the same log and the same ServiceMonitor configuration (generated by default; dumped with oc -n openshift-user-workload-monitoring get secret prometheus-user-workload -o jsonpath='{.data.prometheus\.yaml\.gz}' | base64 -d | gunzip > config_out_template.yaml):

             

            $ oc logs -n openshift-user-workload-monitoring prometheus-user-workload-0 init-config-reloader
            level=info ts=2024-09-11T13:26:56.480004423Z caller=main.go:115 msg="Starting prometheus-config-reloader" version="(version=0.67.1, branch=rhaos-4.14-rhel-8, revision=6d398a6)"
            level=info ts=2024-09-11T13:26:56.480066869Z caller=main.go:116 build_context="(go=go1.20.12 X:strictfipsruntime, platform=linux/amd64, user=root, date=20240523-02:21:08, tags=strictfipsruntime)"
            expand environment variables: found reference to unset environment variable "SERVICE_NAME"

            $ grep SERVICE_NAME -B3  config_out_template.yaml
              tls_config:
                insecure_skip_verify: false
                ca_file: /etc/prometheus/certs/configmap_cp4aiops_aiopsedge-openshift-ca-cert_service-ca.crt
                server_name: $(SERVICE_NAME).$(SERVICE_NAMESPACE).svc
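
            A variant of the dump command above can also scan the rendered config for any remaining unexpanded placeholders (the regex is only a sketch, not from this ticket):

            # prints any $(VAR)-style references left in the rendered Prometheus config
            $ oc -n openshift-user-workload-monitoring get secret prometheus-user-workload \
                -o jsonpath='{.data.prometheus\.yaml\.gz}' | base64 -d | gunzip | grep -En '\$\([A-Za-z_]+\)'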

            $ oc version
            Client Version: 4.16.5
            Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
            Server Version: 4.14.30
            Kubernetes Version: v1.27.14+7852426

             

            Thanks!


            Marius Schulz added a comment -

            Just ran into this issue with a 4.14.20 ROSA cluster. The customer specified a ServiceMonitor containing the following section:

            tlsConfig:
              ca:
                configMap:
                  key: service-ca.crt
                  name: [redacted]
                  optional: false
              serverName: $(SERVICE_NAME).$(SERVICE_NAMESPACE).svc
            

            This now causes prometheus-user-workload-0 to run into an error during its initialization phase:

            level=info ts=2024-09-04T11:06:45.541115904Z caller=main.go:115 msg="Starting prometheus-config-reloader" version="(version=0.67.1, branch=rhaos-4.14-rhel-8, revision=5ccbcfa)"
            level=info ts=2024-09-04T11:06:45.541158656Z caller=main.go:116 build_context="(go=go1.20.12 X:strictfipsruntime, platform=linux/amd64, user=root, date=20240323-02:55:27, tags=strictfipsruntime)"
            expand environment variables: found reference to unset environment variable "SERVICE_NAME"
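
            Until a fixed version is rolled out, one possible workaround (my assumption, not something verified in this ticket) is to drop the $(...) placeholders and hardcode the serverName, since the reloader's environment evidently doesn't define those variables (per the log above):

            tlsConfig:
              ca:
                configMap:
                  key: service-ca.crt
                  name: [redacted]
                  optional: false
              # hardcoded instead of $(SERVICE_NAME).$(SERVICE_NAMESPACE).svc; values are illustrative
              serverName: my-service.my-namespace.svc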
            


            Pranshu Srivastava added a comment -

            Ah, right. I'll move this to the backlog, then. Once the patch has landed downstream, I'll move this to QA.

            Simon Pasquier added a comment -

            Moving back to assigned since the fix isn't in our downstream yet.

            OpenShift Jira Bot added a comment -

            Hi prasriva@redhat.com,

            Bugs should not be moved to Verified without first providing a Release Note Type ("Bug Fix" or "No Doc Update"), and for type "Bug Fix" the Release Note Text must also be provided. Please populate the necessary fields before moving the bug to Verified.

            Simon Pasquier added a comment -

            FYI I've logged an issue upstream: https://github.com/prometheus-operator/prometheus-operator/issues/6136

            Pranshu Srivastava added a comment -

            Slack thread: https://redhat-internal.slack.com/archives/C0VMT03S5/p1699888602671839
