Type: Bug
Resolution: Unresolved
Priority: Major
Affects Version/s: ACM 2.12.0, ACM 2.12.1
Sprint: Observability Sprint 33, Observability Sprint 34
Severity: Moderate
Description of problem:
For some currently unknown reason, during the upgrade from ACM 2.11 to ACM 2.12 we use the proxy image from the base templates. This seems to happen only for a short while; on later reconciles we appear to get the correct image from the OCP imagestream.
In disconnected environments this causes problems because that image is not available. For rbac-query-proxy and Grafana the correct image is eventually used, but alertmanager uses a statefulset and gets stuck in the bad state due to the known Kubernetes issue described here: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#forced-rollback . As a result, the alertmanager-2 pod is left in a crashloop and the other two pods (alertmanager-1 and alertmanager-0) keep using the 2.11 image. Scaling the statefulset down to n-1 resolves the problem (the statefulset is then automatically scaled back up to the desired count, 3 by default).
Version-Release number of selected component (if applicable):
ACM 2.12
How reproducible:
Always
Steps to Reproduce:
- Install ACM 2.11 in a disconnected environment and enable observability
- Upgrade to ACM 2.12
Alternatively: don't install in a disconnected environment, but keep a close eye on the alertmanager-2 pod during the upgrade and observe that it tries to pull the quay.io/stolostron/origin-oauth-proxy image for the proxy container
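A minimal sketch of how the bad image can be spotted during the upgrade window. The `oc` query is shown as a comment (it needs a live cluster); the pod name `observability-alertmanager-2` and the namespace are assumed from the statefulset name in this report. The hard-coded `image` value below is an example of what the bug produces, not output captured from a cluster:

```shell
# On a live cluster the proxy container image would be read with something like:
#   image=$(oc get pod observability-alertmanager-2 \
#     -n open-cluster-management-observability \
#     -o jsonpath='{.spec.containers[*].image}')
# Example value matching the buggy behavior described above:
image="quay.io/stolostron/origin-oauth-proxy:latest"

# A quay.io/stolostron/origin-oauth-proxy reference means the base-template
# image was rendered; anything else suggests the image was resolved normally.
case "$image" in
  quay.io/stolostron/origin-oauth-proxy*) verdict="base-template image (bug present)" ;;
  *) verdict="expected image" ;;
esac
echo "$verdict"
```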
Actual results:
- The alertmanager-2 pod is in a crashloop and the other pods continue to use the old images
Expected results:
- All alertmanager replicas are healthy using the correct 2.12 images.
Workaround:
oc scale statefulset observability-alertmanager -n open-cluster-management-observability --replicas=2
note: set replicas to n-1 if the number of replicas for the statefulset has been changed from the default (default: 3)
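The workaround above can be sketched as a small script that computes n-1 from the current replica count instead of hard-coding 2. The `oc` calls are shown as comments since they need a live cluster; the fallback value of 3 is the default replica count stated in this report:

```shell
# Read the current replica count from the statefulset; hard-coded fallback of 3
# (the default) so this sketch runs standalone:
#   replicas=$(oc get statefulset observability-alertmanager \
#     -n open-cluster-management-observability \
#     -o jsonpath='{.spec.replicas}')
replicas=3
target=$((replicas - 1))
echo "scaling observability-alertmanager from $replicas to $target replicas"
# Scaling down past the stuck pod triggers the forced rollback; the operator
# then scales the statefulset back up to the desired count on its own:
#   oc scale statefulset observability-alertmanager \
#     -n open-cluster-management-observability --replicas="$target"
```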