Bug
Resolution: Done
Blocker
None
Quality / Stability / Reliability
3
False
False
Installer Sprint 2025-59, Installer Sprint 2025-60, Installer Sprint 2025-61
Important
None
Description of problem:
ACM was upgraded from 2.12 to 2.13.2 with the SiteConfig operator enabled. A cluster had been deployed and managed under 2.12. After the upgrade:
- The SiteConfig operator pod in the open-cluster-management namespace was not recreated (its age was 12d, while all other pods were ~3h old).
- Editing or deleting the ClusterInstance CR for the previously deployed cluster failed with an error about a missing webhook, although the webhook service exists:
$ oc delete clusterinstance -n cnfdf02 cnfdf02
Error from server (InternalError): Internal error occurred: failed calling webhook "clusterinstances.siteconfig.open-cluster-management.io": failed to call webhook: Post "https://webhook-clusterinstances-siteconfig-open-cluster-management-io.open-cluster-management.svc:443/validate-siteconfig-open-cluster-management-io-v1alpha1-clusterinstance?timeout=10s": no endpoints available for service "webhook-clusterinstances-siteconfig-open-cluster-management-io"
$ oc get svc -n open-cluster-management
<snip>
webhook-clusterinstances-siteconfig-open-cluster-management-io ClusterIP 172.30.110.105 <none> 443/TCP 7h40m
The hub is a 3-node cluster with dual-stack networking (IPv4 primary).
Version-Release number of selected component (if applicable):
ACM Upgrade from 2.12 to 2.13
How reproducible:
100%
Steps to Reproduce:
- Install ACM 2.12
- Enable the SiteConfig operator
- Upgrade to 2.13
- Try to create or delete a ClusterInstance CR
Actual results:
Error "failed calling webhook"
Expected results:
ClusterInstance CRs can be created and deleted successfully.
Additional info:
https://access.redhat.com/solutions/7116347
Resolution
When upgrading from 2.12.x to 2.13.x, the siteconfig-controller-manager deployment was updated with a new label selector. Kubernetes treats a Deployment's label selector as immutable, so the apply/patch failed and the existing deployment (and its pod) was left unchanged; the only way to change this field is to delete and re-create the resource. The same update also changed the control-plane label that the webhook service uses to select the pod, so the service matched no pods and calls to the webhook failed with "no endpoints available". A narrowly scoped check was added that deletes deployment/siteconfig-controller-manager only when upgrading from 2.12 with SiteConfig enabled, so that it is re-created with the new selector and labels.
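The upgrade guard described above can be sketched in shell as follows. This is a hypothetical illustration, not the shipped fix: the function names and the PREVIOUS_VERSION / SITECONFIG_ENABLED inputs are assumptions, and only the deployment name, the namespace, and the "2.12 with SiteConfig enabled" condition come from this ticket.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the upgrade guard; NOT the actual ACM operator code.
set -euo pipefail

# Succeeds (exit 0) only when the previous ACM version is 2.12.x and SiteConfig
# is enabled, i.e. the case where the old deployment still carries the
# pre-2.13 (immutable) label selector.
needs_siteconfig_recreate() {
  local previous_version="$1"   # assumed format: "2.12.3"
  local siteconfig_enabled="$2" # assumed format: "true" or "false"
  [[ "$previous_version" == 2.12.* && "$siteconfig_enabled" == "true" ]]
}

# spec.selector cannot be patched, so the deployment is deleted and is
# assumed to be re-created by the operator with the new selector and labels.
recreate_siteconfig_deployment() {
  oc delete deployment siteconfig-controller-manager \
    -n open-cluster-management --ignore-not-found
}

# Inputs are hypothetical environment variables; nothing is deleted if unset.
if needs_siteconfig_recreate "${PREVIOUS_VERSION:-}" "${SITECONFIG_ENABLED:-false}"; then
  recreate_siteconfig_deployment
fi
```

Deleting instead of patching is the point of the check: re-creation is the only way to pick up a new selector, and limiting it to the 2.12 upgrade path presumably avoids restarting the controller on every other upgrade.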
To Test
1. Install ACM 2.12.x
2. Enable siteconfig
3. Upgrade to ACM 2.13.3
4. Verify that the siteconfig-controller-manager deployment has the label control-plane: siteconfig-controller-manager (previously control-plane: controller-manager).
5. This label should match the label selector control-plane: siteconfig-controller-manager of the service webhook-clusterinstances-siteconfig-open-cluster-management-io.
6. This service is the one behind the webhook error reported in this ticket. With the labels matching, the error no longer occurs when creating, editing, or deleting a ClusterInstance on ACM 2.13.3.
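Steps 4 and 5 can also be checked from the CLI. The sketch below compares the service's control-plane selector with the control-plane label on the deployment's pod template, which is the label that actually decides whether the service finds endpoints. It assumes `oc` is logged in to the hub; the function name check_webhook_selector is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch for steps 4-5; names are taken from this ticket, the function is hypothetical.
NS=open-cluster-management
SVC=webhook-clusterinstances-siteconfig-open-cluster-management-io
DEPLOY=siteconfig-controller-manager

check_webhook_selector() {
  local selector pod_label
  # control-plane value that the webhook service selects on
  selector=$(oc get service "$SVC" -n "$NS" -o jsonpath='{.spec.selector.control-plane}')
  # control-plane label applied to the deployment's pods (pod template)
  pod_label=$(oc get deployment "$DEPLOY" -n "$NS" -o jsonpath='{.spec.template.metadata.labels.control-plane}')
  echo "service selector: control-plane=${selector}"
  echo "pod label:        control-plane=${pod_label}"
  # Succeeds only if the selector is set and equals the pod label
  [[ -n "$selector" && "$selector" == "$pod_label" ]]
}
```

Source the script and run `check_webhook_selector`; a non-zero exit status means the selector and the pod label disagree, which is the condition behind the "no endpoints available" error.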
Is cloned by:
ACM-21045 Editing or deleting the ClusterInstance CR for the previously deployed cluster failed with error about missing webhook, but the webhook exists after upgrade ACM 2.12 to 2.13 with site-config enabled (Closed)