Bug
Resolution: Unresolved
Major
None
4.21, 4.22
None
Description of problem:
During bootstrap, the kube-apiserver (KAS) on the bootstrap node receives IPAM create requests but cannot reach the webhooks running in the Cluster API namespace.
This is because the bootstrap node is not attached to the cluster pod network, so it has no route to the webhook pods.
When failurePolicy is set to Fail, every webhook call that cannot reach its endpoint causes the request to be rejected, which prevents creation of IPAddress and IPAddressClaim resources.
This creates a chicken-and-egg problem: IPAM cannot provision addresses for the workers, and the workers cannot start until their IP addresses are allocated.
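The admission behavior described above can be illustrated with a small, simplified model. This is a hypothetical sketch, not the kube-apiserver implementation: the `Webhook` type, the `admit` function and the webhook name are made up for illustration, and an endpoint that cannot be reached (as from the bootstrap node) is modeled as a missing verdict.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FailurePolicy(Enum):
    # The two values accepted by a webhook's failurePolicy field.
    FAIL = "Fail"
    IGNORE = "Ignore"


@dataclass
class Webhook:
    name: str  # hypothetical webhook name, for illustration only
    failure_policy: FailurePolicy
    # None models an endpoint the API server cannot reach (no route to the
    # pod network); True/False model an allow/deny response from the webhook.
    verdict: Optional[bool]


def admit(webhooks: list[Webhook]) -> tuple[bool, str]:
    """Return (admitted, reason) for a request checked by validating webhooks."""
    for hook in webhooks:
        if hook.verdict is None:
            if hook.failure_policy is FailurePolicy.FAIL:
                # Fail: a call error rejects the whole request.
                return False, f"failed calling webhook {hook.name}: endpoint unreachable"
            # Ignore: the call error is skipped and evaluation continues.
            continue
        if not hook.verdict:
            # A reachable webhook that denies always rejects, whatever the policy.
            return False, f"denied by webhook {hook.name}"
    return True, "admitted"


if __name__ == "__main__":
    unreachable = Webhook("ipaddressclaim-validation", FailurePolicy.FAIL, verdict=None)
    print(admit([unreachable]))
    # (False, 'failed calling webhook ipaddressclaim-validation: endpoint unreachable')
```

With `FailurePolicy.IGNORE`, the same unreachable webhook is skipped and the request is admitted, which is the behavior solution 1 relies on.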
This started happening after https://github.com/openshift/cluster-api/pull/243 was merged. That PR bumped the manifests-generator (https://github.com/openshift/cluster-api/pull/243/changes/02edd867a4143fcc9b8b041c013adb94b6b1589c#diff-228c56adac5bd636ca3fffc91280b45643758ba559f4288c55b20ac1fcaa5cf6 ) to a version that re-enabled validation webhooks for core CAPI CRDs, which in turn enabled validation for core CAPI CRDs in TechPreviewNoUpgrade (TPNU) (https://github.com/openshift/cluster-api/pull/243/changes/b78d91bee8db2655dfbee87d06a9ac543598ed9a#diff-b7f238b6e169da00dad8139e22dd0e56c9fe4b0d912371850843e891e25575bc ), causing the overall failure.
The full context for this issue was captured in a debug Slack channel: https://redhat.enterprise.slack.com/archives/C0A2M43S199
The most relevant discussion is here: https://redhat-internal.slack.com/archives/C0A2M43S199/p1765549194602169?thread_ts=1765540108.488539&cid=C0A2M43S199
The candidate solutions are listed here: https://redhat-internal.slack.com/archives/C0A2M43S199/p1765786907406689?thread_ts=1765540108.488539&cid=C0A2M43S199
We went with solution 1 (remove the hard requirement on webhooks so they are not needed at bootstrap):
- https://github.com/openshift/installer/pull/10158 Switch the installer to generate v1beta2 versions of the IPAM-related CRs when running Tech Preview, so that no conversion webhook (which is unreachable from the bootstrap node) is required.
- https://github.com/openshift/cluster-api/pull/256 Set failurePolicy to Ignore on the IPAM ValidatingWebhookConfigurations (the validation may later be implemented with ValidatingAdmissionPolicies (VAPs) instead, tracked in https://issues.redhat.com/browse/OCPBUGS-69435 ).
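Why solution 1 needs both changes can be sketched with a simplified, hypothetical model. It assumes that v1beta2 is the storage version of the IPAM CRDs (as the installer change implies), that other served versions use `Webhook` conversion, that the installer previously requested an older version such as v1beta1 (assumed here), and that no webhook is reachable from the bootstrap node. The function names are illustrative only.

```python
from __future__ import annotations


def needs_conversion_webhook(request_version: str, storage_version: str,
                             conversion_strategy: str = "Webhook") -> bool:
    # With CRD conversion strategy "None", only apiVersion is rewritten and no
    # webhook is called. With "Webhook", an object sent in a non-storage version
    # must be converted by the conversion webhook before it can be persisted.
    return conversion_strategy == "Webhook" and request_version != storage_version


def create_succeeds_at_bootstrap(request_version: str, storage_version: str,
                                 validation_failure_policy: str,
                                 webhooks_reachable: bool = False) -> bool:
    # Simplified model: ignores mutating webhooks and matchPolicy details.
    if needs_conversion_webhook(request_version, storage_version) and not webhooks_reachable:
        # CRD conversion has no failurePolicy: a failed conversion call
        # always rejects the request.
        return False
    if not webhooks_reachable and validation_failure_policy == "Fail":
        # An unreachable validating webhook with Fail rejects the request.
        return False
    return True


if __name__ == "__main__":
    # Assumed storage version: v1beta2.
    for version in ("v1beta1", "v1beta2"):
        for policy in ("Fail", "Ignore"):
            ok = create_succeeds_at_bootstrap(version, "v1beta2", policy)
            print(f"{version} + failurePolicy={policy}: {'succeeds' if ok else 'rejected'}")
```

Running it shows that only the v1beta2 + Ignore combination succeeds: keeping the older version still depends on an unreachable conversion webhook, and keeping Fail still rejects the unreachable validation call.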
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Launch a vsphere-static installation in TPNU.
2. Observe that the installation fails.
Actual results:
Installation fails
Expected results:
Installation should succeed
Additional info:
- is related to: SPLAT-2584 update installer to generate v1beta2 version of IPAM (Backlog)
- relates to: OCPBUGS-69435 Core CAPI IPAM: add VAPs to fill in for validating webhook's failurepolicy: Ignore (New)
- links to