OpenShift Bugs / OCPBUGS-69434

[TechPreviewNoUpgrade] OCP vsphere cluster configured with static ips fails installation



      Description of problem:

      During bootstrap, the Kube API Server (KAS) on the bootstrap node receives IPAM create requests but cannot reach the webhooks in the Cluster API namespace.

      This is because the bootstrap node has no access to the pod network and therefore no route to the webhook pods.

      With failurePolicy set to Fail, the request is rejected whenever the KAS cannot reach the webhook endpoints, which prevents creation of IPAddress and IPAddressClaim resources.

      This causes a chicken-and-egg problem: IPAM provisioning for the workers is blocked, and the workers won't start until their IP addresses have been allocated.
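
      The failurePolicy behaviour described above can be sketched as a small model. This is only an illustration of the general Kubernetes admission semantics (a webhook call error rejects the request under Fail and is skipped under Ignore), not the KAS implementation. The names admit, WebhookUnreachable, no_route_to_pod_network and the worker-0-claim object are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class FailurePolicy(Enum):
    FAIL = "Fail"
    IGNORE = "Ignore"


class WebhookUnreachable(Exception):
    """Hypothetical error: the API server could not connect to the webhook."""


@dataclass
class AdmissionResult:
    allowed: bool
    reason: str


def admit(call_webhook: Callable[[dict], bool],
          policy: FailurePolicy,
          obj: dict) -> AdmissionResult:
    """Model how a webhook call error is handled under each failurePolicy."""
    try:
        allowed = call_webhook(obj)
    except WebhookUnreachable as err:
        if policy is FailurePolicy.FAIL:
            # Fail: a webhook call error rejects the API request.
            return AdmissionResult(False, f"webhook call failed: {err}")
        # Ignore: the error is skipped and the request continues.
        return AdmissionResult(True, "webhook unreachable, ignored by policy")
    return AdmissionResult(allowed, "webhook responded")


def no_route_to_pod_network(obj: dict) -> bool:
    """Stand-in for the bootstrap node, which has no route to the webhook pods."""
    raise WebhookUnreachable("no route to pod network")


# Hypothetical IPAM create request arriving at the bootstrap KAS.
claim = {"kind": "IPAddressClaim", "name": "worker-0-claim"}

rejected = admit(no_route_to_pod_network, FailurePolicy.FAIL, claim)
admitted = admit(no_route_to_pod_network, FailurePolicy.IGNORE, claim)
print(rejected)
print(admitted)
```

      In this model the same unreachable webhook blocks the IPAddressClaim under Fail and lets it through under Ignore, which is why the policy matters while the bootstrap node cannot reach the pod network.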

      This started happening after https://github.com/openshift/cluster-api/pull/243 was merged. That PR bumped the manifests-generator (https://github.com/openshift/cluster-api/pull/243/changes/02edd867a4143fcc9b8b041c013adb94b6b1589c#diff-228c56adac5bd636ca3fffc91280b45643758ba559f4288c55b20ac1fcaa5cf6 ) to a version that re-enabled validation webhooks for core CAPI CRDs. This in turn enabled core CAPI CRD validation in TPNU (https://github.com/openshift/cluster-api/pull/243/changes/b78d91bee8db2655dfbee87d06a9ac543598ed9a#diff-b7f238b6e169da00dad8139e22dd0e56c9fe4b0d912371850843e891e25575bc ), causing the overall failure.

      The full context on this issue was captured in a debug Slack channel here: https://redhat.enterprise.slack.com/archives/C0A2M43S199 

      Especially here: https://redhat-internal.slack.com/archives/C0A2M43S199/p1765549194602169?thread_ts=1765540108.488539&cid=C0A2M43S199 

      Here are the solutions we could think of: https://redhat-internal.slack.com/archives/C0A2M43S199/p1765786907406689?thread_ts=1765540108.488539&cid=C0A2M43S199 

      We went with solution 1: remove the hard requirement on the webhooks so that they are not needed at bootstrap.
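
      One way such a relaxation could look is sketched below. It assumes the hard requirement is lifted by switching failurePolicy from Fail to Ignore on the webhooks that cover the IPAM resources. This report does not state how the actual fix was implemented, so treat this as a guess at the shape of the change. The webhook names, the configuration name and the relax_ipam_webhooks helper are hypothetical; the dict layout mirrors the ValidatingWebhookConfiguration fields (webhooks, rules, resources, failurePolicy).

```python
import copy

# Resources whose creation must not depend on a reachable webhook at bootstrap.
IPAM_RESOURCES = {"ipaddresses", "ipaddressclaims"}


def relax_ipam_webhooks(config: dict) -> dict:
    """Return a copy of the config with failurePolicy=Ignore on IPAM webhooks."""
    relaxed = copy.deepcopy(config)
    for webhook in relaxed.get("webhooks", []):
        resources = {
            resource
            for rule in webhook.get("rules", [])
            for resource in rule.get("resources", [])
        }
        if resources & IPAM_RESOURCES:
            webhook["failurePolicy"] = "Ignore"
    return relaxed


# Hypothetical configuration; names are illustrative only.
example_config = {
    "apiVersion": "admissionregistration.k8s.io/v1",
    "kind": "ValidatingWebhookConfiguration",
    "metadata": {"name": "example-capi-validating-webhooks"},
    "webhooks": [
        {
            "name": "validation.ipaddressclaim.example",
            "failurePolicy": "Fail",
            "rules": [{"apiGroups": ["ipam.cluster.x-k8s.io"],
                       "resources": ["ipaddressclaims"]}],
        },
        {
            "name": "validation.machine.example",
            "failurePolicy": "Fail",
            "rules": [{"apiGroups": ["cluster.x-k8s.io"],
                       "resources": ["machines"]}],
        },
    ],
}

relaxed_config = relax_ipam_webhooks(example_config)
for webhook in relaxed_config["webhooks"]:
    print(webhook["name"], webhook["failurePolicy"])
```

      Only the webhook matching an IPAM resource is relaxed here; the other one keeps its original policy, and the input configuration is left unmodified.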

       

       

      Version-Release number of selected component (if applicable):

          

      How reproducible:

          Always

      Steps to Reproduce:

          1. Launch a vsphere-static installation in TPNU
          2. Installation fails
          

      Actual results:

          Installation fails

      Expected results:

          Installation should succeed

      Additional info:

          

              ddonati@redhat.com Damiano Donati
              Zhaohua Sun