  1. OpenShift Bugs
  2. OCPBUGS-48821

bootstrap removes the bootstrap API prematurely


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Undefined
    • None
    • 4.19
    • kube-apiserver
    • None
    • False
    • Release Note Not Required
    • In Progress

      The bootstrap API server should be terminated only after the API is highly available (HA), that is, once the API is available on at least two master nodes. The intended sequence is:

        1. wait until the API is HA (available on 2+ master nodes)
        2. delete the bootstrap kube-apiserver manifests
        3. wait for the bootstrap API to be down
        4. delete all other static manifests
        5. mark the bootstrap process done
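The ordering above can be sketched as follows. This is only an illustration in Python, not the cluster-bootstrap implementation (which is written in Go); every function and parameter name here is hypothetical, and the probes and actions are passed in as callables so the sequencing can be read on its own.

```python
from typing import Callable


def tear_down_bootstrap(
    count_ready_master_apis: Callable[[], int],
    bootstrap_api_is_up: Callable[[], bool],
    remove_apiserver_manifests: Callable[[], None],
    remove_other_manifests: Callable[[], None],
    mark_done: Callable[[], None],
    sleep: Callable[[float], None],
    min_ready: int = 2,
    interval: float = 1.0,
) -> None:
    """Run the five teardown steps strictly in order (hypothetical sketch)."""
    # Step 1: block until the API is served by at least `min_ready` masters.
    while count_ready_master_apis() < min_ready:
        sleep(interval)

    # Step 2: remove only the bootstrap kube-apiserver manifests.
    remove_apiserver_manifests()

    # Step 3: wait until the bootstrap API has actually stopped serving.
    while bootstrap_api_is_up():
        sleep(interval)

    # Step 4: remove the remaining static manifests.
    remove_other_manifests()

    # Step 5: only now report bootstrap as complete.
    mark_done()
```

The sketch has no timeout or error handling; a real implementation would need both.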

      Note the difference between (a) the bootstrap node itself existing and (b) the API being available on the bootstrap node. Today, cluster-bootstrap removes the bootstrap API (b) as soon as two master nodes appear. This is the current sequence on the bootstrap node:
      a) create the static assets
      b) wait for 2 master nodes to appear
      c) remove the kube-apiserver from the bootstrap node
      d) mark the bootstrap process as completed

      This can leave a time window in which the API is not available at all: it starts at step c and lasts until the API becomes available on a master node.
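To make the window concrete, here is a tiny worked example in Python. The function name and the timestamps are invented for illustration and are not taken from any real installation.

```python
def api_outage_seconds(
    bootstrap_api_removed_at: float,
    first_master_api_ready_at: float,
) -> float:
    """Length of the window with no API server available, in seconds.

    Both arguments are timestamps on the same clock. If a master already
    serves the API when the bootstrap API is removed, there is no outage.
    """
    return max(0.0, first_master_api_ready_at - bootstrap_api_removed_at)


# Hypothetical timeline: step c happens at t=100s, but the first master
# only starts serving the API at t=130s, so clients see a 30s outage.
gap = api_outage_seconds(100.0, 130.0)

# If the API were already up on a master at t=90s, the gap would be zero.
no_gap = api_outage_seconds(100.0, 90.0)
```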

       

      The cluster-bootstrap executable is invoked here:
      https://github.com/openshift/installer/blob/c534bb90b780ae488bc6ef7901e0f3f6273e2764/data/data/bootstrap/files/usr/local/bin/bootkube.sh.template#L541
      start --tear-down-early=false --asset-dir=/assets --required-pods="${REQUIRED_PODS}"
       

      cluster-bootstrap then removes the bootstrap API here: https://github.com/openshift/cluster-bootstrap/blob/bcd73a12a957ce3821bdfc0920751b8e3528dc98/pkg/start/start.go#L203-L209

      https://github.com/openshift/cluster-bootstrap/blob/bcd73a12a957ce3821bdfc0920751b8e3528dc98/pkg/start/bootstrap.go#L124-L141

       

      However, the wait for the API to be HA is done separately, here: https://github.com/openshift/installer/blob/c534bb90b780ae488bc6ef7901e0f3f6273e2764/data/data/bootstrap/files/usr/local/bin/report-progress.sh#L24

      The wait should happen from within cluster-bootstrap. This PR moves the wait so that it runs before cluster-bootstrap tears down the bootstrap API/control plane.
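A bounded wait of this kind could look roughly like the sketch below. It is written in Python for illustration only; it does not reproduce the PR or the `report-progress.sh` logic. The function name, the `probe` callback, the endpoint names, and the timeout and interval defaults are all assumptions.

```python
import time
from typing import Callable, Sequence


def wait_for_api_ha(
    endpoints: Sequence[str],
    probe: Callable[[str], bool],
    min_ready: int = 2,
    timeout: float = 1200.0,
    interval: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until at least `min_ready` endpoints pass `probe`.

    Returns True once the API is considered HA, or False if `timeout`
    seconds elapse first. `probe` is a hypothetical readiness check for
    a single master's API endpoint.
    """
    deadline = clock() + timeout
    while True:
        ready = sum(1 for endpoint in endpoints if probe(endpoint))
        if ready >= min_ready:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller would tear down the bootstrap API only when this returns True, and treat False as a bootstrap failure instead of proceeding.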

       

              akashem@redhat.com Abu H Kashem
              akashem@redhat.com Abu H Kashem
              Ke Wang