OpenShift Bugs: OCPBUGS-4686

Removal of detection of host kubelet kubeconfig breaks IBM Cloud ROKS

Details

    • Critical
    • SDN Sprint 229
    • 1
    • Approved
    • False

    Description

      Description of problem:

      This PR: https://github.com/openshift/cluster-network-operator/pull/1612/files removed the fallback logic that checked for the host's kubeconfig file when apiserver-url.env was not populated on the machine. In IBM Cloud ROKS (both public cloud and Satellite (Hypershift)), this file is not populated. This means that any upgrade to 4.12 will cause the cluster network operator to fail and will impact the cluster.
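
The fallback described above can be sketched roughly as follows. This is an illustration only, not the operator's actual code: the env file keys (KUBERNETES_SERVICE_HOST, KUBERNETES_SERVICE_PORT), the kubeconfig path, and all function names are assumptions made for the example.

```python
import os
import re
from typing import Optional

# The env file path comes from this report. The kubeconfig path is an
# assumed location for the host kubelet kubeconfig.
ENV_FILE = "/etc/kubernetes/apiserver-url.env"
HOST_KUBECONFIG = "/etc/kubernetes/kubeconfig"


def url_from_env_file(path: str) -> Optional[str]:
    """Build https://HOST:PORT from KEY=VALUE lines, or return None."""
    if not os.path.isfile(path):
        return None
    values = {}
    with open(path) as f:
        for line in f:
            key, sep, value = line.strip().partition("=")
            if sep:
                # Strip optional surrounding quotes from the value.
                values[key] = value.strip("'\"")
    # Key names are assumed for illustration.
    host = values.get("KUBERNETES_SERVICE_HOST")
    port = values.get("KUBERNETES_SERVICE_PORT")
    return f"https://{host}:{port}" if host and port else None


def url_from_kubeconfig(path: str) -> Optional[str]:
    """Return the first 'server:' value found in a kubeconfig, or None."""
    if not os.path.isfile(path):
        return None
    with open(path) as f:
        match = re.search(r"^\s*server:\s*(\S+)", f.read(), re.MULTILINE)
    return match.group(1) if match else None


def resolve_apiserver_url(env_file: str = ENV_FILE,
                          kubeconfig: str = HOST_KUBECONFIG) -> Optional[str]:
    """Prefer the env file; fall back to the host kubeconfig."""
    return url_from_env_file(env_file) or url_from_kubeconfig(kubeconfig)
```

With the fallback removed, a worker that lacks the env file would get no URL at all in this sketch, which corresponds to the failure mode reported here.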
      
      I am proposing the following plan. First, this PR is held until 4.13. Second, the IBM Cloud ROKS team will ensure, from the initial release of 4.12, that this file is populated across its entire fleet of workers (4.12 and beyond). Holding this until 4.13 will allow a seamless upgrade experience when the user upgrades the control plane to 4.12 while the workers are still at 4.11. Then, when the user goes to upgrade to 4.13, their workers will all be at 4.12, which is guaranteed to have this file, and the check for the host kubeconfig can be removed.
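
The version-skew reasoning in the plan can be expressed as a small check. This is a minimal sketch under stated assumptions: the function names are hypothetical, and worker versions are assumed to be available as plain "major.minor.patch" strings.

```python
from typing import Iterable, Tuple


def parse_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '4.12.3'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def safe_to_drop_fallback(worker_versions: Iterable[str],
                          file_guaranteed_from: str = "4.12") -> bool:
    """True only if every worker is at or above the release that is
    guaranteed to ship apiserver-url.env (4.12 in the plan above).

    Note: an empty worker list returns True, since no worker depends
    on the fallback.
    """
    floor = parse_minor(file_guaranteed_from)
    return all(parse_minor(v) >= floor for v in worker_versions)
```

For example, a 4.12 control plane with workers still at 4.11 fails this check, which is the case that holding the PR until 4.13 is meant to cover.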
      
      For full disclosure, it was brought up that we could push a daemonset across our entire fleet of 16000+ ROKS clusters that just lays down the file. However, that still introduces race conditions with the network-operator and results in a significant increase in cluster workload resources across our entire fleet, both of which the plan proposed above would avoid.
      
      Example on a ROKS on Satellite worker showing that this file does not exist (yet): 
      [root@tyler-test-24 ~]# ls /etc/kubernetes/apiserver-url.env
      ls: cannot access '/etc/kubernetes/apiserver-url.env': No such file or directory

      Version-Release number of selected component (if applicable):

       

      How reproducible:

       

      Steps to Reproduce:

      1.
      2.
      3.
      

      Actual results:

       

      Expected results:

       

      Additional info:

       

      Attachments

        Issue Links

          Activity

            People

              pdiak@redhat.com Patryk Diak
              lisowskiibm Tyler Lisowski
              Zhanqi Zhao
              Votes: 0
              Watchers: 6

              Dates

                Created:
                Updated:
                Resolved: