OCPBUGS-29284

Adding cloudCA certificate is taking too long in clusters with no capabilities enabled

      Description of problem:

      
      When we configure a cloudCA certificate, it takes 10 to 15 minutes for the file to be written to the nodes.
      
      We have seen this behaviour in clusters with no enabled capabilities (for example: periodic-ci-openshift-openshift-tests-private-release-4.16-multi-nightly-aws-upi-baselinecaps-none-amd-f28-destructive)
      
      $ oc get clusterversion -o yaml 
      ....
          capabilities:
            enabledCapabilities:
            - CloudCredential
            knownCapabilities:
            - Build
            - CSISnapshot
            - CloudCredential
            - Console
            - DeploymentConfig
            - ImageRegistry
            - Insights
            - MachineAPI
            - NodeTuning
            - OperatorLifecycleManager
            - Storage
            - baremetal
            - marketplace
            - openshift-samples
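
      To make the gap between known and enabled capabilities explicit, a small helper can print the capabilities that are disabled. This is only a sketch: the function name `list_disabled` is hypothetical, and it assumes both lists are passed as space-separated strings, which is what the jsonpath queries in the usage comment should return.

```shell
# list_disabled KNOWN ENABLED
# Prints every capability in KNOWN that is not present in ENABLED,
# one per line. Both arguments are space-separated lists.
list_disabled() {
  for cap in $1; do
    case " $2 " in
      *" $cap "*) ;;                 # capability is enabled, skip it
      *) printf '%s\n' "$cap" ;;     # capability is known but not enabled
    esac
  done
}

# Hypothetical usage against a live cluster (needs a logged-in oc session):
#   known=$(oc get clusterversion version -o jsonpath='{.status.capabilities.knownCapabilities[*]}')
#   enabled=$(oc get clusterversion version -o jsonpath='{.status.capabilities.enabledCapabilities[*]}')
#   list_disabled "$known" "$enabled"
```

      With the output shown above, every known capability except CloudCredential would be listed.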
      
      
      
          

      Version-Release number of selected component (if applicable):

      $ oc get clusterversion
      NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.16.0-0.nightly-2024-02-09-073541   True        False         23m     Cluster version is 4.16.0-0.nightly-2024-02-09-073541
      
          

      How reproducible:

      Very often. Occasionally the cloudCA certificate is added correctly, but this is rare. When that happens, manually removing the file from the nodes reproduces the problem.
          

      Steps to Reproduce:

          1. Install a cluster with no capabilities
      
      We have seen this behaviour in prow jobs:
      periodic-ci-openshift-openshift-tests-private-release-4.16-multi-nightly-aws-upi-baselinecaps-none-amd-f28-destructive
      
      We have also seen it in flexy-install clusters installed with these options:
      
      TEMPLATE: private-templates/functionality-testing/aos-4_16/upi-on-gcp/versioned-installer
      
      LAUNCHER_VARS:
      installer_payload_image:  registry.ci.openshift.org/ocp/release:4.16.0-0.nightly-2024-02-09-073541
      baselineCapabilitySet: None
      additionalEnabledCapabilities: ["CloudCredential"]
      disable_worker_machineset: "yes"
      launch_extra_worker_num: 3
      
      
          2. Add a cloudCA certificate to the cluster
      
      $ openssl genrsa -out privateKey.pem 4096
      $ openssl req -new -x509 -nodes -days 3600 -key privateKey.pem -out ca-bundle.crt -subj "/OU=MCO qe/CN=example.com"
      $ oc set data -n openshift-config ConfigMap cloud-provider-config  --from-file=ca-bundle.pem=ca-bundle.crt
      
      
          3. Wait for the certificate to be written in the nodes
      
      $ oc debug -q node/$(oc get nodes -l node-role.kubernetes.io/worker -ojsonpath="{.items[0].metadata.name}") -- chroot /host cat "/etc/kubernetes/static-pod-resources/configmaps/cloud-config/ca-bundle.pem"
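
      To measure the delay instead of re-running the command by hand, step 3 can be wrapped in a polling loop. The sketch below is an assumption, not part of the original reproduction: `wait_for` is a hypothetical helper, and the 1200-second timeout and 15-second interval in the usage comment are arbitrary values. It only checks that the file exists and is non-empty, not that its content matches the new certificate.

```shell
# wait_for TIMEOUT INTERVAL COMMAND [ARGS...]
# Re-runs COMMAND every INTERVAL seconds until it succeeds or TIMEOUT
# seconds have elapsed. Prints the elapsed time; returns 0 on success
# and 1 on timeout.
wait_for() {
  timeout=$1; interval=$2; shift 2
  start=$(date +%s)
  while :; do
    if "$@" >/dev/null 2>&1; then
      echo "succeeded after $(( $(date +%s) - start ))s"
      return 0
    fi
    if [ $(( $(date +%s) - start )) -ge "$timeout" ]; then
      echo "timed out after ${timeout}s"
      return 1
    fi
    sleep "$interval"
  done
}

# Hypothetical usage (needs a logged-in oc session):
#   node=$(oc get nodes -l node-role.kubernetes.io/worker -ojsonpath="{.items[0].metadata.name}")
#   wait_for 1200 15 oc debug -q "node/$node" -- chroot /host \
#     test -s /etc/kubernetes/static-pod-resources/configmaps/cloud-config/ca-bundle.pem
```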
      
      
          

      Actual results:

      It takes 10 to 15 minutes for the file to be written to the nodes.
          

      Expected results:

      10 to 15 minutes is too long to sync the ControllerConfig and write the files to the nodes. The file should be created sooner.
          

      Additional info:

      
      If we increase the verbosity of the logs, we can see this message in the machine-config-daemon (MCD) logs:
      
      I0208 16:47:50.760738   61728 certificate_writer.go:73] Error syncing ControllerConfig machine-config-controller (retries 0): open /etc/docker/certs.d: no such file or directory
      I0208 16:47:50.760752   61728 daemon.go:2186] Updating Node ip-10-0-51-14.ec2.internal
      I0208 16:47:50.765933   61728 certificate_writer.go:79] Started syncing ControllerConfig "machine-config-controller" (2024-02-08 16:47:50.765924397 +0000 UTC m=+60.414594060)
      I0208 16:47:50.768956   61728 certificate_writer.go:81] Finished syncing ControllerConfig "machine-config-controller" (3.021865ms)
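
      To pick these errors out of a long MCD log, a grep filter along the following lines may help. This is a sketch under assumptions: `cert_sync_errors` is a hypothetical name, the pattern is derived only from the log lines quoted above, and the namespace and container names in the usage comment are the ones we would expect for the MCD pods rather than something taken from this cluster. The message only shows up once the log verbosity has been increased.

```shell
# cert_sync_errors
# Reads MCD log lines on stdin and prints only the ControllerConfig
# sync errors logged from certificate_writer.go.
cert_sync_errors() {
  grep -E 'certificate_writer\.go:[0-9]+\] Error syncing ControllerConfig'
}

# Hypothetical usage (replace <mcd-pod> with a real machine-config-daemon pod):
#   oc logs -n openshift-machine-config-operator -c machine-config-daemon <mcd-pod> | cert_sync_errors
```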
      
      
      This is likely related to https://issues.redhat.com/browse/OCPBUGS-20152 and will probably be fixed when OCPBUGS-20152 is fixed.
      
      Nevertheless, we need to verify that this is actually the case before closing this issue.
      
          

              umohnani Urvashi Mohnani
              sregidor@redhat.com Sergio Regidor de la Rosa
              Sergio Regidor de la Rosa Sergio Regidor de la Rosa