-
Bug
-
Resolution: Done
-
Normal
-
None
-
4.15, 4.16
-
Important
-
No
-
MCO Sprint 254
-
1
-
Rejected
-
False
Description of problem:
When we configure a cloudCA, it takes 10 to 15 minutes for the file to be written to the nodes. We have seen this behaviour in clusters with no enabled capabilities (for example: periodic-ci-openshift-openshift-tests-private-release-4.16-multi-nightly-aws-upi-baselinecaps-none-amd-f28-destructive).

$ oc get clusterversion -o yaml
....
    capabilities:
      enabledCapabilities:
      - CloudCredential
      knownCapabilities:
      - Build
      - CSISnapshot
      - CloudCredential
      - Console
      - DeploymentConfig
      - ImageRegistry
      - Insights
      - MachineAPI
      - NodeTuning
      - OperatorLifecycleManager
      - Storage
      - baremetal
      - marketplace
      - openshift-samples
Version-Release number of selected component (if applicable):
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.16.0-0.nightly-2024-02-09-073541   True        False         23m     Cluster version is 4.16.0-0.nightly-2024-02-09-073541
How reproducible:
Very often. Occasionally the cloudCA certificate is written correctly, but this is rare. Even then, manually removing the file from the nodes reproduces the problem.
Steps to Reproduce:
1. Install a cluster with no capabilities.

   We have seen this behaviour in prow jobs:
   periodic-ci-openshift-openshift-tests-private-release-4.16-multi-nightly-aws-upi-baselinecaps-none-amd-f28-destructive

   We have also seen it in flexy-install clusters installed with these options:
   TEMPLATE: private-templates/functionality-testing/aos-4_16/upi-on-gcp/versioned-installer
   LAUNCHER_VARS:
     installer_payload_image: registry.ci.openshift.org/ocp/release:4.16.0-0.nightly-2024-02-09-073541
     baselineCapabilitySet: None
     additionalEnabledCapabilities: ["CloudCredential"]
     disable_worker_machineset: "yes"
     launch_extra_worker_num: 3

2. Add a cloudCA certificate to the cluster.

   $ openssl genrsa -out privateKey.pem 4096
   $ openssl req -new -x509 -nodes -days 3600 -key privateKey.pem -out ca-bundle.crt -subj "/OU=MCO qe/CN=example.com"
   $ oc set data -n openshift-config ConfigMap cloud-provider-config --from-file=ca-bundle.pem=ca-bundle.crt

3. Wait for the certificate to be written to the nodes.

   $ oc debug -q node/$(oc get nodes -l node-role.kubernetes.io/worker -ojsonpath="{.items[0].metadata.name}") -- chroot /host cat "/etc/kubernetes/static-pod-resources/configmaps/cloud-config/ca-bundle.pem"
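To put a number on the delay in step 3, the wait can be timed with a small polling loop. This is only a sketch: the helper names `wait_for` and `check_ca_bundle`, the 5-second poll interval and the 20-minute timeout are assumptions for illustration and are not part of the original report. The `oc` commands inside `check_ca_bundle` are the ones from step 3.

```shell
# Hypothetical helper (not from the report): run a command repeatedly until it
# succeeds or a timeout expires, then print the elapsed time in seconds.
# Usage: wait_for TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# The body runs in a subshell so its variables do not leak into the caller.
wait_for() (
  timeout=$1
  interval=$2
  shift 2
  start=$(date +%s)
  until "$@" >/dev/null 2>&1; do
    elapsed=$(( $(date +%s) - start ))
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out after ${elapsed}s" >&2
      exit 1
    fi
    sleep "$interval"
  done
  echo "succeeded after $(( $(date +%s) - start ))s"
)

# Same check as step 3: succeeds once the CA bundle exists on the first worker.
check_ca_bundle() (
  node=$(oc get nodes -l node-role.kubernetes.io/worker -ojsonpath="{.items[0].metadata.name}")
  oc debug -q "node/${node}" -- chroot /host cat "/etc/kubernetes/static-pod-resources/configmaps/cloud-config/ca-bundle.pem"
)

# Example with assumed values: wait up to 20 minutes, polling every 5 seconds.
# wait_for 1200 5 check_ca_bundle
```

Running the commented example right after the `oc set data` command in step 2 would report roughly how long the file took to appear. The measurement includes the overhead of each `oc debug` call, so it is an upper bound rather than an exact figure.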
Actual results:
It takes 10 to 15 minutes for the file to be written to the nodes.
Expected results:
10 to 15 minutes is too long to sync the controllerconfig and write the files to the nodes. The file should be created sooner.
Additional info:
If we increase the verbosity of the logs, we can see this message in the MCDs:

I0208 16:47:50.760738   61728 certificate_writer.go:73] Error syncing ControllerConfig machine-config-controller (retries 0): open /etc/docker/certs.d: no such file or directory
I0208 16:47:50.760752   61728 daemon.go:2186] Updating Node ip-10-0-51-14.ec2.internal
I0208 16:47:50.765933   61728 certificate_writer.go:79] Started syncing ControllerConfig "machine-config-controller" (2024-02-08 16:47:50.765924397 +0000 UTC m=+60.414594060)
I0208 16:47:50.768956   61728 certificate_writer.go:81] Finished syncing ControllerConfig "machine-config-controller" (3.021865ms)

This is likely related to https://issues.redhat.com/browse/OCPBUGS-20152 and will probably be fixed when OCPBUGS-20152 is fixed. Nevertheless, we need to verify that this is the case before closing this issue.
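When checking whether the OCPBUGS-20152 fix also resolves this issue, it may help to pull only the failed ControllerConfig syncs out of a captured MCD log. The filter below is a sketch that assumes the log lines have the same layout as the excerpt above. The function name `sync_errors` is hypothetical, and how the log is captured is left open.

```shell
# Hypothetical helper (not from the report): read an MCD log on stdin and print
# one line per failed ControllerConfig sync, in the form
#   <time> <controllerconfig name> retries=<n> <error text>
# Lines that do not match the "Error syncing ControllerConfig" layout are dropped.
sync_errors() {
  sed -n 's/^I[0-9]* \([0-9:.]*\) .*Error syncing ControllerConfig \([^ ]*\) (retries \([0-9]*\)): \(.*\)$/\1 \2 retries=\3 \4/p'
}

# Example, assuming the log was saved to a file named mcd.log:
# sync_errors < mcd.log
```

If the filter prints nothing for a log covering the 10 to 15 minute window after the cloudCA is added, the `/etc/docker/certs.d` error is no longer being retried.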
- is related to
-
OCPBUGS-20152 Nodes being marked degraded due to /etc/docker/certs.d not being found
- Closed
-
OCPBUGS-33418 Investigate timing issues in machine-config-controller
- Closed
- relates to
-
OCPBUGS-33412 Nodes being marked degraded due to /etc/docker/certs.d not being found
- Closed