Bug
Resolution: Done-Errata
Minor
4.14
No
False
Description of problem:
When deploying with the external platform, the machine config pool reports a Degraded state, and a drift in the configuration can be observed:

$ diff /etc/mcs-machine-config-content.json ~/rendered-master-1b6aab788192600896f36c5388d48374
< "contents": "[Unit]\nDescription=Kubernetes Kubelet\nWants=rpc-statd.service network-online.target\nRequires=crio.service kubelet-auto-node-size.service\nAfter=network-online.target crio.service kubelet-auto-node-size.service\nAfter=ostree-finalize-staged.service\n\n[Service]\nType=notify\nExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests\nExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state\nExecStartPre=/bin/rm -f /var/lib/kubelet/memory_manager_state\nEnvironmentFile=/etc/os-release\nEnvironmentFile=-/etc/kubernetes/kubelet-workaround\nEnvironmentFile=-/etc/kubernetes/kubelet-env\nEnvironmentFile=/etc/node-sizing.env\n\nExecStart=/usr/local/bin/kubenswrapper \\\n /usr/bin/kubelet \\\n --config=/etc/kubernetes/kubelet.conf \\\n --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \\\n --kubeconfig=/var/lib/kubelet/kubeconfig \\\n --container-runtime-endpoint=/var/run/crio/crio.sock \\\n --runtime-cgroups=/system.slice/crio.service \\\n --node-labels=node-role.kubernetes.io/control-plane,node-role.kubernetes.io/master,node.openshift.io/os_id=${ID} \\\n --node-ip=${KUBELET_NODE_IP} \\\n --minimum-container-ttl-duration=6m0s \\\n --cloud-provider=external \\\n --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec \\\n \\\n --hostname-override=${KUBELET_NODE_NAME} \\\n --provider-id=${KUBELET_PROVIDERID} \\\n --register-with-taints=node-role.kubernetes.io/master=:NoSchedule \\\n --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bde9fb486f1e8369b465a8c0aff7152c2a1f5a326385ee492140592b506638d6 \\\n --system-reserved=cpu=${SYSTEM_RESERVED_CPU},memory=${SYSTEM_RESERVED_MEMORY},ephemeral-storage=${SYSTEM_RESERVED_ES} \\\n --v=${KUBELET_LOG_LEVEL}\n\nRestart=always\nRestartSec=10\n\n[Install]\nWantedBy=multi-user.target\n",
---
> "contents": "[Unit]\nDescription=Kubernetes Kubelet\nWants=rpc-statd.service network-online.target\nRequires=crio.service kubelet-auto-node-size.service\nAfter=network-online.target crio.service kubelet-auto-node-size.service\nAfter=ostree-finalize-staged.service\n\n[Service]\nType=notify\nExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests\nExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state\nExecStartPre=/bin/rm -f /var/lib/kubelet/memory_manager_state\nEnvironmentFile=/etc/os-release\nEnvironmentFile=-/etc/kubernetes/kubelet-workaround\nEnvironmentFile=-/etc/kubernetes/kubelet-env\nEnvironmentFile=/etc/node-sizing.env\n\nExecStart=/usr/local/bin/kubenswrapper \\\n /usr/bin/kubelet \\\n --config=/etc/kubernetes/kubelet.conf \\\n --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \\\n --kubeconfig=/var/lib/kubelet/kubeconfig \\\n --container-runtime-endpoint=/var/run/crio/crio.sock \\\n --runtime-cgroups=/system.slice/crio.service \\\n --node-labels=node-role.kubernetes.io/control-plane,node-role.kubernetes.io/master,node.openshift.io/os_id=${ID} \\\n --node-ip=${KUBELET_NODE_IP} \\\n --minimum-container-ttl-duration=6m0s \\\n --cloud-provider= \\\n --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec \\\n \\\n --hostname-override=${KUBELET_NODE_NAME} \\\n --provider-id=${KUBELET_PROVIDERID} \\\n --register-with-taints=node-role.kubernetes.io/master=:NoSchedule \\\n --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bde9fb486f1e8369b465a8c0aff7152c2a1f5a326385ee492140592b506638d6 \\\n --system-reserved=cpu=${SYSTEM_RESERVED_CPU},memory=${SYSTEM_RESERVED_MEMORY},ephemeral-storage=${SYSTEM_RESERVED_ES} \\\n --v=${KUBELET_LOG_LEVEL}\n\nRestart=always\nRestartSec=10\n\n[Install]\nWantedBy=multi-user.target\n",

The difference is --cloud-provider=external / --cloud-provider= in the flags passed to the kubelet.

We also observe the following log in the MCC:

W0629 09:57:44.583046 1 warnings.go:70] unknown field "spec.infra.status.platformStatus.external.cloudControllerManager"

"spec.infra.status.platformStatus.external.cloudControllerManager" is essentially the field in the Infrastructure object that enables the external platform.
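The suspected mechanism can be mimicked with a minimal Python sketch (all helper names here are hypothetical; this is not MCO code): if a client whose schema does not know the cloudControllerManager field prunes it on deserialization, the rendered --cloud-provider flag falls back to an empty value, which is exactly the drift shown above.

```python
# Hypothetical sketch of the suspected failure mode (not MCO code):
# an old client schema drops the unknown "cloudControllerManager"
# field, and the rendered kubelet flag flips from "external" to "".

def prune_unknown_fields(status: dict, known: set) -> dict:
    """Drop fields the client schema does not know about."""
    return {k: v for k, v in status.items() if k in known}

def cloud_provider_flag(external_status: dict) -> str:
    """Render --cloud-provider=external only when an external CCM is declared."""
    ccm = external_status.get("cloudControllerManager", {})
    return "external" if ccm.get("state") == "External" else ""

status = {"cloudControllerManager": {"state": "External"}}

# Bootstrap rendering sees the full object:
print(cloud_provider_flag(status))  # external

# A client that does not know the field prunes it, and the flag is lost:
pruned = prune_unknown_fields(status, known=set())
print(cloud_provider_flag(pruned))  # (empty string)
```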
Version-Release number of selected component (if applicable):
4.14 nightly
How reproducible:
Always when the platform is set to External.
Steps to Reproduce:
1. Deploy a cluster with the external platform enabled. The featureSet TechPreviewNoUpgrade should be set, and the Infrastructure object should look like:

apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  creationTimestamp: "2023-06-28T10:37:12Z"
  generation: 1
  name: cluster
  resourceVersion: "538"
  uid: 57e09773-0eca-4767-95ce-8ec7d0f2cdae
spec:
  cloudConfig:
    name: ""
  platformSpec:
    external:
      platformName: oci
    type: External
status:
  apiServerInternalURI: https://api-int.test-infra-cluster-3cd17632.assisted-ci.oci-rhelcert.edge-sro.rhecoeng.com:6443
  apiServerURL: https://api.test-infra-cluster-3cd17632.assisted-ci.oci-rhelcert.edge-sro.rhecoeng.com:6443
  controlPlaneTopology: HighlyAvailable
  cpuPartitioning: None
  etcdDiscoveryDomain: ""
  infrastructureName: test-infra-cluster-3c-pqqqm
  infrastructureTopology: HighlyAvailable
  platform: External
  platformStatus:
    external:
      cloudControllerManager:
        state: External
    type: External

2. Observe the drift with: oc get mcp
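The field the MCC warns about can be read straight off the object. A small Python sketch over a trimmed copy of the status above (in a live cluster the object would come from `oc get infrastructure cluster -o json`):

```python
import json

# Trimmed copy of the Infrastructure object from step 1 (status only).
infra = json.loads("""
{
  "status": {
    "platform": "External",
    "platformStatus": {
      "type": "External",
      "external": {
        "cloudControllerManager": {"state": "External"}
      }
    }
  }
}
""")

# The field reported as unknown by the MCC:
state = infra["status"]["platformStatus"]["external"]["cloudControllerManager"]["state"]
print(state)  # External
```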
Actual results:
$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master                                                      False     True       True       3              0                   0                     3                      138m
worker   rendered-worker-d48036fe2b657e6c71d5d1275675fefc   True      False      False      3              3                   3                     0                      138m
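Spotting the degraded pool in that output can be sketched in a few lines of Python (a trimmed sample of the columns above; in this sample the master row has an empty CONFIG cell, so splitting on whitespace and taking the last token still yields the DEGRADED value):

```python
# Trimmed sample of the actual-results table (last column is DEGRADED).
mcp_output = """\
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED
master                                                      False     True       True
worker   rendered-worker-d48036fe2b657e6c71d5d1275675fefc   True      False      False
"""

# Collect pool names whose DEGRADED column (last token) is True.
degraded = [ln.split()[0] for ln in mcp_output.splitlines()[1:]
            if ln.split()[-1] == "True"]
print(degraded)  # ['master']
```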
Expected results:
$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-2ff4e25f807ef3b20b7c6e0c6526f05d   True      False      False      3              3                   3                     0                      33m
worker   rendered-worker-48b7f39d78e3b1d94a0aba1ef4358d01   True      False      False      3              3                   3                     0                      33m
Additional info:
https://redhat-internal.slack.com/archives/C02CZNQHGN8/p1688035248716119