Bug
Resolution: Unresolved
Critical
4.22.0
Proposed
Description of problem:
After scaling out a node, the node is missing the kubelet configuration directory. The setup is managed by GitOps ZTP.
ssh core@192.168.112.19 "journalctl -u kubelet"
Jan 19 19:28:41 appworker-1.blueprint-cwl.nokia-stamp705.bos2.lab kubenswrapper[16815]: E0119 19:28:41.976024 16815 run.go:72] "command failed" err="failed to merge kubelet configs: failed to walk through kubelet dropin directory \"/etc/openshift/kubelet.conf.d\": lstat /etc/openshift/kubelet.conf.d: no such file or directory"
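To confirm quickly which path kubelet is complaining about, the journal output can be filtered for the "no such file or directory" error. The following is a minimal sketch, not part of the original report: the function name missing_dropin_dirs is hypothetical, and it assumes the error text has the same shape as the log line above.

```shell
#!/bin/sh
# Hypothetical helper: read kubelet journal lines on stdin and print each
# path that an "lstat <path>: no such file or directory" error refers to.
# Lines that do not contain such an error are ignored; duplicates are merged.
missing_dropin_dirs() {
  sed -n 's/.*lstat \([^:]*\): no such file or directory.*/\1/p' | sort -u
}
```

Assumed usage from the jumphost: ssh core@192.168.112.19 "journalctl -u kubelet" | missing_dropin_dirs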
Version-Release number of selected component (if applicable):
OCP: 4.22.0-ec.0
ACM: 2.16.0-126
MCE: 2.11.0-155
How reproducible:
Reproducible each time. I've reproduced this on worker, gateway, and storage nodes.
Steps to Reproduce:
Follow ACM documentation for worker node scale in and scale out:
https://docs.redhat.com/en/documentation/red_hat_advanced_cluster_management_for_kubernetes/2.15/html-single/multicluster_engine_operator_with_red_hat_advanced_cluster_management/index#scale-add-annotation
1. Scale in the worker node by adding the annotation and pruneManifests to the git repo.
2. Verify that the node is scaled in successfully on both the hub and the spoke cluster.
3. Re-add the worker node to the git repo to trigger scale out.
4. The worker node does not appear in the output of "oc get nodes".
The kubelet logs on the worker node show that the kubelet configuration directory /etc/openshift/kubelet.conf.d does not exist.
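The check in steps 2 and 4 (whether a node is present in the node list) can be scripted. This is a rough sketch under stated assumptions: node_listed is a hypothetical helper name, and it only matches the first column of "oc get nodes" output against an exact node name.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the node name given as $1 appears
# in the first column of `oc get nodes` output read from stdin, else exit 1.
node_listed() {
  awk -v n="$1" '$1 == n { found = 1 } END { exit found ? 0 : 1 }'
}
```

Assumed usage on the spoke cluster: oc get nodes | node_listed appworker-1.blueprint-cwl.nokia-stamp705.bos2.lab || echo "node not registered"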
Actual results:
appworker-1 is the node being scaled in/out
[root@Nokia-Rack705-Jumphost ~]# oc get nodes
NAME                                                STATUS   ROLES                              AGE    VERSION
appworker-0.blueprint-cwl.nokia-stamp705.bos2.lab   Ready    appworker,appworker-mcp-a,worker   4d1h   v1.34.2
appworker-2.blueprint-cwl.nokia-stamp705.bos2.lab   Ready    appworker,appworker-mcp-b,worker   4d1h   v1.34.2
appworker-3.blueprint-cwl.nokia-stamp705.bos2.lab   Ready    appworker,appworker-mcp-b,worker   4d1h   v1.34.2
gateway-0.blueprint-cwl.nokia-stamp705.bos2.lab     Ready    gateway,gateway-mcp-a,worker       4d1h   v1.34.2
gateway-1.blueprint-cwl.nokia-stamp705.bos2.lab     Ready    gateway,gateway-mcp-a,worker       4d1h   v1.34.2
master-0.blueprint-cwl.nokia-stamp705.bos2.lab      Ready    control-plane,master,monitor       4d1h   v1.34.2
master-1.blueprint-cwl.nokia-stamp705.bos2.lab      Ready    control-plane,master,monitor       4d1h   v1.34.2
master-2.blueprint-cwl.nokia-stamp705.bos2.lab      Ready    control-plane,master,monitor       4d1h   v1.34.2
storage-0.blueprint-cwl.nokia-stamp705.bos2.lab     Ready    storage,worker                     4d1h   v1.34.2
storage-1.blueprint-cwl.nokia-stamp705.bos2.lab     Ready    storage,worker                     4d1h   v1.34.2
storage-2.blueprint-cwl.nokia-stamp705.bos2.lab     Ready    storage,worker                     4d1h   v1.34.2
storage-3.blueprint-cwl.nokia-stamp705.bos2.lab     Ready    storage,worker                     4d1h   v1.34.2
[root@Nokia-Rack705-Jumphost ~]# oc get bmh -A
NAMESPACE               NAME                                                STATE       CONSUMER                                                          ONLINE   ERROR   AGE
openshift-machine-api   appworker-0.blueprint-cwl.nokia-stamp705.bos2.lab   unmanaged   blueprint-cwl-zfv6r-worker-0-bkk2f                                true             4d1h
openshift-machine-api   appworker-1.blueprint-cwl.nokia-stamp705.bos2.lab   unmanaged   blueprint-cwl-appworker-1.blueprint-cwl.nokia-stamp705.bos2.lab   true             22m
openshift-machine-api   appworker-2.blueprint-cwl.nokia-stamp705.bos2.lab   unmanaged   blueprint-cwl-zfv6r-worker-0-d7lpq                                true             4d1h
openshift-machine-api   appworker-3.blueprint-cwl.nokia-stamp705.bos2.lab   unmanaged   blueprint-cwl-zfv6r-worker-0-ds89x                                true             4d1h
openshift-machine-api   gateway-0.blueprint-cwl.nokia-stamp705.bos2.lab     unmanaged   blueprint-cwl-zfv6r-worker-0-k542z                                true             4d1h
openshift-machine-api   gateway-1.blueprint-cwl.nokia-stamp705.bos2.lab     unmanaged   blueprint-cwl-zfv6r-worker-0-l4sv2                                true             4d1h
openshift-machine-api   master-0.blueprint-cwl.nokia-stamp705.bos2.lab      unmanaged   blueprint-cwl-zfv6r-master-0                                      true             4d1h
openshift-machine-api   master-1.blueprint-cwl.nokia-stamp705.bos2.lab      unmanaged   blueprint-cwl-zfv6r-master-1                                      true             4d1h
openshift-machine-api   master-2.blueprint-cwl.nokia-stamp705.bos2.lab      unmanaged   blueprint-cwl-zfv6r-master-2                                      true             4d1h
openshift-machine-api   storage-0.blueprint-cwl.nokia-stamp705.bos2.lab     unmanaged   blueprint-cwl-zfv6r-worker-0-pqg4l                                true             4d1h
openshift-machine-api   storage-1.blueprint-cwl.nokia-stamp705.bos2.lab     unmanaged   blueprint-cwl-zfv6r-worker-0-r8qbx                                true             4d1h
openshift-machine-api   storage-2.blueprint-cwl.nokia-stamp705.bos2.lab     unmanaged   blueprint-cwl-zfv6r-worker-0-rrn69                                true             4d1h
openshift-machine-api   storage-3.blueprint-cwl.nokia-stamp705.bos2.lab     unmanaged   blueprint-cwl-zfv6r-worker-0-sd9f2                                true             4d1h
Expected results:
Node gets scaled out successfully
Additional info:
Manually creating the kubelet configuration directory allows the node to become provisioned, but the node does not get the correct machineset configuration.
I've reproduced the issue on appworker-1, gateway-1, and storage-0.
ssh core@<node_ip> "sudo mkdir -p /etc/openshift/kubelet.conf.d && sudo systemctl restart kubelet"
# from spoke
[root@Nokia-Rack705-Jumphost ~]# oc get bmh -A
NAMESPACE               NAME                                                STATE       CONSUMER                                                          ONLINE   ERROR   AGE
openshift-machine-api   appworker-0.blueprint-cwl.nokia-stamp705.bos2.lab   unmanaged   blueprint-cwl-zfv6r-worker-0-bkk2f                                true             5d6h
openshift-machine-api   appworker-1.blueprint-cwl.nokia-stamp705.bos2.lab   unmanaged   blueprint-cwl-appworker-1.blueprint-cwl.nokia-stamp705.bos2.lab   true             28h
openshift-machine-api   appworker-2.blueprint-cwl.nokia-stamp705.bos2.lab   unmanaged   blueprint-cwl-zfv6r-worker-0-d7lpq                                true             5d6h
openshift-machine-api   appworker-3.blueprint-cwl.nokia-stamp705.bos2.lab   unmanaged   blueprint-cwl-zfv6r-worker-0-ds89x                                true             5d6h
openshift-machine-api   gateway-0.blueprint-cwl.nokia-stamp705.bos2.lab     unmanaged   blueprint-cwl-zfv6r-worker-0-k542z                                true             5d6h
openshift-machine-api   gateway-1.blueprint-cwl.nokia-stamp705.bos2.lab     unmanaged   blueprint-cwl-gateway-1.blueprint-cwl.nokia-stamp705.bos2.lab     true             3h47m
openshift-machine-api   master-0.blueprint-cwl.nokia-stamp705.bos2.lab      unmanaged   blueprint-cwl-zfv6r-master-0                                      true             5d6h
openshift-machine-api   master-1.blueprint-cwl.nokia-stamp705.bos2.lab      unmanaged   blueprint-cwl-zfv6r-master-1                                      true             5d6h
openshift-machine-api   master-2.blueprint-cwl.nokia-stamp705.bos2.lab      unmanaged   blueprint-cwl-zfv6r-master-2                                      true             5d6h
openshift-machine-api   storage-0.blueprint-cwl.nokia-stamp705.bos2.lab     unmanaged   blueprint-cwl-storage-0.blueprint-cwl.nokia-stamp705.bos2.lab     true             116m
openshift-machine-api   storage-1.blueprint-cwl.nokia-stamp705.bos2.lab     unmanaged   blueprint-cwl-zfv6r-worker-0-r8qbx                                true             5d6h
openshift-machine-api   storage-2.blueprint-cwl.nokia-stamp705.bos2.lab     unmanaged   blueprint-cwl-zfv6r-worker-0-rrn69                                true             5d6h
openshift-machine-api   storage-3.blueprint-cwl.nokia-stamp705.bos2.lab     unmanaged   blueprint-cwl-zfv6r-worker-0-sd9f2                                true             5d6h
# from hub
[root@Nokia-Rack705-Jumphost ~]# oc get bmh -A
NAMESPACE               NAME                                                STATE         CONSUMER                       ONLINE   ERROR   AGE
blueprint-cwl           appworker-0.blueprint-cwl.nokia-stamp705.bos2.lab   provisioned                                  true             5d7h
blueprint-cwl           appworker-1.blueprint-cwl.nokia-stamp705.bos2.lab   provisioned                                  true             30h
blueprint-cwl           appworker-2.blueprint-cwl.nokia-stamp705.bos2.lab   provisioned                                  true             5d7h
blueprint-cwl           appworker-3.blueprint-cwl.nokia-stamp705.bos2.lab   provisioned                                  true             5d7h
blueprint-cwl           gateway-0.blueprint-cwl.nokia-stamp705.bos2.lab     provisioned                                  true             5d7h
blueprint-cwl           gateway-1.blueprint-cwl.nokia-stamp705.bos2.lab     provisioned                                  true             5h24m
blueprint-cwl           master-0.blueprint-cwl.nokia-stamp705.bos2.lab      provisioned                                  true             5d7h
blueprint-cwl           master-1.blueprint-cwl.nokia-stamp705.bos2.lab      provisioned                                  true             5d7h
blueprint-cwl           master-2.blueprint-cwl.nokia-stamp705.bos2.lab      provisioned                                  true             5d7h
blueprint-cwl           storage-0.blueprint-cwl.nokia-stamp705.bos2.lab     provisioned                                  true             3h31m
blueprint-cwl           storage-1.blueprint-cwl.nokia-stamp705.bos2.lab     provisioned                                  true             5d7h
blueprint-cwl           storage-2.blueprint-cwl.nokia-stamp705.bos2.lab     provisioned                                  true             5d7h
blueprint-cwl           storage-3.blueprint-cwl.nokia-stamp705.bos2.lab     provisioned                                  true             5d7h
openshift-machine-api   master-0                                            unmanaged     hubcluster-hp-czvc5-master-0   true             6d22h
openshift-machine-api   master-1                                            unmanaged     hubcluster-hp-czvc5-master-1   true             6d22h
openshift-machine-api   master-2                                            unmanaged     hubcluster-hp-czvc5-master-2   true             6d22h
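For anyone repeating the mkdir workaround on several nodes, it may help to make the step report whether the directory was actually missing. The sketch below is an illustration only: ensure_dropin_dir is a hypothetical helper, it has to run as root on the affected node, and it does not address why the directory is absent or the incorrect machineset configuration noted above.

```shell
#!/bin/sh
# Hypothetical helper: create the kubelet drop-in directory only if it is
# missing, and report which case applied. The default path is the one from
# the kubelet error in this report; pass another path as $1 to override it.
ensure_dropin_dir() {
  dir="${1:-/etc/openshift/kubelet.conf.d}"
  if [ -d "$dir" ]; then
    echo "present: $dir"
  else
    mkdir -p "$dir" && echo "created: $dir"
  fi
}
```

After the function reports "created", kubelet still needs a restart (sudo systemctl restart kubelet), as in the original workaround command.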