Created a NetworkAttachmentDefinition (net-attach-def) with 2 IPs in its range, then created a deployment with 2 replicas using that net-attach-def. The Whereabouts reconciler daemon set is deployed, and its cron job is set to reconcile every minute. When I power off the node on which one of the pods is deployed, either gracefully (poweroff) or ungracefully (poweroff --force), a new pod is created on a healthy node and gets stuck in the ContainerCreating state.
- Create the Whereabouts reconciler daemon set following the [documentation](https://docs.openshift.com/container-platform/4.14/networking/multiple_networks/configuring-additional-network.html#nw-multus-creating-whereabouts-reconciler-daemon-set_configuring-additional-network)
- Update the reconciler_cron_expression to: "*/1 * * * *"
- Create a net-attach-def with 2 IPs in range
- Create a deployment with 2 replicas
- Power off the node on which one of the pods is running
- A new pod is spawned on a healthy node and stays in ContainerCreating status.
1. On a fresh cluster with version 4.14.11
2. Create the Whereabouts reconciler daemon set following the documentation
3. Update the reconciler_cron_expression to "*/1 * * * *":
   $ oc create configmap whereabouts-config -n openshift-multus --from-literal=reconciler_cron_expression="*/1 * * * *"
4. Create a new project:
   $ oc new-project nadtesting
5. Apply the nad.yaml below:
   $ cat nad.yaml
   apiVersion: "k8s.cni.cncf.io/v1"
   kind: NetworkAttachmentDefinition
   metadata:
     name: macvlan-net-attach1
   spec:
     config: '{
       "cniVersion": "0.3.1",
       "type": "macvlan",
       "master": "br-ex",
       "mode": "bridge",
       "ipam": {
         "type": "whereabouts",
         "datastore": "kubernetes",
         "range": "",
         "range_start": "",
         "range_end": ""
       }
     }'
6. Create a deployment with two replicas that uses the net-attach-def:
   $ cat naddeployment.yaml
   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: deployment1
     labels:
       app: macvlan1
   spec:
     replicas: 2
     selector:
       matchLabels:
         app: macvlan1
     template:
       metadata:
         annotations:
           k8s.v1.cni.cncf.io/networks: macvlan-net-attach1
         labels:
           app: macvlan1
       spec:
         containers:
         - name: google
           image: gcr.io/google-samples/kubernetes-bootcamp:v1
           ports:
           - containerPort: 8080
7. Two pods are created:
   $ oc get pods -o wide
   NAME                          READY   STATUS    RESTARTS   AGE   IP    NODE                                       NOMINATED NODE   READINESS GATES
   deployment1-fbfdf5cbc-d6sgr   1/1     Running   0          15m         ci-ln-xvfy762-c1627-h7xzk-worker-0-qvzq2   <none>           <none>
   deployment1-fbfdf5cbc-njkpz   1/1     Running   0          15m         ci-ln-xvfy762-c1627-h7xzk-worker-0-8bdfh   <none>           <none>
8. Power off one of the nodes using a debug pod:
   $ oc debug node/ci-ln-xvfy762-c1627-h7xzk-worker-0-8bdfh
   # chroot /host
   # shutdown
9. Wait for some time. A new pod is created on a healthy node and is stuck in ContainerCreating:
   $ oc get pod -o wide
   NAME                          READY   STATUS              RESTARTS   AGE     IP       NODE                                       NOMINATED NODE   READINESS GATES
   deployment1-fbfdf5cbc-6cb8d   0/1     ContainerCreating   0          9m53s   <none>   ci-ln-xvfy762-c1627-h7xzk-worker-0-blzlk   <none>           <none>
   deployment1-fbfdf5cbc-d6sgr   1/1     Running             0          28m              ci-ln-xvfy762-c1627-h7xzk-worker-0-qvzq2   <none>           <none>
   deployment1-fbfdf5cbc-njkpz   1/1     Terminating         0          28m              ci-ln-xvfy762-c1627-h7xzk-worker-0-8bdfh   <none>           <none>
10. Node status, just for reference:
   $ oc get nodes
   NAME                                       STATUS     ROLES                  AGE   VERSION
   ci-ln-xvfy762-c1627-h7xzk-master-0         Ready      control-plane,master   59m   v1.27.10+28ed2d7
   ci-ln-xvfy762-c1627-h7xzk-master-1         Ready      control-plane,master   59m   v1.27.10+28ed2d7
   ci-ln-xvfy762-c1627-h7xzk-master-2         Ready      control-plane,master   58m   v1.27.10+28ed2d7
   ci-ln-xvfy762-c1627-h7xzk-worker-0-8bdfh   NotReady   worker                 43m   v1.27.10+28ed2d7
   ci-ln-xvfy762-c1627-h7xzk-worker-0-blzlk   Ready      worker                 43m   v1.27.10+28ed2d7
   ci-ln-xvfy762-c1627-h7xzk-worker-0-qvzq2   Ready      worker                 43m   v1.27.10+28ed2d
Actual results:
The pod on the shut-down node is stuck in the Terminating state and does not release its IP. The new pod is stuck in the ContainerCreating state.
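
Why the replacement pod cannot start follows from simple pool arithmetic. The sketch below is only an illustrative model in Python, not Whereabouts code: the `allocate` helper is hypothetical, and the two addresses are stand-ins from the RFC 5737 documentation range, because the actual range values are blank in this report.

```python
import ipaddress

# Assumption: the report leaves range_start/range_end blank, so these two
# documentation addresses (RFC 5737) stand in for the 2-IP range.
RANGE_START = ipaddress.ip_address("192.0.2.10")
RANGE_END = ipaddress.ip_address("192.0.2.11")


def allocate(reservations, pod_ref):
    """Reserve the lowest free IP in the range for pod_ref.

    Hypothetical helper: returns the IP, or None when the range is full.
    """
    ip = RANGE_START
    while ip <= RANGE_END:
        if ip not in reservations:
            reservations[ip] = pod_ref
            return ip
        ip += 1
    return None


reservations = {}
first = allocate(reservations, "nadtesting/deployment1-fbfdf5cbc-d6sgr")
second = allocate(reservations, "nadtesting/deployment1-fbfdf5cbc-njkpz")

# njkpz is Terminating on the powered-off node, but its reservation is never
# removed, so the replacement pod finds no free address in the range.
replacement = allocate(reservations, "nadtesting/deployment1-fbfdf5cbc-6cb8d")

print(first, second, replacement)  # 192.0.2.10 192.0.2.11 None
```

With both addresses still reserved, the third request has nothing to hand out, which matches the replacement pod staying in ContainerCreating.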
Expected results:
The new pod should start successfully on the new node.
- Just for information: if I follow a manual approach, the issue is resolved. The steps are:
  1. Remove the Terminating pod's IP from the overlapping range reservations:
     $ oc delete overlappingrangeipreservations.whereabouts.cni.cncf.io <IP>
  2. Remove the Terminating pod's IP from ippools.whereabouts.cni.cncf.io:
     $ oc edit ippools.whereabouts.cni.cncf.io <IP Pool>
     Remove the stale IP from the list.

Also, the whereabouts-reconciler logs on the Terminating pod's node report:

  2024-02-19T10:48:00Z [debug] Added IP for pod nadtesting/deployment1-fbfdf5cbc-njkpz
  2024-02-19T10:48:00Z [debug] the IP reservation: IP: is reserved for pod: nadtesting/deployment1-fbfdf5cbc-njkpz
  2024-02-19T10:48:00Z [debug] pod reference nadtesting/deployment1-fbfdf5cbc-njkpz matches allocation; Allocation IP:; PodIPs: map[{}]
  2024-02-19T10:48:00Z [debug] no IP addresses to cleanup
  2024-02-19T10:48:00Z [verbose] reconciler success

i.e. it fails to recognize the need to remove the allocation.
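
The gap between the logged behaviour and the expected behaviour can be modelled in a few lines. This is a simplified Python sketch under stated assumptions, not the reconciler's actual implementation: the `Pod` fields, both function names, and the `ip-a`/`ip-b` placeholders (the real IPs are not shown in this report) are hypothetical. The first function mirrors what the log suggests, that an allocation is kept as long as its pod reference still matches a pod known to the API. The second expresses the expectation stated here, and is not a proposed patch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pod:
    ref: str           # "namespace/name"
    terminating: bool  # True once a deletion timestamp is set
    node_ready: bool   # False for a pod on the powered-off node


def ips_to_clean(allocations, pods):
    """Observed behaviour (simplified): release only IPs whose pod is gone from the API."""
    known = {pod.ref for pod in pods}
    return [ip for ip, ref in allocations.items() if ref not in known]


def ips_to_clean_expected(allocations, pods):
    """Expected behaviour (hypothetical rule): also release IPs held by
    Terminating pods whose node is NotReady."""
    still_valid = {
        pod.ref for pod in pods if not (pod.terminating and not pod.node_ready)
    }
    return [ip for ip, ref in allocations.items() if ref not in still_valid]


pods = [
    Pod("nadtesting/deployment1-fbfdf5cbc-d6sgr", terminating=False, node_ready=True),
    Pod("nadtesting/deployment1-fbfdf5cbc-njkpz", terminating=True, node_ready=False),
]
# Placeholder keys: the real addresses are not shown in the report.
allocations = {
    "ip-a": "nadtesting/deployment1-fbfdf5cbc-d6sgr",
    "ip-b": "nadtesting/deployment1-fbfdf5cbc-njkpz",
}

print(ips_to_clean(allocations, pods))           # []
print(ips_to_clean_expected(allocations, pods))  # ['ip-b']
```

In this model the Terminating pod still exists in the API, so the first rule finds nothing to clean up, consistent with the "no IP addresses to cleanup" log line above.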
