Bug
Resolution: Done-Errata
Critical
None
4.14.z
Description of problem:
Created a net-attach-def with 2 IPs in its range, then created a deployment with 2 replicas using that net-attach-def. The whereabouts reconciler daemonset is created and its cronjob is enabled to reconcile every minute. When I power off the node on which one of the pods is deployed, either gracefully (poweroff) or ungracefully (poweroff --force), a new pod is created on a healthy node and gets stuck in ContainerCreating state.
Version-Release number of selected component (if applicable):
4.14.11
How reproducible:
- Create the whereabouts reconciler daemon set with the help of the [documentation](https://docs.openshift.com/container-platform/4.14/networking/multiple_networks/configuring-additional-network.html#nw-multus-creating-whereabouts-reconciler-daemon-set_configuring-additional-network)
- Update the reconciler_cron_expression to: "*/1 * * * *"
- Create a net-attach-def with 2 IPs in range
- Create a deployment with 2 replicas
- Power off the node on which one of the pods is running
- A new pod is spawned on a healthy node and stays in ContainerCreating status
Steps to Reproduce:
1. Start from a fresh cluster with version 4.14.11.
2. Create the whereabouts reconciler daemon set with the help of the documentation.
3. Update the reconciler_cron_expression to "*/1 * * * *":
   $ oc create configmap whereabouts-config -n openshift-multus --from-literal=reconciler_cron_expression="*/1 * * * *"
4. Create a new project:
   $ oc new-project nadtesting
5. Apply the nad.yaml below:
   $ cat nad.yaml
   apiVersion: "k8s.cni.cncf.io/v1"
   kind: NetworkAttachmentDefinition
   metadata:
     name: macvlan-net-attach1
   spec:
     config: '{
         "cniVersion": "0.3.1",
         "type": "macvlan",
         "master": "br-ex",
         "mode": "bridge",
         "ipam": {
           "type": "whereabouts",
           "datastore": "kubernetes",
           "range": "172.17.20.0/24",
           "range_start": "172.17.20.11",
           "range_end": "172.17.20.12"
         }
       }'
6. Create a deployment with two replicas using the net-attach-def:
   $ cat naddeployment.yaml
   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: deployment1
     labels:
       app: macvlan1
   spec:
     replicas: 2
     selector:
       matchLabels:
         app: macvlan1
     template:
       metadata:
         annotations:
           k8s.v1.cni.cncf.io/networks: macvlan-net-attach1
         labels:
           app: macvlan1
       spec:
         containers:
         - name: google
           image: gcr.io/google-samples/kubernetes-bootcamp:v1
           ports:
           - containerPort: 8080
7. Two pods will be created:
   $ oc get pods -o wide
   NAME                          READY   STATUS    RESTARTS   AGE   IP            NODE                                       NOMINATED NODE   READINESS GATES
   deployment1-fbfdf5cbc-d6sgr   1/1     Running   0          15m   10.129.2.9    ci-ln-xvfy762-c1627-h7xzk-worker-0-qvzq2   <none>           <none>
   deployment1-fbfdf5cbc-njkpz   1/1     Running   0          15m   10.128.2.16   ci-ln-xvfy762-c1627-h7xzk-worker-0-8bdfh   <none>           <none>
8. Power off the node using debug:
   $ oc debug node/ci-ln-xvfy762-c1627-h7xzk-worker-0-8bdfh
   # chroot /host
   # shutdown
9. After some time a new pod is created on a healthy node and is stuck in ContainerCreating (a hedged check of the whereabouts state is sketched after these steps):
   $ oc get pod -o wide
   NAME                          READY   STATUS              RESTARTS   AGE     IP            NODE                                       NOMINATED NODE   READINESS GATES
   deployment1-fbfdf5cbc-6cb8d   0/1     ContainerCreating   0          9m53s   <none>        ci-ln-xvfy762-c1627-h7xzk-worker-0-blzlk   <none>           <none>
   deployment1-fbfdf5cbc-d6sgr   1/1     Running             0          28m     10.129.2.9    ci-ln-xvfy762-c1627-h7xzk-worker-0-qvzq2   <none>           <none>
   deployment1-fbfdf5cbc-njkpz   1/1     Terminating         0          28m     10.128.2.16   ci-ln-xvfy762-c1627-h7xzk-worker-0-8bdfh   <none>           <none>
10. Node status, just for reference:
    $ oc get nodes
    NAME                                       STATUS     ROLES                  AGE   VERSION
    ci-ln-xvfy762-c1627-h7xzk-master-0         Ready      control-plane,master   59m   v1.27.10+28ed2d7
    ci-ln-xvfy762-c1627-h7xzk-master-1         Ready      control-plane,master   59m   v1.27.10+28ed2d7
    ci-ln-xvfy762-c1627-h7xzk-master-2         Ready      control-plane,master   58m   v1.27.10+28ed2d7
    ci-ln-xvfy762-c1627-h7xzk-worker-0-8bdfh   NotReady   worker                 43m   v1.27.10+28ed2d7
    ci-ln-xvfy762-c1627-h7xzk-worker-0-blzlk   Ready      worker                 43m   v1.27.10+28ed2d7
    ci-ln-xvfy762-c1627-h7xzk-worker-0-qvzq2   Ready      worker                 43m   v1.27.10+28ed2d7
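A hedged way to confirm that the new pod is blocked by a stale whereabouts allocation (not part of the original report; -A is used so no assumption is made about which namespace stores the whereabouts CRs):

   $ oc get overlappingrangeipreservations.whereabouts.cni.cncf.io -A
   $ oc get ippools.whereabouts.cni.cncf.io -A -o yaml

If the bug is reproduced, the Terminating pod's IP (172.17.20.12 in this run) should still show up in both objects, which leaves no free address in the two-IP range for the new pod.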
Actual results:
The pod on the shut-down node is stuck in Terminating state and does not release its IP. The new pod is stuck in ContainerCreating status.
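A hedged way to inspect why the new pod is stuck (pod name taken from step 9 above; the events section should indicate why sandbox creation fails):

   $ oc describe pod deployment1-fbfdf5cbc-6cb8d -n nadtesting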
Expected results:
The new pod should start smoothly on the new node.
Additional info:
- Just for information: the issue can be resolved manually by following these steps:
  1. Remove the terminating pod's IP from the overlapping range reservations:
     $ oc delete overlappingrangeipreservations.whereabouts.cni.cncf.io <IP>
  2. Remove the stale IP from ippools.whereabouts.cni.cncf.io:
     $ oc edit ippools.whereabouts.cni.cncf.io <IP Pool>
     Remove that stale IP from the allocations list.
  Also, the whereabouts-reconciler logs on the Terminating pod's node report:
     2024-02-19T10:48:00Z [debug] Added IP 172.17.20.12 for pod nadtesting/deployment1-fbfdf5cbc-njkpz
     2024-02-19T10:48:00Z [debug] the IP reservation: IP: 172.17.20.12 is reserved for pod: nadtesting/deployment1-fbfdf5cbc-njkpz
     2024-02-19T10:48:00Z [debug] pod reference nadtesting/deployment1-fbfdf5cbc-njkpz matches allocation; Allocation IP: 172.17.20.12; PodIPs: map[172.17.20.12:{}]
     2024-02-19T10:48:00Z [debug] no IP addresses to cleanup
     2024-02-19T10:48:00Z [verbose] reconciler success
  i.e. the reconciler fails to recognize that the allocation needs to be removed.
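A minimal sketch of that manual workaround as concrete commands. It assumes the stale IP is 172.17.20.12 (as in this reproduction), that the whereabouts CRs live in the openshift-multus namespace, and that the IP pool is named after the range CIDR (e.g. 172.17.20.0-24); these names may differ per cluster, so verify them first:

   $ STALE_IP=172.17.20.12
   $ oc delete overlappingrangeipreservations.whereabouts.cni.cncf.io "$STALE_IP" -n openshift-multus
   $ oc get ippools.whereabouts.cni.cncf.io -n openshift-multus
   $ oc edit ippools.whereabouts.cni.cncf.io 172.17.20.0-24 -n openshift-multus
     (delete the allocation entry for $STALE_IP under spec.allocations, then save)

After this cleanup the stuck pod should obtain the freed IP on the next CNI retry.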
- blocks
  - OCPBUGS-37707 When node shutdown, the Pod whereabouts IP cannot be released (for a stateless application) (Closed)
- is cloned by
  - OCPBUGS-37707 When node shutdown, the Pod whereabouts IP cannot be released (for a stateless application) (Closed)
- links to
  - RHEA-2024:3718 OpenShift Container Platform 4.17.z bug fix update