Bug
Resolution: Done-Errata
Critical
4.14.z
Quality / Stability / Reliability
Moderate
CNF Network Sprint 257
Done
Release Note Not Required
This is a clone of issue OCPBUGS-29664. The following is the description of the original issue:
—
Description of problem:
Created a net-attach-def with 2 IPs in its range, then created a deployment with 2 replicas using that net-attach-def. The whereabouts reconciler daemonset is created and the reconciler cron job runs every minute. When the node on which one of the pods is running is powered off gracefully (poweroff) or ungracefully (poweroff --force), a new pod is created on a healthy node but gets stuck in ContainerCreating state.
Version-Release number of selected component (if applicable):
4.14.11
How reproducible:
- Create the whereabouts reconciler daemonset with the help of the [documentation](https://docs.openshift.com/container-platform/4.14/networking/multiple_networks/configuring-additional-network.html#nw-multus-creating-whereabouts-reconciler-daemon-set_configuring-additional-network)
- Update the reconciler_cron_expression to: "*/1 * * * *"
- Create a net-attach-def with 2 IPs in its range
- Create a deployment with 2 replicas
- Power off the node on which one of the pods is running
- A new pod is spawned on a healthy node and gets stuck in ContainerCreating status
Steps to Reproduce:
1. On a fresh cluster with version 4.14.11
2. Create the whereabouts reconciler daemonset with the help of the documentation
3. Update the reconciler_cron_expression to: "*/1 * * * *"
$ oc create configmap whereabouts-config -n openshift-multus --from-literal=reconciler_cron_expression="*/1 * * * *"
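To confirm the ConfigMap carries the expected schedule, the value can be read back (a suggested check, not part of the original report):
$ oc get configmap whereabouts-config -n openshift-multus -o jsonpath='{.data.reconciler_cron_expression}'
*/1 * * * *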
4. Create a new project
$ oc new-project nadtesting
5. Apply the nad.yaml below
$ cat nad.yaml
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
name: macvlan-net-attach1
spec:
config: '{
"cniVersion": "0.3.1",
"type": "macvlan",
"master": "br-ex",
"mode": "bridge",
"ipam": {
"type": "whereabouts",
"datastore": "kubernetes",
"range": "172.17.20.0/24",
"range_start": "172.17.20.11",
"range_end": "172.17.20.12"
}
}'
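The report implies the NAD is applied into the test project; one way to do so and verify it (commands assumed, not quoted from the report):
$ oc apply -f nad.yaml -n nadtesting
$ oc get net-attach-def -n nadtesting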
6. Create a deployment with two replicas using the net-attach-def:
$ cat naddeployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment1
  labels:
    app: macvlan1
spec:
  replicas: 2
  selector:
    matchLabels:
      app: macvlan1
  template:
    metadata:
      annotations:
        k8s.v1.cni.cncf.io/networks: macvlan-net-attach1
      labels:
        app: macvlan1
    spec:
      containers:
      - name: google
        image: gcr.io/google-samples/kubernetes-bootcamp:v1
        ports:
        - containerPort: 8080
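Applying the deployment into the same project is likewise implied; for completeness (assumed command):
$ oc apply -f naddeployment.yaml -n nadtesting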
7. Two pods will be created:
$ oc get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deployment1-fbfdf5cbc-d6sgr 1/1 Running 0 15m 10.129.2.9 ci-ln-xvfy762-c1627-h7xzk-worker-0-qvzq2 <none> <none>
deployment1-fbfdf5cbc-njkpz 1/1 Running 0 15m 10.128.2.16 ci-ln-xvfy762-c1627-h7xzk-worker-0-8bdfh <none> <none>
8. Power off the node using debug
$ oc debug node/ci-ln-xvfy762-c1627-h7xzk-worker-0-8bdfh
# chroot /host
# shutdown
9. After some time, a new pod is created on a healthy node and gets stuck in ContainerCreating:
$ oc get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deployment1-fbfdf5cbc-6cb8d 0/1 ContainerCreating 0 9m53s <none> ci-ln-xvfy762-c1627-h7xzk-worker-0-blzlk <none> <none>
deployment1-fbfdf5cbc-d6sgr 1/1 Running 0 28m 10.129.2.9 ci-ln-xvfy762-c1627-h7xzk-worker-0-qvzq2 <none> <none>
deployment1-fbfdf5cbc-njkpz 1/1 Terminating 0 28m 10.128.2.16 ci-ln-xvfy762-c1627-h7xzk-worker-0-8bdfh <none> <none>
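To see why the replacement pod is stuck, its events can be inspected (suggested diagnostics, not from the original report). With only 2 IPs in the whereabouts range and the Terminating pod's IP never released, sandbox creation for the new pod cannot obtain an address:
$ oc describe pod deployment1-fbfdf5cbc-6cb8d -n nadtesting
$ oc get events -n nadtesting --sort-by=.lastTimestamp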
10. Node status, just for reference:
$ oc get nodes
NAME STATUS ROLES AGE VERSION
ci-ln-xvfy762-c1627-h7xzk-master-0 Ready control-plane,master 59m v1.27.10+28ed2d7
ci-ln-xvfy762-c1627-h7xzk-master-1 Ready control-plane,master 59m v1.27.10+28ed2d7
ci-ln-xvfy762-c1627-h7xzk-master-2 Ready control-plane,master 58m v1.27.10+28ed2d7
ci-ln-xvfy762-c1627-h7xzk-worker-0-8bdfh NotReady worker 43m v1.27.10+28ed2d7
ci-ln-xvfy762-c1627-h7xzk-worker-0-blzlk Ready worker 43m v1.27.10+28ed2d7
ci-ln-xvfy762-c1627-h7xzk-worker-0-qvzq2 Ready worker 43m v1.27.10+28ed2d7
Actual results:
The pod on the shut-down node is stuck in Terminating state and does not release its IP. The new pod is stuck in ContainerCreating status.
Expected results:
The new pod should start smoothly on the new node.
Additional info:
- Just for information: the issue can be resolved manually by following these steps:
1. Remove the terminating pod's IP from the overlapping range reservations:
$ oc delete overlappingrangeipreservations.whereabouts.cni.cncf.io <IP>
2. Remove the terminating pod's IP from ippools.whereabouts.cni.cncf.io:
$ oc edit ippools.whereabouts.cni.cncf.io <IP Pool>
Remove the stale IP from the allocations list.
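Before deleting or editing, the current reservations can be listed to identify the stale IP (suggested commands; the IPPool namespace is assumed to be openshift-multus, as in the default OpenShift whereabouts setup):
$ oc get overlappingrangeipreservations.whereabouts.cni.cncf.io -A
$ oc get ippools.whereabouts.cni.cncf.io -n openshift-multus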
Also, the whereabouts-reconciler logs on the Terminating pod's node report:
2024-02-19T10:48:00Z [debug] Added IP 172.17.20.12 for pod nadtesting/deployment1-fbfdf5cbc-njkpz
2024-02-19T10:48:00Z [debug] the IP reservation: IP: 172.17.20.12 is reserved for pod: nadtesting/deployment1-fbfdf5cbc-njkpz
2024-02-19T10:48:00Z [debug] pod reference nadtesting/deployment1-fbfdf5cbc-njkpz matches allocation; Allocation IP: 172.17.20.12; PodIPs: map[172.17.20.12:{}]
2024-02-19T10:48:00Z [debug] no IP addresses to cleanup
2024-02-19T10:48:00Z [verbose] reconciler success
i.e., the reconciler fails to recognize that the allocation needs to be removed.
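A plausible way to confirm why the reconciler still treats the allocation as valid: the Terminating pod object still exists in the API (it only carries a deletionTimestamp), so its pod reference still matches the reservation (illustrative check, not from the original report):
$ oc get pod deployment1-fbfdf5cbc-njkpz -n nadtesting -o jsonpath='{.metadata.deletionTimestamp}'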
- blocks: OCPBUGS-37813 When node shutdown, the Pod whereabouts IP cannot be released (for a stateless application) (Closed)
- clones: OCPBUGS-29664 When node shutdown, the Pod whereabouts IP cannot be released (for a stateless application) (Closed)
- is blocked by: OCPBUGS-29664 When node shutdown, the Pod whereabouts IP cannot be released (for a stateless application) (Closed)
- is cloned by: OCPBUGS-37813 When node shutdown, the Pod whereabouts IP cannot be released (for a stateless application) (Closed)
- links to: RHSA-2024:5107 OpenShift Container Platform 4.16.z security update