OpenShift Bugs / OCPBUGS-29664

When node shutdown, the Pod whereabouts IP cannot be released (for a stateless application)

    • Moderate
    • CNF Network Sprint 257
    • Customer Escalated, Customer Facing

      Description of problem:

      Created a net-attach-def with 2 IPs in its range, then created a deployment with 2 replicas using that net-attach-def. The whereabouts reconciler daemonset is deployed, and its cronjob is configured to reconcile every minute.
      When I power off the node on which one of the pods is deployed, either gracefully (poweroff) or ungracefully (poweroff --force), a new pod is created on a healthy node and gets stuck in ContainerCreating state.

      Version-Release number of selected component (if applicable):

          4.14.11

      How reproducible:

      - Create the whereabouts reconciler daemonset with the help of the [documentation](https://docs.openshift.com/container-platform/4.14/networking/multiple_networks/configuring-additional-network.html#nw-multus-creating-whereabouts-reconciler-daemon-set_configuring-additional-network)
      - Update the reconciler_cron_expression to: "*/1 * * * *"
      - Create a net-attach-def with 2 IPs in range
      - Create a deployment with 2 replicas
      - Power off the node on which one of the pods is running
      - A new pod is spawned on a healthy node and is stuck in ContainerCreating status.

      Steps to Reproduce:

      1. On fresh cluster with version 4.14.11
      2. Create the whereabouts reconciler daemonset with the help of the documentation
      3. Update the reconciler_cron_expression to: "*/1 * * * *"
      $ oc create configmap whereabouts-config -n openshift-multus --from-literal=reconciler_cron_expression="*/1 * * * *"
      
      4. Create new project
      $ oc new-project nadtesting
      
      5. Apply the nad.yaml below
      $ cat nad.yaml 
      apiVersion: "k8s.cni.cncf.io/v1"
      kind: NetworkAttachmentDefinition
      metadata:
        name: macvlan-net-attach1
      spec:
        config: '{
            "cniVersion": "0.3.1",
            "type": "macvlan",
            "master": "br-ex",
            "mode": "bridge",
            "ipam": {
              "type": "whereabouts",
              "datastore": "kubernetes",
              "range": "172.17.20.0/24",
              "range_start": "172.17.20.11",
              "range_end": "172.17.20.12"
            }
          }'
      
      6. Create a deployment using the net-attach-def with two replicas,
      $ cat naddeployment.yaml 
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: deployment1
        labels:
          app: macvlan1
      spec:
        replicas: 2
        selector:
          matchLabels:
            app: macvlan1
        template:
          metadata:
            annotations:
              k8s.v1.cni.cncf.io/networks: macvlan-net-attach1
            labels:
              app: macvlan1
          spec:
            containers:
            - name: google
              image: gcr.io/google-samples/kubernetes-bootcamp:v1
              ports:
              - containerPort: 8080
      
      7. Two pods will be created
      $ oc get pods -o wide
      NAME                          READY   STATUS    RESTARTS   AGE   IP            NODE                                       NOMINATED NODE   READINESS GATES
      deployment1-fbfdf5cbc-d6sgr   1/1     Running   0          15m   10.129.2.9    ci-ln-xvfy762-c1627-h7xzk-worker-0-qvzq2   <none>           <none>
      deployment1-fbfdf5cbc-njkpz   1/1     Running   0          15m   10.128.2.16   ci-ln-xvfy762-c1627-h7xzk-worker-0-8bdfh   <none>           <none>
      
      8. Power off the node using debug
      $ oc debug node/ci-ln-xvfy762-c1627-h7xzk-worker-0-8bdfh 
      # chroot /host
      # shutdown
      
      9. Wait some time; a new pod will be created on a healthy node and get stuck in ContainerCreating
      $ oc get pod -o wide
      NAME                          READY   STATUS              RESTARTS   AGE     IP            NODE                                       NOMINATED NODE   READINESS GATES
      deployment1-fbfdf5cbc-6cb8d   0/1     ContainerCreating   0          9m53s   <none>        ci-ln-xvfy762-c1627-h7xzk-worker-0-blzlk   <none>           <none>
      deployment1-fbfdf5cbc-d6sgr   1/1     Running             0          28m     10.129.2.9    ci-ln-xvfy762-c1627-h7xzk-worker-0-qvzq2   <none>           <none>
      deployment1-fbfdf5cbc-njkpz   1/1     Terminating         0          28m     10.128.2.16   ci-ln-xvfy762-c1627-h7xzk-worker-0-8bdfh   <none>           <none>
      
      10. Node status just for reference,
      $ oc get nodes  
      NAME                                       STATUS     ROLES                  AGE   VERSION
      ci-ln-xvfy762-c1627-h7xzk-master-0         Ready      control-plane,master   59m   v1.27.10+28ed2d7
      ci-ln-xvfy762-c1627-h7xzk-master-1         Ready      control-plane,master   59m   v1.27.10+28ed2d7
      ci-ln-xvfy762-c1627-h7xzk-master-2         Ready      control-plane,master   58m   v1.27.10+28ed2d7
      ci-ln-xvfy762-c1627-h7xzk-worker-0-8bdfh   NotReady   worker                 43m   v1.27.10+28ed2d7
      ci-ln-xvfy762-c1627-h7xzk-worker-0-blzlk   Ready      worker                 43m   v1.27.10+28ed2d7
      ci-ln-xvfy762-c1627-h7xzk-worker-0-qvzq2   Ready      worker                 43m   v1.27.10+28ed2d
      
      

      Actual results:

      The shut-down node's pod is stuck in Terminating state and does not release its IP. The new pod is stuck in ContainerCreating status.

      Expected results:

      The new pod should start smoothly on the new node.

      Additional info:

      - Just for information: if I follow the manual approach below, the issue is resolved:
      1. Remove the Terminating pod's IP from the overlapping range reservations
      $ oc delete overlappingrangeipreservations.whereabouts.cni.cncf.io <IP>
      
      2. Remove the same IP from ippools.whereabouts.cni.cncf.io
      $ oc edit ippools.whereabouts.cni.cncf.io <IP Pool> 
      Remove the stale IP from the allocations list
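      The manual cleanup above can be sketched as a small script. The namespace, pool name, and stale IP below are assumptions taken from the outputs later in this report (openshift-multus, 172.17.20.0-24); note that in this particular pool the allocation key happens to match the last octet of the IP, as the IPPool dump shown later illustrates.

```shell
# Hypothetical cleanup sketch for a stale whereabouts allocation.
# Assumes: namespace openshift-multus, pool 172.17.20.0-24, and that the
# allocation key equals the stale IP's last octet (true for this pool).
STALE_IP="172.17.20.12"
POOL="172.17.20.0-24"
INDEX="${STALE_IP##*.}"   # "12" - key under spec.allocations

# 1. Delete the overlapping-range reservation for the stale IP
oc delete overlappingrangeipreservations.whereabouts.cni.cncf.io \
  "${STALE_IP}" -n openshift-multus

# 2. Drop the stale allocation entry from the IPPool (instead of oc edit)
oc patch ippools.whereabouts.cni.cncf.io "${POOL}" -n openshift-multus \
  --type=json -p "[{\"op\":\"remove\",\"path\":\"/spec/allocations/${INDEX}\"}]"
```

      This requires cluster-admin access to the openshift-multus namespace; `oc patch --type=json` avoids the interactive `oc edit` step.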
      
      Also, the whereabouts-reconciler logs on the Terminating pod's node report:
      2024-02-19T10:48:00Z [debug] Added IP 172.17.20.12 for pod nadtesting/deployment1-fbfdf5cbc-njkpz
      2024-02-19T10:48:00Z [debug] the IP reservation: IP: 172.17.20.12 is reserved for pod: nadtesting/deployment1-fbfdf5cbc-njkpz
      2024-02-19T10:48:00Z [debug] pod reference nadtesting/deployment1-fbfdf5cbc-njkpz matches allocation; Allocation IP: 172.17.20.12; PodIPs: map[172.17.20.12:{}]
      2024-02-19T10:48:00Z [debug] no IP addresses to cleanup
      2024-02-19T10:48:00Z [verbose] reconciler success
      
      i.e. it fails to recognize the need to remove the allocation.
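      To see the state the reconciler is matching against, the pool and reservations can be inspected directly (namespace and pool name taken from the outputs elsewhere in this report):

```shell
# Show current allocations in the whereabouts IP pool; the Terminating pod's
# podref still appears under spec.allocations, which is why the reconciler
# considers the allocation valid.
oc get ippools.whereabouts.cni.cncf.io 172.17.20.0-24 \
  -n openshift-multus -o yaml

# Show the per-IP reservations; the stale IP is still listed here.
oc get overlappingrangeipreservations.whereabouts.cni.cncf.io \
  -n openshift-multus
```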

       


            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (Moderate: OpenShift Container Platform 4.17.0 bug fix and security update), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHSA-2024:3718


            Weibin Liang added a comment -

            Tested and verified in 4.17.0-0.nightly-2024-07-29-061317

            OpenShift Jira Bot added a comment -

            Hi rh-ee-marguerr,

            Bugs should not be moved to Verified without first providing a Release Note Type ("Bug Fix" or "No Doc Update"), and for type "Bug Fix" the Release Note Text must also be provided. Please populate the necessary fields before moving the bug to Verified.

            Marcelo Guerrero Viveros added a comment -

            Reopening bug to track fix - https://issues.redhat.com/browse/RFE-5374

            Carlos Goncalves added a comment - edited

            pliurh's comment is partially correct for graceful node shutdowns.

            Upstream Kubernetes added support for graceful node shutdown, which enables the kubelet to gracefully evict pods during a node shutdown. This feature is beta in Kubernetes 1.21 and enabled by default. See https://kubernetes.io/docs/concepts/architecture/nodes/#graceful-node-shutdown

            However, OpenShift does not support this yet and has it explicitly disabled:
            https://issues.redhat.com/browse/RFE-2579
            https://issues.redhat.com/browse/OCPNODE-549
            https://github.com/openshift/machine-config-operator/pull/4208

            OpenShift documentation of this feature had been published (TELCODOCS-903) but was later unpublished due to known open issues (OCPBUGS-17478).
            There is a KB Solution page on how to enable it, but note that this is not a supported method to enable and use the feature:
            https://access.redhat.com/solutions/6998877

            As for non-graceful node shutdowns, upstream Kubernetes introduced this feature in 1.26 as beta and promoted it to GA in 1.28. OpenShift 4.14 is based on Kubernetes 1.27, so it is not supported at this point in time.
            https://kubernetes.io/docs/concepts/architecture/nodes/#non-graceful-node-shutdown
            https://kubernetes.io/blog/2022/12/16/kubernetes-1-26-non-graceful-node-shutdown-beta/
            https://kubernetes.io/blog/2023/08/16/kubernetes-1-28-non-graceful-node-shutdown-ga/
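            For reference, the upstream graceful-node-shutdown feature is driven by two kubelet settings. A minimal sketch of the relevant KubeletConfiguration fragment follows; the field names and durations are from the upstream Kubernetes docs, and enabling this on OpenShift is not a supported configuration (see the KB solution above).

```yaml
# Sketch of the upstream kubelet settings behind graceful node shutdown.
# Durations here are illustrative, not recommendations.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Total time the node delays shutdown so pods can terminate.
shutdownGracePeriod: "2m"
# Portion of shutdownGracePeriod reserved for critical pods.
shutdownGracePeriodCriticalPods: "30s"
```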

            Carlos Goncalves added a comment -

            I was now able to reproduce on 4.14.0-0.nightly-2024-02-22-202606 with a graceful power off.

            This cluster build includes https://github.com/openshift/cluster-network-operator/pull/2257.

            My previous clusters were provisioned by clusterbot on AWS, and it turned out that powered-off nodes are automatically removed there (a new one is provisioned within minutes). When nodes are removed, all resources allocated to them are freed, resulting in a free IP in the whereabouts IP pool that is later assigned to the new pod. Powered-off nodes on vSphere and GCP are not removed.

            $ oc get pods -o wide -w
            NAME                          READY   STATUS    RESTARTS   AGE     IP            NODE                                       NOMINATED NODE   READINESS GATES
            deployment1-fbfdf5cbc-qsf2n   1/1     Running   0          4m59s   10.131.0.21   ci-ln-x3wh0qk-72292-vqklf-worker-a-xkvzn   <none>           <none>
            deployment1-fbfdf5cbc-qxwj7   1/1     Running   0          4m59s   10.128.2.18   ci-ln-x3wh0qk-72292-vqklf-worker-b-fmchd   <none>           <none>
            deployment1-fbfdf5cbc-qsf2n   1/1     Running   0          7m32s   10.131.0.21   ci-ln-x3wh0qk-72292-vqklf-worker-a-xkvzn   <none>           <none>
            deployment1-fbfdf5cbc-qsf2n   1/1     Terminating   0          7m32s   10.131.0.21   ci-ln-x3wh0qk-72292-vqklf-worker-a-xkvzn   <none>           <none>
            deployment1-fbfdf5cbc-kjdpl   0/1     Pending       0          0s      <none>        <none>                                     <none>           <none>
            deployment1-fbfdf5cbc-kjdpl   0/1     Pending       0          0s      <none>        ci-ln-x3wh0qk-72292-vqklf-worker-c-sd42q   <none>           <none>
            deployment1-fbfdf5cbc-kjdpl   0/1     Pending       0          0s      <none>        ci-ln-x3wh0qk-72292-vqklf-worker-c-sd42q   <none>           <none>
            deployment1-fbfdf5cbc-kjdpl   0/1     ContainerCreating   0          0s      <none>        ci-ln-x3wh0qk-72292-vqklf-worker-c-sd42q   <none>           <none>
            
            $ oc describe ippools 172.17.20.0-24 -n openshift-multus
            Name:         172.17.20.0-24
            Namespace:    openshift-multus
            Labels:       <none>
            Annotations:  <none>
            API Version:  whereabouts.cni.cncf.io/v1alpha1
            Kind:         IPPool
            Metadata:
              Creation Timestamp:  2024-02-28T14:18:05Z
              Generation:          3
              Resource Version:    39593
              UID:                 e11c8ec1-2823-4d44-b2c8-400b8c6a54b4
            Spec:
              Allocations:
                11:
                  Id:      f5f481a1ff576efe358e44238bd7f9aefbd9ed7e57b855d5c84362bf521f5d81
                  Podref:  nadtesting/deployment1-fbfdf5cbc-qsf2n
                12:
                  Id:      07e8e278c96640288896ce55d956ae69b5b041919bde7d8c6bc8cf101d58ad97
                  Podref:  nadtesting/deployment1-fbfdf5cbc-qxwj7
              Range:       172.17.20.0/24
            Events:        <none> 


            Carlos Goncalves added a comment -

            Deployed 4.14.11 (previously deployed nightly 4.14). Still cannot reproduce the issue on graceful or ungraceful power off.

            Graceful power off:

            $ oc get clusterversion
            NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
            version   4.14.11   True        False         16m     Cluster version is 4.14.11
            
            $ oc debug node/ip-10-0-105-127.us-west-1.compute.internal
            $ chroot /host
            $ poweroff
            
            $ oc get pods -w -o wide
            NAME                          READY   STATUS    RESTARTS   AGE   IP            NODE                                         NOMINATED NODE   READINESS GATES
            deployment1-fbfdf5cbc-9dz4n   1/1     Running   0          14s   10.128.2.15   ip-10-0-44-32.us-west-1.compute.internal     <none>           <none>
            deployment1-fbfdf5cbc-dxw25   1/1     Running   0          14s   10.131.0.22   ip-10-0-105-127.us-west-1.compute.internal   <none>           <none>
            deployment1-fbfdf5cbc-dxw25   1/1     Running   0          103s   10.131.0.22   ip-10-0-105-127.us-west-1.compute.internal   <none>           <none>
            deployment1-fbfdf5cbc-dxw25   1/1     Failed    0          5m44s   10.131.0.22   ip-10-0-105-127.us-west-1.compute.internal   <none>           <none>
            deployment1-fbfdf5cbc-dxw25   1/1     Terminating   0          5m44s   10.131.0.22   ip-10-0-105-127.us-west-1.compute.internal   <none>           <none>
            deployment1-fbfdf5cbc-dxw25   1/1     Terminating   0          5m44s   10.131.0.22   ip-10-0-105-127.us-west-1.compute.internal   <none>           <none>
            deployment1-fbfdf5cbc-s487f   0/1     Pending       0          0s      <none>        <none>                                       <none>           <none>
            deployment1-fbfdf5cbc-s487f   0/1     Pending       0          0s      <none>        ip-10-0-47-229.us-west-1.compute.internal    <none>           <none>
            deployment1-fbfdf5cbc-s487f   0/1     Pending       0          0s      <none>        ip-10-0-47-229.us-west-1.compute.internal    <none>           <none>
            deployment1-fbfdf5cbc-s487f   0/1     ContainerCreating   0          0s      <none>        ip-10-0-47-229.us-west-1.compute.internal    <none>           <none>
            deployment1-fbfdf5cbc-s487f   0/1     ContainerCreating   0          55s     <none>        ip-10-0-47-229.us-west-1.compute.internal    <none>           <none>
            deployment1-fbfdf5cbc-s487f   1/1     Running             0          60s     10.129.2.15   ip-10-0-47-229.us-west-1.compute.internal    <none>           <none> 
            
            $ oc get overlappingrangeipreservations.whereabouts.cni.cncf.io -n openshift-multus
            NAME           AGE
            172.17.20.11   6m43s
            172.17.20.12   5s
            
            $ oc rsh deployment1-fbfdf5cbc-s487f ip -o -4 a
            1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
            2: eth0    inet 10.129.2.15/23 brd 10.129.3.255 scope global eth0\       valid_lft forever preferred_lft forever
            3: net1    inet 172.17.20.12/24 brd 172.17.20.255 scope global net1\       valid_lft forever preferred_lft forever

            Ungraceful power off:

            $ oc debug node/ip-10-0-83-161.us-west-1.compute.internal
            $ chroot /host
            $ echo o > /proc/sysrq-trigger
            
            $ oc get pods -w -o wide
            NAME                          READY   STATUS    RESTARTS   AGE    IP            NODE                                        NOMINATED NODE   READINESS GATES
            deployment1-fbfdf5cbc-2wzb7   1/1     Running   0          4m1s   10.130.2.9    ip-10-0-83-161.us-west-1.compute.internal   <none>           <none>
            deployment1-fbfdf5cbc-9dz4n   1/1     Running   0          20m    10.128.2.15   ip-10-0-44-32.us-west-1.compute.internal    <none>           <none>
            deployment1-fbfdf5cbc-2wzb7   1/1     Running   0          4m26s   10.130.2.9    ip-10-0-83-161.us-west-1.compute.internal   <none>           <none>
            deployment1-fbfdf5cbc-2wzb7   1/1     Failed    0          5m22s   10.130.2.9    ip-10-0-83-161.us-west-1.compute.internal   <none>           <none>
            deployment1-fbfdf5cbc-2wzb7   1/1     Terminating   0          5m22s   10.130.2.9    ip-10-0-83-161.us-west-1.compute.internal   <none>           <none>
            deployment1-fbfdf5cbc-2wzb7   1/1     Terminating   0          5m22s   10.130.2.9    ip-10-0-83-161.us-west-1.compute.internal   <none>           <none>
            deployment1-fbfdf5cbc-mf9rw   0/1     Pending       0          0s      <none>        <none>                                      <none>           <none>
            deployment1-fbfdf5cbc-mf9rw   0/1     Pending       0          0s      <none>        ip-10-0-27-193.us-west-1.compute.internal   <none>           <none>
            deployment1-fbfdf5cbc-mf9rw   0/1     Pending       0          0s      <none>        ip-10-0-27-193.us-west-1.compute.internal   <none>           <none>
            deployment1-fbfdf5cbc-mf9rw   0/1     ContainerCreating   0          0s      <none>        ip-10-0-27-193.us-west-1.compute.internal   <none>           <none>
            deployment1-fbfdf5cbc-mf9rw   0/1     ContainerCreating   0          46s     <none>        ip-10-0-27-193.us-west-1.compute.internal   <none>           <none>
            deployment1-fbfdf5cbc-mf9rw   1/1     Running             0          51s     10.131.2.13   ip-10-0-27-193.us-west-1.compute.internal   <none>           <none>
            
            $ oc get overlappingrangeipreservations.whereabouts.cni.cncf.io -n openshift-multus -w
            NAME           AGE
            172.17.20.11   22m
            172.17.20.12   5m57s
            172.17.20.12   5m59s
            172.17.20.12   0s
            
            $ oc rsh  deployment1-fbfdf5cbc-mf9rw  ip -o -4 a
            1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
            2: eth0    inet 10.131.2.13/23 brd 10.131.3.255 scope global eth0\       valid_lft forever preferred_lft forever
            3: net1    inet 172.17.20.12/24 brd 172.17.20.255 scope global net1\       valid_lft forever preferred_lft forever


            Carlos Goncalves added a comment - edited

            I followed the exact reproducer on the latest 4.14. I could not reproduce this issue.

            $ oc get pod -o wide
            NAME                          READY   STATUS    RESTARTS   AGE   IP            NODE                                        NOMINATED NODE   READINESS GATES
            deployment1-fbfdf5cbc-8fw9g   1/1     Running   0          9s    10.129.2.21   ip-10-0-34-156.us-east-2.compute.internal   <none>           <none>
            deployment1-fbfdf5cbc-qf8jw   1/1     Running   0          9s    10.131.0.17   ip-10-0-108-60.us-east-2.compute.internal   <none>           <none>
            
            $ oc get overlappingrangeipreservations.whereabouts.cni.cncf.io -n openshift-multus
            NAME           AGE
            172.17.20.11   45s
            172.17.20.12   44s
            
            $ oc debug node/ip-10-0-34-156.us-east-2.compute.internal
            Temporary namespace openshift-debug-f246j is created for debugging node...
            Starting pod/ip-10-0-34-156us-east-2computeinternal-debug-xmvhf ...
            To use host binaries, run `chroot /host`
            Pod IP: 10.0.34.156
            If you don't see a command prompt, try pressing enter.
            sh-4.4# chroot /host
            sh-5.1# shutdown
            Shutdown scheduled for Tue 2024-02-27 14:51:42 UTC, use 'shutdown -c' to cancel.
            
            $ oc get node 
            NAME                                         STATUS     ROLES                  AGE   VERSION
            ip-10-0-108-60.us-east-2.compute.internal    Ready      worker                 44m   v1.27.10+c79e5e2
            ip-10-0-112-198.us-east-2.compute.internal   Ready      worker                 44m   v1.27.10+c79e5e2
            ip-10-0-122-57.us-east-2.compute.internal    Ready      control-plane,master   50m   v1.27.10+c79e5e2
            ip-10-0-123-176.us-east-2.compute.internal   Ready      control-plane,master   50m   v1.27.10+c79e5e2
            ip-10-0-34-156.us-east-2.compute.internal    NotReady   worker                 41m   v1.27.10+c79e5e2
            ip-10-0-54-145.us-east-2.compute.internal    Ready      control-plane,master   51m   v1.27.10+c79e5e2
            
            $ oc get deployment
            NAME          READY   UP-TO-DATE   AVAILABLE   AGE
            deployment1   1/2     2            1           6m41s
            
            $ oc get pod -o wide -w
            NAME                          READY   STATUS    RESTARTS   AGE    IP            NODE                                        NOMINATED NODE   READINESS GATES
            deployment1-fbfdf5cbc-8fw9g   1/1     Running   0          2m1s   10.129.2.21   ip-10-0-34-156.us-east-2.compute.internal   <none>           <none>
            deployment1-fbfdf5cbc-qf8jw   1/1     Running   0          2m1s   10.131.0.17   ip-10-0-108-60.us-east-2.compute.internal   <none>           <none>
                 deployment1-fbfdf5cbc-8fw9g   1/1     Running   0          4m51s   10.129.2.21   ip-10-0-34-156.us-east-2.compute.internal   <none>           <none>
            deployment1-fbfdf5cbc-8fw9g   1/1     Failed    0          8m16s   10.129.2.21   ip-10-0-34-156.us-east-2.compute.internal   <none>           <none>
            deployment1-fbfdf5cbc-8fw9g   1/1     Terminating   0          8m16s   10.129.2.21   ip-10-0-34-156.us-east-2.compute.internal   <none>           <none>
            deployment1-fbfdf5cbc-8fw9g   1/1     Terminating   0          8m16s   10.129.2.21   ip-10-0-34-156.us-east-2.compute.internal   <none>           <none>
            deployment1-fbfdf5cbc-58vdv   0/1     Pending       0          0s      <none>        <none>                                      <none>           <none>
            deployment1-fbfdf5cbc-58vdv   0/1     Pending       0          0s      <none>        ip-10-0-112-198.us-east-2.compute.internal   <none>           <none>
            deployment1-fbfdf5cbc-58vdv   0/1     Pending       0          0s      <none>        ip-10-0-112-198.us-east-2.compute.internal   <none>           <none>
            deployment1-fbfdf5cbc-58vdv   0/1     ContainerCreating   0          0s      <none>        ip-10-0-112-198.us-east-2.compute.internal   <none>           <none>
            deployment1-fbfdf5cbc-58vdv   0/1     ContainerCreating   0          27s     <none>        ip-10-0-112-198.us-east-2.compute.internal   <none>           <none>
            deployment1-fbfdf5cbc-58vdv   1/1     Running             0          32s     10.128.2.16   ip-10-0-112-198.us-east-2.compute.internal   <none>           <none>
            
            $ oc -n openshift-multus logs ds/whereabouts-reconciler
            2024-02-27T14:56:00Z [debug] NewReconcileLooper - inferred connection data
            2024-02-27T14:56:00Z [debug] listing IP pools
            2024-02-27T14:56:00Z [debug] Added IP 172.17.20.12 for pod nadtesting/deployment1-fbfdf5cbc-qf8jw
            2024-02-27T14:56:00Z [debug] the IP reservation: IP: 172.17.20.11 is reserved for pod: nadtesting/deployment1-fbfdf5cbc-8fw9g
            2024-02-27T14:56:00Z [debug] pod ref nadtesting/deployment1-fbfdf5cbc-8fw9g is not listed in the live pods list
            2024-02-27T14:56:00Z [debug] the IP reservation: IP: 172.17.20.12 is reserved for pod: nadtesting/deployment1-fbfdf5cbc-qf8jw
            2024-02-27T14:56:00Z [debug] pod reference nadtesting/deployment1-fbfdf5cbc-qf8jw matches allocation; Allocation IP: 172.17.20.12; PodIPs: map[172.17.20.12:{}]
            2024-02-27T14:56:00Z [debug] pod ref nadtesting/deployment1-fbfdf5cbc-8fw9g is not listed in the live pods list
            2024-02-27T14:56:00Z [debug] pod reference nadtesting/deployment1-fbfdf5cbc-qf8jw matches allocation; Allocation IP: 172.17.20.12; PodIPs: map[172.17.20.12:{}]
            2024-02-27T14:56:00Z [debug] Going to update the reserve list to: [IP: 172.17.20.12 is reserved for pod: nadtesting/deployment1-fbfdf5cbc-qf8jw]
            2024-02-27T14:56:00Z [debug] successfully cleanup IPs: [172.17.20.11]
            2024-02-27T14:56:00Z [verbose] removed stale overlappingIP allocation [172.17.20.11]
            2024-02-27T14:56:00Z [verbose] reconciler success
            
            $ oc get deployment
            NAME          READY   UP-TO-DATE   AVAILABLE   AGE
            deployment1   2/2     2            2           13m
            
            $ oc get overlappingrangeipreservations.whereabouts.cni.cncf.io -n openshift-multus
            NAME           AGE
            172.17.20.11   4m23s
            172.17.20.12   13m
            
            $ oc rsh deployment1-fbfdf5cbc-58vdv ip -o -4 a
            1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
            2: eth0    inet 10.128.2.16/23 brd 10.128.3.255 scope global eth0\       valid_lft forever preferred_lft forever
            3: net1    inet 172.17.20.11/24 brd 172.17.20.255 scope global net1\       valid_lft forever preferred_lft forever

            The old Pod was gracefully terminated, and a new Pod was created and assigned the expected IP address on the net1 interface.
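            For anyone re-running this reproducer, a quick way to confirm the reconciler actually released the stale allocation is to inspect the whereabouts CRs directly (a sketch; the exact IPPool resource name depends on the configured range):

            ```shell
            # List the whereabouts IPPool objects; each pool's spec.allocations
            # maps offsets in the range to the pod refs that hold them.
            oc get ippools.whereabouts.cni.cncf.io -n openshift-multus -o yaml

            # The cluster-wide reservations can be listed the same way as in
            # the transcript above; the stale entry should disappear after the
            # reconciler's next cron run.
            oc get overlappingrangeipreservations.whereabouts.cni.cncf.io -n openshift-multus
            ```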

            Carlos Goncalves added a comment - edited: I followed the exact reproducer on the latest 4.14. I could not reproduce this issue.

            Peng Liu added a comment -

            rhn-support-snalawad The behavior you describe is expected. In Kubernetes, a pod on an unreachable node is not deleted until the node becomes available again, and the resources allocated to the pod (including its IP address) cannot be released until the pod is deleted. If you don't want to wait for the node to come back, you can force delete the stale pod. The whereabouts reconciler will then be able to revoke the allocated IP.
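            Peng Liu's suggestion can be applied as follows (pod and namespace names taken from the transcript above; `--force --grace-period=0` is the standard way to remove a pod object stuck on a NotReady node):

            ```shell
            # Force delete the stale pod left behind on the powered-off node.
            # WARNING: force deletion removes the API object without waiting for
            # kubelet confirmation; only use it when the node will not return,
            # or after you have verified the workload is no longer running there.
            oc delete pod deployment1-fbfdf5cbc-8fw9g -n nadtesting --force --grace-period=0

            # On its next run (every minute with the reconciler_cron_expression
            # from the reproducer), the whereabouts reconciler can release the
            # IP that was reserved for the deleted pod.
            ```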


              rh-ee-marguerr Marcelo Guerrero Viveros
              rhn-support-klakhwar Ketan Lakhwara
              Weibin Liang Weibin Liang
              Carlos Goncalves