  1. OpenShift Bugs
  2. OCPBUGS-28553

Deleting a pod with a NAD produces the error "failed to garbage collect addresses"


    • Bug
    • Resolution: Duplicate
    • Major
    • None
    • 4.12.z
    • Networking / multus
    • None
    • Moderate
    • No
    • False

      Description of problem:

      We noticed that deleting a pod that consumes additional networks through NetworkAttachmentDefinitions emits the following warning event, even though the pod itself is still deleted:
      48m         Warning   IPAddressGarbageCollectionFailed   pod/helloworld-74bc99864b-98x2f    failed to garbage collect addresses for pod bug-address-garbage-collection/helloworld-74bc99864b-98x2f
      
      The whereabouts-reconciler pod logs also show errors indicating that the reconciler is unable to clean up the addresses.
      
          

      Version-Release number of selected component (if applicable):

      4.12.x
      
          

      How reproducible: Every time

      We can reproduce this issue with the steps below:

      1. Create a NetworkAttachmentDefinition (NAD) with 5 IPs in its range.

      2. Confirm that the whereabouts-reconciler pods are running in the openshift-multus namespace.

      3. Create a Deployment with 2 replicas that uses the same NAD.

      4. Restart one of the pods (for example, by deleting it) and check the logs of the whereabouts-reconciler pod on the same node.

      5. The error messages shown below appear in the whereabouts-reconciler pod logs.

      6. Although these errors do not cause a functional problem, they are misleading.

      ~~~
      [quickcluster@upi-0 nadtesting]$ cat nad.yaml
      apiVersion: "k8s.cni.cncf.io/v1"
      kind: NetworkAttachmentDefinition
      metadata:
        name: macvlan-net-attach1
      spec:
        config: '{
          "cniVersion": "0.3.1",
          "type": "macvlan",
          "master": "br-ex",
          "mode": "bridge",
          "ipam": {
            "type": "whereabouts",
            "datastore": "kubernetes",
            "range": "172.17.20.0/24",
            "range_start": "172.17.20.11",
            "range_end": "172.17.20.15"
          }
        }'
      ~~~
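
As a quick check of step 1, the range_start and range_end values in the NAD above should bound exactly five assignable addresses. The following is a minimal Python sketch that uses only the standard library; the helper name `addresses_in_range` is hypothetical, and this is not necessarily how whereabouts itself enumerates addresses.

```python
import ipaddress


def addresses_in_range(cidr, start, end):
    """List the addresses from start to end (inclusive) inside cidr."""
    network = ipaddress.ip_network(cidr)
    first = ipaddress.ip_address(start)
    last = ipaddress.ip_address(end)
    # Both bounds must belong to the configured range.
    if first not in network or last not in network:
        raise ValueError("range_start and range_end must lie inside the range")
    if int(first) > int(last):
        raise ValueError("range_start must not be greater than range_end")
    return [str(ipaddress.ip_address(n)) for n in range(int(first), int(last) + 1)]


# Values copied from the "ipam" section of nad.yaml.
pool = addresses_in_range("172.17.20.0/24", "172.17.20.11", "172.17.20.15")
print(len(pool), pool)
```

With two replicas, only two of these five addresses are expected to be allocated at any time.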

      ~~~
      [quickcluster@upi-0 nadtesting]$ oc get pods -n nadtesting -o wide
      NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
      deployment1-0 1/1 Running 0 79s 10.128.2.96 worker-2.ketanl.lab.psi.pnq2.redhat.com <none> <none>
      deployment1-1 1/1 Running 0 78s 10.129.2.10 worker-0.ketanl.lab.psi.pnq2.redhat.com <none> <none>
      [quickcluster@upi-0 nadtesting]$ oc delete pod deployment1-1 -n nadtesting
      pod "deployment1-1" deleted
      ~~~

      ~~~
      [quickcluster@upi-0 nadtesting]$ oc get pods -o wide
      NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
      multus-5phgg 1/1 Running 5 (20h ago) 6d3h 10.74.210.135 master-1.ketanl.lab.psi.pnq2.redhat.com <none> <none>
      multus-5st4p 1/1 Running 1 6d3h 10.74.208.133 worker-1.ketanl.lab.psi.pnq2.redhat.com <none> <none>
      multus-97mfn 1/1 Running 16 (20h ago) 6d3h 10.74.212.72 master-2.ketanl.lab.psi.pnq2.redhat.com <none> <none>
      multus-additional-cni-plugins-4svgp 1/1 Running 1 6d3h 10.74.212.72 master-2.ketanl.lab.psi.pnq2.redhat.com <none> <none>
      multus-additional-cni-plugins-krsnn 1/1 Running 1 6d3h 10.74.208.133 worker-1.ketanl.lab.psi.pnq2.redhat.com <none> <none>
      multus-additional-cni-plugins-qww45 1/1 Running 2 6d3h 10.74.209.119 worker-0.ketanl.lab.psi.pnq2.redhat.com <none> <none>
      multus-additional-cni-plugins-rlsgw 1/1 Running 1 6d3h 10.74.212.93 worker-2.ketanl.lab.psi.pnq2.redhat.com <none> <none>
      multus-additional-cni-plugins-s8z72 1/1 Running 1 6d3h 10.74.210.230 master-0.ketanl.lab.psi.pnq2.redhat.com <none> <none>
      multus-additional-cni-plugins-wfdwt 1/1 Running 1 6d3h 10.74.210.135 master-1.ketanl.lab.psi.pnq2.redhat.com <none> <none>
      multus-admission-controller-c7c5656f6-6xgrv 2/2 Running 0 20h 10.130.0.56 master-2.ketanl.lab.psi.pnq2.redhat.com <none> <none>
      multus-admission-controller-c7c5656f6-89nnh 2/2 Running 0 20h 10.130.0.55 master-2.ketanl.lab.psi.pnq2.redhat.com <none> <none>
      multus-ffvj9 1/1 Running 1 6d3h 10.74.212.93 worker-2.ketanl.lab.psi.pnq2.redhat.com <none> <none>
      multus-nf2j8 1/1 Running 1 6d3h 10.74.210.230 master-0.ketanl.lab.psi.pnq2.redhat.com <none> <none>
      multus-tt2p8 1/1 Running 2 6d3h 10.74.209.119 worker-0.ketanl.lab.psi.pnq2.redhat.com <none> <none>
      network-metrics-daemon-622bg 2/2 Running 2 6d3h 10.128.2.3 worker-2.ketanl.lab.psi.pnq2.redhat.com <none> <none>
      network-metrics-daemon-68wqh 2/2 Running 4 6d3h 10.129.2.4 worker-0.ketanl.lab.psi.pnq2.redhat.com <none> <none>
      network-metrics-daemon-bbh82 2/2 Running 2 6d3h 10.131.0.4 worker-1.ketanl.lab.psi.pnq2.redhat.com <none> <none>
      network-metrics-daemon-kn5dd 2/2 Running 2 6d3h 10.129.0.4 master-1.ketanl.lab.psi.pnq2.redhat.com <none> <none>
      network-metrics-daemon-rj9x5 2/2 Running 2 6d3h 10.130.0.3 master-2.ketanl.lab.psi.pnq2.redhat.com <none> <none>
      network-metrics-daemon-sl7lk 2/2 Running 2 6d3h 10.128.0.4 master-0.ketanl.lab.psi.pnq2.redhat.com <none> <none>
      whereabouts-reconciler-5mg5z 1/1 Running 0 2m49s 10.74.208.133 worker-1.ketanl.lab.psi.pnq2.redhat.com <none> <none>
      whereabouts-reconciler-8kfdr 1/1 Running 0 2m49s 10.74.210.135 master-1.ketanl.lab.psi.pnq2.redhat.com <none> <none>
      whereabouts-reconciler-gcrpl 1/1 Running 0 2m49s 10.74.210.230 master-0.ketanl.lab.psi.pnq2.redhat.com <none> <none>
      whereabouts-reconciler-hlhpc 1/1 Running 0 2m49s 10.74.212.93 worker-2.ketanl.lab.psi.pnq2.redhat.com <none> <none>
      whereabouts-reconciler-j9pkk 1/1 Running 0 2m49s 10.74.212.72 master-2.ketanl.lab.psi.pnq2.redhat.com <none> <none>
      whereabouts-reconciler-xsrbw 1/1 Running 0 2m49s 10.74.209.119 worker-0.ketanl.lab.psi.pnq2.redhat.com <none> <none>
      ~~~
      ~~~
      [quickcluster@upi-0 nadtesting]$ oc describe ippools.whereabouts.cni.cncf.io 172.17.20.0-24
      Name:         172.17.20.0-24
      Namespace:    openshift-multus
      Labels:       <none>
      Annotations:  <none>
      API Version:  whereabouts.cni.cncf.io/v1alpha1
      Kind:         IPPool
      Metadata:
        Creation Timestamp:  2024-01-29T10:04:14Z
        Generation:          3
        Resource Version:    2851669
        UID:                 02da6e7d-683b-4ffb-8984-22ac5fb622e2
      Spec:
        Allocations:
          11:
            Id:      3786105b18879187206480002bcc09ab54c355b99046a21463e2e951b900f837
            Podref:  nadtesting/deployment1-0
          12:
            Id:      96f5e5e6a35d992011f3f5fcad225e6e18051eceee1636ee37e0390552c56194
            Podref:  nadtesting/deployment1-1
        Range:  172.17.20.0/24
      Events:   <none>
      ~~~
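
The IPPool output above can be related back to the NAD. The sketch below assumes, based only on this output, that the pool name is the range with "/" replaced by "-" and that each allocation key is the offset of the address from the network base. `pool_name` and `allocation_ip` are hypothetical helpers, not whereabouts functions.

```python
import ipaddress


def pool_name(cidr):
    """Derive the IPPool object name from the range (assumed naming rule)."""
    return cidr.replace("/", "-")


def allocation_ip(cidr, offset):
    """Map an allocation key to an address (assumed: key = offset from base)."""
    network = ipaddress.ip_network(cidr)
    return str(network.network_address + int(offset))


# Allocation keys and pod references copied from the IPPool output.
allocations = {
    "11": "nadtesting/deployment1-0",
    "12": "nadtesting/deployment1-1",
}

print(pool_name("172.17.20.0/24"))
for offset, podref in allocations.items():
    print(podref, allocation_ip("172.17.20.0/24", offset))
```

Under these assumptions, key 12 corresponds to 172.17.20.12, which matches the address reported in the network status of nadtesting/deployment1-1 in the reconciler log.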

      ~~~
      [quickcluster@upi-0 nadtesting]$ oc logs whereabouts-reconciler-xsrbw
      2024-01-29T10:03:03Z [debug] Filtering pods with filter key 'spec.nodeName' and filter value 'worker-0.ketanl.lab.psi.pnq2.redhat.com'
      2024-01-29T10:03:03Z [verbose] pod controller created
      2024-01-29T10:03:03Z [verbose] Starting informer factories ...
      2024-01-29T10:03:03Z [verbose] Informer factories started
      2024-01-29T10:03:03Z [verbose] starting network controller
      2024-01-29T10:06:08Z [verbose] deleted pod [nadtesting/deployment1-1]
      2024-01-29T10:06:08Z [verbose] skipped net-attach-def for default network
      2024-01-29T10:06:08Z [debug] pod's network status: {Name:nadtesting/macvlan-net-attach1 Interface:net1 IPs:[172.17.20.12] Mac:26:c4:95:37:a4:d8 Default:false DNS:{Nameservers:[] Domain: Search:[] Options:[]} DeviceInfo:<nil>}
      2024-01-29T10:06:08Z [verbose] the NAD's config: {{ "cniVersion": "0.3.1", "type": "macvlan", "master": "br-ex", "mode": "bridge", "ipam": { "type": "whereabouts", "datastore": "kubernetes", "range": "172.17.20.0/24", "range_start": "172.17.20.11", "range_end": "172.17.20.15" } }}
      2024-01-29T10:06:08Z [debug] Used defaults from parsed flat file config @ /host/etc/cni/net.d/whereabouts.d/whereabouts.conf
      2024-01-29T10:06:08Z [verbose] result of garbage collecting pods: failed to get the IPPool data: ippool.whereabouts.cni.cncf.io "172.17.20.0-24" not found
      2024-01-29T10:06:08Z [verbose] re-queuing IP address reconciliation request for pod nadtesting/deployment1-1; retry #: 0
      2024-01-29T10:06:08Z [verbose] skipped net-attach-def for default network
      2024-01-29T10:06:08Z [debug] pod's network status: {Name:nadtesting/macvlan-net-attach1 Interface:net1 IPs:[172.17.20.12] Mac:26:c4:95:37:a4:d8 Default:false DNS:{Nameservers:[] Domain: Search:[] Options:[]} DeviceInfo:<nil>}
      2024-01-29T10:06:08Z [verbose] the NAD's config: {{ "cniVersion": "0.3.1", "type": "macvlan", "master": "br-ex", "mode": "bridge", "ipam": { "type": "whereabouts", "datastore": "kubernetes", "range": "172.17.20.0/24", "range_start": "172.17.20.11", "range_end": "172.17.20.15" } }}
      2024-01-29T10:06:08Z [debug] Used defaults from parsed flat file config @ /host/etc/cni/net.d/whereabouts.d/whereabouts.conf
      2024-01-29T10:06:08Z [verbose] result of garbage collecting pods: failed to get the IPPool data: ippool.whereabouts.cni.cncf.io "172.17.20.0-24" not found
      2024-01-29T10:06:08Z [verbose] re-queuing IP address reconciliation request for pod nadtesting/deployment1-1; retry #: 1
      2024-01-29T10:06:08Z [verbose] skipped net-attach-def for default network
      2024-01-29T10:06:08Z [debug] pod's network status: {Name:nadtesting/macvlan-net-attach1 Interface:net1 IPs:[172.17.20.12] Mac:26:c4:95:37:a4:d8 Default:false DNS:{Nameservers:[] Domain: Search:[] Options:[]} DeviceInfo:<nil>}
      2024-01-29T10:06:08Z [verbose] the NAD's config: {{ "cniVersion": "0.3.1", "type": "macvlan", "master": "br-ex", "mode": "bridge", "ipam": { "type": "whereabouts", "datastore": "kubernetes", "range": "172.17.20.0/24", "range_start": "172.17.20.11", "range_end": "172.17.20.15" } }}
      2024-01-29T10:06:08Z [debug] Used defaults from parsed flat file config @ /host/etc/cni/net.d/whereabouts.d/whereabouts.conf
      2024-01-29T10:06:08Z [verbose] result of garbage collecting pods: failed to get the IPPool data: ippool.whereabouts.cni.cncf.io "172.17.20.0-24" not found
      2024-01-29T10:06:08Z [verbose] re-queuing IP address reconciliation request for pod nadtesting/deployment1-1; retry #: 2
      2024-01-29T10:06:08Z [verbose] skipped net-attach-def for default network
      2024-01-29T10:06:08Z [debug] pod's network status: {Name:nadtesting/macvlan-net-attach1 Interface:net1 IPs:[172.17.20.12] Mac:26:c4:95:37:a4:d8 Default:false DNS:{Nameservers:[] Domain: Search:[] Options:[]} DeviceInfo:<nil>}
      2024-01-29T10:06:08Z [verbose] the NAD's config: {{ "cniVersion": "0.3.1", "type": "macvlan", "master": "br-ex", "mode": "bridge", "ipam": { "type": "whereabouts", "datastore": "kubernetes", "range": "172.17.20.0/24", "range_start": "172.17.20.11", "range_end": "172.17.20.15" } }}
      2024-01-29T10:06:08Z [debug] Used defaults from parsed flat file config @ /host/etc/cni/net.d/whereabouts.d/whereabouts.conf
      2024-01-29T10:06:08Z [verbose] result of garbage collecting pods: failed to get the IPPool data: ippool.whereabouts.cni.cncf.io "172.17.20.0-24" not found
      2024-01-29T10:06:08Z [error] dropping pod [nadtesting/deployment1-1] deletion out of the queue - could not reconcile IP: failed to get the IPPool data: ippool.whereabouts.cni.cncf.io "172.17.20.0-24" not found
      2024-01-29T10:06:08Z [verbose] Event(v1.ObjectReference{Kind:"Pod", Namespace:"nadtesting", Name:"deployment1-1", UID:"2b23f2c4-89fe-4b0c-abf1-c866ac7e56b4", APIVersion:"v1", ResourceVersion:"2852408", FieldPath:""}): type: 'Warning' reason: 'IPAddressGarbageCollectionFailed' failed to garbage collect addresses for pod nadtesting/deployment1-1
      ~~~
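
The log above shows the same IPPool lookup failing four times (the first attempt plus the attempts that follow retries 0 to 2) before the deletion is dropped from the queue and the warning event is emitted. The following is a minimal sketch of that bounded re-queue pattern. The retry limit is inferred from the log rather than taken from the whereabouts source, and `reconcile_with_retries`, `MAX_RETRIES`, and the `always_not_found` stub are hypothetical names.

```python
MAX_RETRIES = 2  # assumption: inferred from "retry #: 0" .. "retry #: 2" in the log


def reconcile_with_retries(pod, garbage_collect, max_retries=MAX_RETRIES):
    """Call garbage_collect(pod) until it succeeds or the retry budget is spent.

    garbage_collect returns None on success or an error string on failure.
    Returns a list of simplified events (not real whereabouts log lines).
    """
    events = []
    requeues = 0
    while True:
        error = garbage_collect(pod)
        if error is None:
            events.append("reconciled")
            return events
        events.append(f"failed: {error}")
        if requeues > max_retries:
            # Retry budget exhausted: give up on this pod deletion.
            events.append("dropped")
            return events
        events.append(f"requeue #{requeues}")
        requeues += 1


def always_not_found(pod):
    # Stub that reproduces the failure reported in the log.
    return 'ippool.whereabouts.cni.cncf.io "172.17.20.0-24" not found'


events = reconcile_with_retries("nadtesting/deployment1-1", always_not_found)
for event in events:
    print(event)
```

Because every attempt fails with the same "not found" result, the sketch ends with a drop after three re-queues, mirroring the sequence in the log.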

      Expected results:
      
          

      Additional info:

      
          

            zshi@redhat.com Zenghui Shi
            hepatil Hemant Patil
            Weibin Liang Weibin Liang