  1. OpenShift Bugs
  2. OCPBUGS-18584

Pod sometimes doesn’t work as expected when it has the same name as a previous pod on an OVN network cluster


Details

    Description

      Description of problem:

      Pod sometimes doesn’t work as expected when it has the same name as a previous pod on an OVN network cluster

      Version-Release number of selected component (if applicable):

      4.12.0-0.nightly-2023-09-05-064152

      How reproducible:

      Always, but it may take several attempts

      Steps to Reproduce:

      1. Create a machineset
      liuhuali@Lius-MacBook-Pro huali-test % oc create -f ms1.yaml 
      machineset.machine.openshift.io/huliu-nu96a-zn7mc-workera created
      liuhuali@Lius-MacBook-Pro huali-test % oc get machine
      NAME                              PHASE     TYPE   REGION    ZONE              AGE
      huliu-nu96a-zn7mc-master-0        Running   AHV    Unnamed   Development-LTS   6h14m
      huliu-nu96a-zn7mc-master-1        Running   AHV    Unnamed   Development-LTS   6h14m
      huliu-nu96a-zn7mc-master-2        Running   AHV    Unnamed   Development-LTS   6h14m
      huliu-nu96a-zn7mc-worker-5j47v    Running   AHV    Unnamed   Development-LTS   6h9m
      huliu-nu96a-zn7mc-worker-thprs    Running   AHV    Unnamed   Development-LTS   6h9m
      huliu-nu96a-zn7mc-workera-x54mr   Running   AHV    Unnamed   Development-LTS   6m50s
      liuhuali@Lius-MacBook-Pro huali-test % oc get node                                          
      NAME                              STATUS   ROLES                  AGE     VERSION
      huliu-nu96a-zn7mc-master-0        Ready    control-plane,master   6h12m   v1.25.12+26bab08
      huliu-nu96a-zn7mc-master-1        Ready    control-plane,master   6h12m   v1.25.12+26bab08
      huliu-nu96a-zn7mc-master-2        Ready    control-plane,master   6h12m   v1.25.12+26bab08
      huliu-nu96a-zn7mc-worker-5j47v    Ready    worker                 6h      v1.25.12+26bab08
      huliu-nu96a-zn7mc-worker-thprs    Ready    worker                 6h      v1.25.12+26bab08
      huliu-nu96a-zn7mc-workera-x54mr   Ready    worker                 3m7s    v1.25.12+26bab08 
      
      2. Create a pod on the new node
      liuhuali@Lius-MacBook-Pro huali-test % oc create -f kubelet-killer2.yaml
      pod/kubelet-killer created
      liuhuali@Lius-MacBook-Pro huali-test % cat kubelet-killer2.yaml
      apiVersion: v1
      kind: Pod
      metadata:
        labels:
          kubelet-killer: ""
        name: kubelet-killer
        namespace: openshift-machine-api
      spec:
        containers:
        - command:
          - pkill
          - -STOP
          - kubelet
          image: quay.io/openshifttest/base-alpine@sha256:3126e4eed4a3ebd8bf972b2453fa838200988ee07c01b2251e3ea47e4b1f245c
          imagePullPolicy: Always
          name: kubelet-killer
          securityContext:
            privileged: true
        enableServiceLinks: true
        hostPID: true
        nodeName: huliu-nu96a-zn7mc-workera-x54mr
        restartPolicy: Never
      liuhuali@Lius-MacBook-Pro huali-test % 
      
      3. The pod worked as expected: the kubelet was stopped and the node became NotReady
      liuhuali@Lius-MacBook-Pro huali-test % oc get node   
      NAME                              STATUS     ROLES                  AGE     VERSION
      huliu-nu96a-zn7mc-master-0        Ready      control-plane,master   6h13m   v1.25.12+26bab08
      huliu-nu96a-zn7mc-master-1        Ready      control-plane,master   6h14m   v1.25.12+26bab08
      huliu-nu96a-zn7mc-master-2        Ready      control-plane,master   6h13m   v1.25.12+26bab08
      huliu-nu96a-zn7mc-worker-5j47v    Ready      worker                 6h2m    v1.25.12+26bab08
      huliu-nu96a-zn7mc-worker-thprs    Ready      worker                 6h2m    v1.25.12+26bab08
      huliu-nu96a-zn7mc-workera-x54mr   NotReady   worker                 4m43s   v1.25.12+26bab08
      liuhuali@Lius-MacBook-Pro huali-test % oc describe pod kubelet-killer  
      Name:         kubelet-killer
      Namespace:    openshift-machine-api
      Priority:     0
      Node:         huliu-nu96a-zn7mc-workera-x54mr/10.0.132.101
      Start Time:   Wed, 06 Sep 2023 15:33:43 +0800
      Labels:       kubelet-killer=
      Annotations:  k8s.ovn.org/pod-networks:
                      {"default":{"ip_addresses":["10.130.8.7/23"],"mac_address":"0a:58:0a:82:08:07","gateway_ips":["10.130.8.1"],"ip_address":"10.130.8.7/23","...
                    k8s.v1.cni.cncf.io/network-status:
                      [{
                          "name": "ovn-kubernetes",
                          "interface": "eth0",
                          "ips": [
                              "10.130.8.7"
                          ],
                          "mac": "0a:58:0a:82:08:07",
                          "default": true,
                          "dns": {}
                      }]
                    k8s.v1.cni.cncf.io/networks-status:
                      [{
                          "name": "ovn-kubernetes",
                          "interface": "eth0",
                          "ips": [
                              "10.130.8.7"
                          ],
                          "mac": "0a:58:0a:82:08:07",
                          "default": true,
                          "dns": {}
                      }]
                    openshift.io/scc: privileged
      Status:       Pending
      IP:           
      IPs:          <none>
      Containers:
        kubelet-killer:
          Container ID:  
          Image:         quay.io/openshifttest/base-alpine@sha256:3126e4eed4a3ebd8bf972b2453fa838200988ee07c01b2251e3ea47e4b1f245c
          Image ID:      
          Port:          <none>
          Host Port:     <none>
          Command:
            pkill
            -STOP
            kubelet
          State:          Waiting
            Reason:       ContainerCreating
          Ready:          False
          Restart Count:  0
          Environment:    <none>
          Mounts:
            /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-nm9vd (ro)
      Conditions:
        Type              Status
        Initialized       True 
        Ready             False 
        ContainersReady   False 
        PodScheduled      True 
      Volumes:
        kube-api-access-nm9vd:
          Type:                    Projected (a volume that contains injected data from multiple sources)
          TokenExpirationSeconds:  3607
          ConfigMapName:           kube-root-ca.crt
          ConfigMapOptional:       <nil>
          DownwardAPI:             true
          ConfigMapName:           openshift-service-ca.crt
          ConfigMapOptional:       <nil>
      QoS Class:                   BestEffort
      Node-Selectors:              <none>
      Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                                   node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
      Events:
        Type    Reason          Age   From     Message
        ----    ------          ----  ----     -------
        Normal  AddedInterface  90s   multus   Add eth0 [10.130.8.7/23] from ovn-kubernetes
        Normal  Pulling         90s   kubelet  Pulling image "quay.io/openshifttest/base-alpine@sha256:3126e4eed4a3ebd8bf972b2453fa838200988ee07c01b2251e3ea47e4b1f245c"
        Normal  Pulled          87s   kubelet  Successfully pulled image "quay.io/openshifttest/base-alpine@sha256:3126e4eed4a3ebd8bf972b2453fa838200988ee07c01b2251e3ea47e4b1f245c" in 2.310348601s (2.310355399s including waiting)
        Normal  Created         87s   kubelet  Created container kubelet-killer
      liuhuali@Lius-MacBook-Pro huali-test % oc get machine
      NAME                              PHASE     TYPE   REGION    ZONE              AGE
      huliu-nu96a-zn7mc-master-0        Running   AHV    Unnamed   Development-LTS   6h17m
      huliu-nu96a-zn7mc-master-1        Running   AHV    Unnamed   Development-LTS   6h17m
      huliu-nu96a-zn7mc-master-2        Running   AHV    Unnamed   Development-LTS   6h17m
      huliu-nu96a-zn7mc-worker-5j47v    Running   AHV    Unnamed   Development-LTS   6h11m
      huliu-nu96a-zn7mc-worker-thprs    Running   AHV    Unnamed   Development-LTS   6h11m
      huliu-nu96a-zn7mc-workera-x54mr   Running   AHV    Unnamed   Development-LTS   9m5s
      liuhuali@Lius-MacBook-Pro huali-test % oc get pod
      NAME                                                  READY   STATUS              RESTARTS   AGE
      cluster-autoscaler-operator-854c6755f5-r9c2k          2/2     Running             0          5h41m
      cluster-baremetal-operator-976487bc9-7czpk            2/2     Running             0          5h41m
      control-plane-machine-set-operator-69684bcccd-c6jnf   1/1     Running             0          5h41m
      kubelet-killer                                        0/1     ContainerCreating   0          98s
      machine-api-controllers-7f574b69b5-w5swt              7/7     Running             0          155m
      machine-api-operator-7f46db4fcc-v6w9p                 2/2     Running             0          5h41m
      
      4. Try this once again. Delete the old machine and let the machineset create a new one
      
      liuhuali@Lius-MacBook-Pro huali-test % oc delete machine huliu-nu96a-zn7mc-workera-x54mr
      machine.machine.openshift.io "huliu-nu96a-zn7mc-workera-x54mr" deleted
      liuhuali@Lius-MacBook-Pro huali-test % oc get pod
      NAME                                                  READY   STATUS        RESTARTS   AGE
      cluster-autoscaler-operator-854c6755f5-r9c2k          2/2     Running       0          5h42m
      cluster-baremetal-operator-976487bc9-7czpk            2/2     Running       0          5h42m
      control-plane-machine-set-operator-69684bcccd-c6jnf   1/1     Running       0          5h42m
      kubelet-killer                                        0/1     Terminating   0          2m28s
      machine-api-controllers-7f574b69b5-w5swt              7/7     Running       0          156m
      machine-api-operator-7f46db4fcc-v6w9p                 2/2     Running       0          5h42m
      liuhuali@Lius-MacBook-Pro huali-test % oc get machine
      NAME                              PHASE          TYPE   REGION    ZONE              AGE
      huliu-nu96a-zn7mc-master-0        Running        AHV    Unnamed   Development-LTS   6h18m
      huliu-nu96a-zn7mc-master-1        Running        AHV    Unnamed   Development-LTS   6h18m
      huliu-nu96a-zn7mc-master-2        Running        AHV    Unnamed   Development-LTS   6h18m
      huliu-nu96a-zn7mc-worker-5j47v    Running        AHV    Unnamed   Development-LTS   6h12m
      huliu-nu96a-zn7mc-worker-thprs    Running        AHV    Unnamed   Development-LTS   6h12m
      huliu-nu96a-zn7mc-workera-t8dj2   Provisioning                                      27s
      liuhuali@Lius-MacBook-Pro huali-test % oc get pod                                       
      NAME                                                  READY   STATUS    RESTARTS   AGE
      cluster-autoscaler-operator-854c6755f5-r9c2k          2/2     Running   0          5h44m
      cluster-baremetal-operator-976487bc9-7czpk            2/2     Running   0          5h44m
      control-plane-machine-set-operator-69684bcccd-c6jnf   1/1     Running   0          5h44m
      machine-api-controllers-7f574b69b5-w5swt              7/7     Running   0          158m
      machine-api-operator-7f46db4fcc-v6w9p                 2/2     Running   0          5h44m
      liuhuali@Lius-MacBook-Pro huali-test % oc get machine                                        
      NAME                              PHASE     TYPE   REGION    ZONE              AGE
      huliu-nu96a-zn7mc-master-0        Running   AHV    Unnamed   Development-LTS   6h27m
      huliu-nu96a-zn7mc-master-1        Running   AHV    Unnamed   Development-LTS   6h27m
      huliu-nu96a-zn7mc-master-2        Running   AHV    Unnamed   Development-LTS   6h27m
      huliu-nu96a-zn7mc-worker-5j47v    Running   AHV    Unnamed   Development-LTS   6h21m
      huliu-nu96a-zn7mc-worker-thprs    Running   AHV    Unnamed   Development-LTS   6h21m
      huliu-nu96a-zn7mc-workera-t8dj2   Running   AHV    Unnamed   Development-LTS   9m46s
      liuhuali@Lius-MacBook-Pro huali-test % oc get node
      NAME                              STATUS   ROLES                  AGE     VERSION
      huliu-nu96a-zn7mc-master-0        Ready    control-plane,master   6h24m   v1.25.12+26bab08
      huliu-nu96a-zn7mc-master-1        Ready    control-plane,master   6h25m   v1.25.12+26bab08
      huliu-nu96a-zn7mc-master-2        Ready    control-plane,master   6h24m   v1.25.12+26bab08
      huliu-nu96a-zn7mc-worker-5j47v    Ready    worker                 6h13m   v1.25.12+26bab08
      huliu-nu96a-zn7mc-worker-thprs    Ready    worker                 6h13m   v1.25.12+26bab08
      huliu-nu96a-zn7mc-workera-t8dj2   Ready    worker                 6m      v1.25.12+26bab08
      
      5. Create a pod with the same name as the previous one (here, kubelet-killer) on the new node
      liuhuali@Lius-MacBook-Pro huali-test % oc create -f kubelet-killer2.yaml
      pod/kubelet-killer created
      liuhuali@Lius-MacBook-Pro huali-test % cat kubelet-killer2.yaml
      apiVersion: v1
      kind: Pod
      metadata:
        labels:
          kubelet-killer: ""
        name: kubelet-killer
        namespace: openshift-machine-api
      spec:
        containers:
        - command:
          - pkill
          - -STOP
          - kubelet
          image: quay.io/openshifttest/base-alpine@sha256:3126e4eed4a3ebd8bf972b2453fa838200988ee07c01b2251e3ea47e4b1f245c
          imagePullPolicy: Always
          name: kubelet-killer
          securityContext:
            privileged: true
        enableServiceLinks: true
        hostPID: true
        nodeName: huliu-nu96a-zn7mc-workera-t8dj2
        restartPolicy: Never
      
      6. Check that the pod doesn’t work as expected.
      liuhuali@Lius-MacBook-Pro huali-test % oc get machine
      NAME                              PHASE     TYPE   REGION    ZONE              AGE
      huliu-nu96a-zn7mc-master-0        Running   AHV    Unnamed   Development-LTS   6h35m
      huliu-nu96a-zn7mc-master-1        Running   AHV    Unnamed   Development-LTS   6h35m
      huliu-nu96a-zn7mc-master-2        Running   AHV    Unnamed   Development-LTS   6h35m
      huliu-nu96a-zn7mc-worker-5j47v    Running   AHV    Unnamed   Development-LTS   6h29m
      huliu-nu96a-zn7mc-worker-thprs    Running   AHV    Unnamed   Development-LTS   6h29m
      huliu-nu96a-zn7mc-workera-t8dj2   Running   AHV    Unnamed   Development-LTS   17m
      liuhuali@Lius-MacBook-Pro huali-test % oc get node
      NAME                              STATUS   ROLES                  AGE     VERSION
      huliu-nu96a-zn7mc-master-0        Ready    control-plane,master   6h32m   v1.25.12+26bab08
      huliu-nu96a-zn7mc-master-1        Ready    control-plane,master   6h33m   v1.25.12+26bab08
      huliu-nu96a-zn7mc-master-2        Ready    control-plane,master   6h32m   v1.25.12+26bab08
      huliu-nu96a-zn7mc-worker-5j47v    Ready    worker                 6h21m   v1.25.12+26bab08
      huliu-nu96a-zn7mc-worker-thprs    Ready    worker                 6h21m   v1.25.12+26bab08
      huliu-nu96a-zn7mc-workera-t8dj2   Ready    worker                 14m     v1.25.12+26bab08
      liuhuali@Lius-MacBook-Pro huali-test % oc get pod
      NAME                                                  READY   STATUS              RESTARTS   AGE
      cluster-autoscaler-operator-854c6755f5-r9c2k          2/2     Running             0          6h
      cluster-baremetal-operator-976487bc9-7czpk            2/2     Running             0          6h
      control-plane-machine-set-operator-69684bcccd-c6jnf   1/1     Running             0          6h
      kubelet-killer                                        0/1     ContainerCreating   0          7m18s
      machine-api-controllers-7f574b69b5-w5swt              7/7     Running             0          174m
      machine-api-operator-7f46db4fcc-v6w9p                 2/2     Running             0          6h
      liuhuali@Lius-MacBook-Pro huali-test % oc describe pod kubelet-killer  
      Name:         kubelet-killer
      Namespace:    openshift-machine-api
      Priority:     0
      Node:         huliu-nu96a-zn7mc-workera-t8dj2/10.0.132.67
      Start Time:   Wed, 06 Sep 2023 15:46:29 +0800
      Labels:       kubelet-killer=
      Annotations:  openshift.io/scc: node-exporter
      Status:       Pending
      IP:           
      IPs:          <none>
      Containers:
        kubelet-killer:
          Container ID:  
          Image:         quay.io/openshifttest/base-alpine@sha256:3126e4eed4a3ebd8bf972b2453fa838200988ee07c01b2251e3ea47e4b1f245c
          Image ID:      
          Port:          <none>
          Host Port:     <none>
          Command:
            pkill
            -STOP
            kubelet
          State:          Waiting
            Reason:       ContainerCreating
          Ready:          False
          Restart Count:  0
          Environment:    <none>
          Mounts:
            /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dcq5h (ro)
      Conditions:
        Type              Status
        Initialized       True 
        Ready             False 
        ContainersReady   False 
        PodScheduled      True 
      Volumes:
        kube-api-access-dcq5h:
          Type:                    Projected (a volume that contains injected data from multiple sources)
          TokenExpirationSeconds:  3607
          ConfigMapName:           kube-root-ca.crt
          ConfigMapOptional:       <nil>
          DownwardAPI:             true
          ConfigMapName:           openshift-service-ca.crt
          ConfigMapOptional:       <nil>
      QoS Class:                   BestEffort
      Node-Selectors:              <none>
      Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                                   node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
      Events:
        Type     Reason                  Age    From          Message
        ----     ------                  ----   ----          -------
        Warning  ErrorAddingLogicalPort  7m30s  controlplane  deleteLogicalPort failed for pod openshift-machine-api_kubelet-killer: cannot delete GR SNAT for pod openshift-machine-api/kubelet-killer: failed create operation for deleting SNAT rule for pod on gateway router GR_huliu-nu96a-zn7mc-workera-x54mr: unable to get NAT entries for router &{UUID: Copp:<nil> Enabled:<nil> ExternalIDs:map[] LoadBalancer:[] LoadBalancerGroup:[] Name:GR_huliu-nu96a-zn7mc-workera-x54mr Nat:[] Options:map[] Policies:[] Ports:[] StaticRoutes:[]}: failed to get router: GR_huliu-nu96a-zn7mc-workera-x54mr, error: object not found
        Warning  FailedCreatePodSandBox  5m29s  kubelet       Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_kubelet-killer_openshift-machine-api_84edbe26-680b-4c50-a8a4-71ffb82b8d9c_0(c1671822d85747016e7a619891ff5981b470c268f478a761de485f3ae3a0f2ef): error adding pod openshift-machine-api_kubelet-killer to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [openshift-machine-api/kubelet-killer/84edbe26-680b-4c50-a8a4-71ffb82b8d9c:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[openshift-machine-api/kubelet-killer c1671822d85747016e7a619891ff5981b470c268f478a761de485f3ae3a0f2ef] [openshift-machine-api/kubelet-killer c1671822d85747016e7a619891ff5981b470c268f478a761de485f3ae3a0f2ef] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded
      '
        Warning  FailedCreatePodSandBox  3m17s  kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_kubelet-killer_openshift-machine-api_84edbe26-680b-4c50-a8a4-71ffb82b8d9c_0(dced805c3e86acbf5a10a8b4efbc02c64ad3c9360e23885c4fe593ca198f43b0): error adding pod openshift-machine-api_kubelet-killer to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [openshift-machine-api/kubelet-killer/84edbe26-680b-4c50-a8a4-71ffb82b8d9c:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[openshift-machine-api/kubelet-killer dced805c3e86acbf5a10a8b4efbc02c64ad3c9360e23885c4fe593ca198f43b0] [openshift-machine-api/kubelet-killer dced805c3e86acbf5a10a8b4efbc02c64ad3c9360e23885c4fe593ca198f43b0] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded
      '
        Warning  FailedCreatePodSandBox  65s  kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_kubelet-killer_openshift-machine-api_84edbe26-680b-4c50-a8a4-71ffb82b8d9c_0(4bbf45588909933b9c4086274a08b7cddc2e09fe47e740ee14c74523f4f21ef2): error adding pod openshift-machine-api_kubelet-killer to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [openshift-machine-api/kubelet-killer/84edbe26-680b-4c50-a8a4-71ffb82b8d9c:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[openshift-machine-api/kubelet-killer 4bbf45588909933b9c4086274a08b7cddc2e09fe47e740ee14c74523f4f21ef2] [openshift-machine-api/kubelet-killer 4bbf45588909933b9c4086274a08b7cddc2e09fe47e740ee14c74523f4f21ef2] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded
      '
      
      The first Warning event references “GR_huliu-nu96a-zn7mc-workera-x54mr”, but huliu-nu96a-zn7mc-workera-x54mr is the previous node, which was deleted in Step 4. The pod in Step 5 was created on huliu-nu96a-zn7mc-workera-t8dj2.
      If the new pod is created with a different name, the issue does not occur.
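
The mismatch described above can be checked mechanically when triaging similar events. The following is a minimal sketch, not part of ovn-kubernetes: it assumes the event message names the router as `gateway router GR_<node-name>`, as the ErrorAddingLogicalPort event in this report does, and the helper names `gateway_router_node` and `references_stale_node` are hypothetical.

```python
import re
from typing import Optional

# Assumption: the event names the router as "gateway router GR_<node-name>",
# which is the form seen in the ErrorAddingLogicalPort event in this report.
_GR_PATTERN = re.compile(r"gateway router GR_([A-Za-z0-9.-]+)")


def gateway_router_node(event_message: str) -> Optional[str]:
    """Return the node name from the first GR_<node> reference, or None."""
    match = _GR_PATTERN.search(event_message)
    return match.group(1) if match else None


def references_stale_node(event_message: str, pod_node: str) -> bool:
    """True when the event names a gateway router of a node other than pod_node."""
    gr_node = gateway_router_node(event_message)
    return gr_node is not None and gr_node != pod_node


# Event text shortened from the report; the pod was scheduled on ...-t8dj2.
message = (
    "deleteLogicalPort failed for pod openshift-machine-api_kubelet-killer: "
    "cannot delete GR SNAT for pod openshift-machine-api/kubelet-killer: "
    "failed create operation for deleting SNAT rule for pod on gateway router "
    "GR_huliu-nu96a-zn7mc-workera-x54mr: unable to get NAT entries for router"
)
stale = references_stale_node(message, "huliu-nu96a-zn7mc-workera-t8dj2")
```

Here `stale` is true because the event points at the gateway router of the deleted node rather than the node the pod is scheduled on.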

      Actual results:

      The pod doesn’t work as expected when it has the same name as a previous pod: it stays in ContainerCreating and the node remains Ready.

      Expected results:

      The pod should work as expected even when it has the same name as a previous pod.

      Additional info:

      The same case worked as expected on an SDN network cluster.
      
      Discussion in Slack: https://redhat-internal.slack.com/archives/CH76YSYSC/p1693983428736929
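
Since the report notes that a pod with a different name does not hit the issue, test automation could sidestep the name reuse until the bug is fixed. The sketch below is one possible approach and not a fix for the underlying problem: the helper `with_unique_name` is hypothetical, and the manifest is an abbreviated form of kubelet-killer2.yaml with the container spec omitted.

```python
import copy
import uuid


def with_unique_name(pod_manifest: dict, node_name: str) -> dict:
    """Return a copy of pod_manifest pinned to node_name with a fresh pod name."""
    pod = copy.deepcopy(pod_manifest)  # leave the caller's manifest untouched
    base = pod["metadata"]["name"]
    suffix = uuid.uuid4().hex[:5]  # short random suffix, lowercase hex only
    pod["metadata"]["name"] = f"{base}-{suffix}"
    pod["spec"]["nodeName"] = node_name
    return pod


# Abbreviated manifest for illustration; the container spec is omitted here.
base_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "kubelet-killer", "namespace": "openshift-machine-api"},
    "spec": {
        "nodeName": "huliu-nu96a-zn7mc-workera-x54mr",
        "restartPolicy": "Never",
    },
}
new_pod = with_unique_name(base_pod, "huliu-nu96a-zn7mc-workera-t8dj2")
```

The resulting dictionary can be serialized to YAML or JSON and passed to `oc create -f`, so each attempt uses a pod name that was never bound to a deleted node.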

            People

              ffernand@redhat.com Flavio Fernandes (Inactive)
              huliu@redhat.com Huali Liu
              Jean Chen Jean Chen