OpenShift Bugs / OCPBUGS-3889

Egress router POD creation is failing while using openshift-sdn network plugin


Details

    • Critical
    • SDN Sprint 228
    • 1
    • Rejected
    • False

    Description

      This is a clone of issue OCPBUGS-3744. The following is the description of the original issue:

      Description of problem:

      Egress router pod creation on OpenShift 4.11 fails with the error below:
      ~~~
      Nov 15 21:51:29 pltocpwn03 hyperkube[3237]: E1115 21:51:29.467436    3237 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"stage-wfe-proxy-ext-qrhjw_stage-wfe-proxy(c965a287-28aa-47b6-9e79-0cc0e209fcf2)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"stage-wfe-proxy-ext-qrhjw_stage-wfe-proxy(c965a287-28aa-47b6-9e79-0cc0e209fcf2)\\\": rpc error: code = Unknown desc = failed to create pod network sandbox k8s_stage-wfe-proxy-ext-qrhjw_stage-wfe-proxy_c965a287-28aa-47b6-9e79-0cc0e209fcf2_0(72bcf9e52b199061d6e651e84b0892efc142601b2442c2d00b92a1ba23208344): error adding pod stage-wfe-proxy_stage-wfe-proxy-ext-qrhjw to CNI network \\\"multus-cni-network\\\": plugin type=\\\"multus\\\" name=\\\"multus-cni-network\\\" failed (add): [stage-wfe-proxy/stage-wfe-proxy-ext-qrhjw/c965a287-28aa-47b6-9e79-0cc0e209fcf2:openshift-sdn]: error adding container to network \\\"openshift-sdn\\\": CNI request failed with status 400: 'could not open netns \\\"/var/run/netns/8c5ca402-3381-4935-baed-ea454161d669\\\": unknown FS magic on \\\"/var/run/netns/8c5ca402-3381-4935-baed-ea454161d669\\\": 1021994\\n'\"" pod="stage-wfe-proxy/stage-wfe-proxy-ext-qrhjw" podUID=c965a287-28aa-47b6-9e79-0cc0e209fcf2
      ~~~
      
      I checked the SDN pod log on the node where the egress router pod is failing and found the following error message:
      
      ~~~
      2022-11-15T21:51:29.283002590Z W1115 21:51:29.282954  181720 pod.go:296] CNI_ADD stage-wfe-proxy/stage-wfe-proxy-ext-qrhjw failed: could not open netns "/var/run/netns/8c5ca402-3381-4935-baed-ea454161d669": unknown FS magic on "/var/run/netns/8c5ca402-3381-4935-baed-ea454161d669": 1021994
      ~~~
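      Assuming this message comes from the ns package of the containernetworking plugins, which, as far as I know, prints the filesystem magic in hexadecimal without a 0x prefix, the trailing 1021994 can be decoded against the magic numbers in Linux <linux/magic.h>. The minimal Python sketch below does that; the constant and function names are illustrative only and are not part of any OpenShift tool. Under that assumption the value is TMPFS_MAGIC rather than nsfs or proc, the two filesystem types a network namespace path is expected to have.

```python
# Filesystem magic numbers as defined in Linux <linux/magic.h>.
NSFS_MAGIC = 0x6E736673
PROC_SUPER_MAGIC = 0x9FA0
TMPFS_MAGIC = 0x01021994

FS_NAMES = {
    NSFS_MAGIC: "nsfs",
    PROC_SUPER_MAGIC: "proc",
    TMPFS_MAGIC: "tmpfs",
}


def decode_fs_magic(reported: str) -> str:
    """Name the filesystem behind the magic printed in an 'unknown FS magic' error."""
    # Assumption: the error prints the magic in hex without a 0x prefix.
    value = int(reported, 16)
    return FS_NAMES.get(value, f"unknown (0x{value:x})")


def accepted_as_netns(reported: str) -> bool:
    """True only for the filesystem types a network namespace path is expected to have."""
    return int(reported, 16) in (NSFS_MAGIC, PROC_SUPER_MAGIC)


if __name__ == "__main__":
    magic = "1021994"  # value taken from the kubelet and SDN log lines above
    print(decode_fs_magic(magic), accepted_as_netns(magic))  # prints "tmpfs False"
```

      If the hex assumption holds, the netns path existed but was not backed by a namespace mount when openshift-sdn tried to open it.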
      
      CRI-O logs the event below; judging from this entry, the network namespace appears to have been created on the node:
      
      ~~~
      Nov 15 21:51:29 pltocpwn03 crio[3150]: time="2022-11-15 21:51:29.307184956Z" level=info msg="Got pod network &{Name:stage-wfe-proxy-ext-qrhjw Namespace:stage-wfe-proxy ID:72bcf9e52b199061d6e651e84b0892efc142601b2442c2d00b92a1ba23208344 UID:c965a287-28aa-47b6-9e79-0cc0e209fcf2 NetNS:/var/run/netns/8c5ca402-3381-4935-baed-ea454161d669 Networks:[] RuntimeConfig:map[multus-cni-network:{IP: MAC: PortMappings:[] Bandwidth:<nil> IpRanges:[]}] Aliases:map[]}"
      ~~~
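      This CRI-O entry shows that a NetNS path was assigned, but not that a namespace is actually pinned at it. As far as I know, CRI-O pins a pod network namespace by bind-mounting an nsfs mount onto that path; if that mount is missing, the path is just an ordinary file on the filesystem backing /run. A hedged way to check on the affected node is to look the path up in mountinfo (format described in proc(5)). The sketch below assumes that /var/run is a symlink to /run, so the kernel lists the mount point under /run, and that it runs with access to the host's PID 1 mountinfo. fstype_at is a hypothetical helper, and octal escapes in mount points are not decoded.

```python
import sys
from typing import Optional


def fstype_at(mountinfo: str, path: str) -> Optional[str]:
    """Return the filesystem type mounted exactly at `path`, or None if nothing is.

    Each mountinfo line looks like:
      <id> <parent> <major:minor> <root> <mount point> <options> [optional...] - <fstype> <source> <super options>
    """
    # Assumption: /var/run is a symlink to /run, so the kernel lists /run/...
    if path.startswith("/var/run/"):
        path = path[len("/var"):]
    fstype = None
    for line in mountinfo.splitlines():
        if " - " not in line:
            continue
        left, right = line.split(" - ", 1)
        fields = left.split()
        if len(fields) >= 5 and fields[4] == path and right.split():
            # Keep the last match: a later mount stacked on the same point hides earlier ones.
            fstype = right.split()[0]
    return fstype


if __name__ == "__main__":
    netns = "/var/run/netns/8c5ca402-3381-4935-baed-ea454161d669"  # path from the logs above
    source = sys.argv[1] if len(sys.argv) > 1 else "/proc/1/mountinfo"
    try:
        with open(source) as f:
            # "nsfs" means the namespace is pinned; None means nothing is mounted there.
            print(fstype_at(f.read(), netns))
    except OSError as exc:
        print(f"cannot read {source}: {exc}")
```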
      
      

      Version-Release number of selected component (if applicable):

      4.11.12
      

      How reproducible:

      Not sure
      

      Steps to Reproduce:

      1.
      2.
      3.
      

      Actual results:

      The egress router pod fails to be created, while a sample application can be created without any issue.
      

      Expected results:

      The egress router pod should be created successfully.
      

      Additional info:

      The egress router pod was created following the document below and does contain the pod.network.openshift.io/assign-macvlan: "true" annotation.
      
      https://docs.openshift.com/container-platform/4.11/networking/openshift_sdn/deploying-egress-router-layer3-redirection.html#nw-egress-router-pod_deploying-egress-router-layer3-redirection
      

    People

      Patryk Diak (pdiak@redhat.com)
      OpenShift Prow Bot (openshift-crt-jira-prow)
      Weibin Liang
      Andreas Karis

      Votes: 0
      Watchers: 6