OpenShift Bugs / OCPBUGS-19440

Network-attachment-definition takes a long time (~5 min or more) to get created after creation of namespace


    • Important
    • No
    • CNF Network Sprint 244
    • 1
    • False
    • None
    • Cause: The reconcile loop for SriovNetwork fails when the namespace of the NetworkAttachmentDefinition to be created does not exist yet.
      Consequence: The reconcile loop is retried with an exponential backoff strategy, whose interval can grow beyond 5 minutes.
      Fix: The SriovNetwork reconcile loop now reacts to the creation of Namespace resources.
    • Bug Fix
    • Done

      This is a clone of issue OCPBUGS-8266. The following is the description of the original issue:

      Description of problem:

      SriovNetworkNodePolicies and SriovNetworks are created in the cluster first. While running a series of cluster-density tests, target networkNamespaces are then created, and pods in them are attached to the previously created SR-IOV networks. The network-attachment-definitions take abnormally long to be created after the target namespace is created, specifically when no cache entries exist.

      Several instances of these entries appear in the network-resource-injector pod logs:

      I0302 16:48:16.576685       1 webhook.go:746] Received mutation request. Features status: HugePageInject: true / HonorExistingResources: false / EnableResourceNames: true
      I0302 16:48:16.578497       1 webhook.go:763] AdmissionReview request received for pod boatload-1/
      I0302 16:48:16.578637       1 webhook.go:719] search v1.multus-cni.io/default-network in original pod annotations
      I0302 16:48:16.578821       1 webhook.go:726] search v1.multus-cni.io/default-network in user-defined injections
      I0302 16:48:16.578909       1 webhook.go:740] v1.multus-cni.io/default-network is not found in either pod annotations or user-defined injections
      I0302 16:48:16.578988       1 webhook.go:719] search k8s.v1.cni.cncf.io/networks in original pod annotations
      I0302 16:48:16.579077       1 webhook.go:722] k8s.v1.cni.cncf.io/networks is defined in original pod annotations
      I0302 16:48:16.579213       1 webhook.go:266] 'network-boatload-1' is not in JSON format: invalid character 'e' in literal null (expecting 'u')... trying to parse as comma separated network selections list
      I0302 16:48:16.579397       1 webhook.go:370] cache entry not found, retrieving network attachment definition 'boatload-1/network-boatload-1' from api server
      E0302 16:48:16.586266       1 webhook.go:356] could not get Network Attachment Definition boatload-1/network-boatload-1: the server could not find the requested resource
      E0302 16:48:16.586583       1 webhook.go:375] could not find network attachment definition 'boatload-1/network-boatload-1': could not get Network Attachment Definition boatload-1/network-boatload-1: the server could not find the requested resource
      I0302 16:48:16.586982       1 webhook.go:422] sending response to the Kubernetes API server
      

      When the network attachment definition is finally created and found (~5 min later):

      I0302 16:53:44.813028       1 webhook.go:746] Received mutation request. Features status: HugePageInject: true / HonorExistingResources: false / EnableResourceNames: true
      I0302 16:53:44.815881       1 webhook.go:763] AdmissionReview request received for pod boatload-1/
      I0302 16:53:44.816906       1 webhook.go:719] search v1.multus-cni.io/default-network in original pod annotations
      I0302 16:53:44.816921       1 webhook.go:726] search v1.multus-cni.io/default-network in user-defined injections
      I0302 16:53:44.816928       1 webhook.go:740] v1.multus-cni.io/default-network is not found in either pod annotations or user-defined injections
      I0302 16:53:44.816935       1 webhook.go:719] search k8s.v1.cni.cncf.io/networks in original pod annotations
      I0302 16:53:44.816942       1 webhook.go:722] k8s.v1.cni.cncf.io/networks is defined in original pod annotations
      I0302 16:53:44.816956       1 webhook.go:266] 'network-boatload-1' is not in JSON format: invalid character 'e' in literal null (expecting 'u')... trying to parse as comma separated network selections list
      I0302 16:53:44.817046       1 webhook.go:380] network attachment definition 'boatload-1/network-boatload-1' found
      I0302 16:53:44.817084       1 webhook.go:387] resource 'openshift.io/net_resource_boatload_1' needs to be requested for network 'boatload-1/network-boatload-1'
      I0302 16:53:44.817098       1 webhook.go:823] pod boatload-1/ has resource requests: map[openshift.io/net_resource_boatload_1:1] and node selectors: map[]
      I0302 16:53:44.817140       1 webhook.go:905] patch after all mutations: [{add /spec/containers/0/env/- {CONTAINER_NAME boatload-1 nil}} {add /spec/containers/0/volumeMounts/- {podnetinfo true /etc/podnetinfo  <nil> }} {add /spec/volumes/- {podnetinfo {nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil &DownwardAPIVolumeSource{Items:[]DownwardAPIVolumeFile{DownwardAPIVolumeFile{Path:labels,FieldRef:&ObjectFieldSelector{APIVersion:,FieldPath:metadata.labels,},ResourceFieldRef:nil,Mode:nil,},DownwardAPIVolumeFile{Path:annotations,FieldRef:&ObjectFieldSelector{APIVersion:,FieldPath:metadata.annotations,},ResourceFieldRef:nil,Mode:nil,},DownwardAPIVolumeFile{Path:hugepages_1G_request_boatload-1,FieldRef:nil,ResourceFieldRef:&ResourceFieldSelector{ContainerName:boatload-1,Resource:requests.hugepages-1Gi,Divisor:{{1048576 0} {<nil>}  BinarySI},},Mode:nil,},DownwardAPIVolumeFile{Path:hugepages_1G_limit_boatload-1,FieldRef:nil,ResourceFieldRef:&ResourceFieldSelector{ContainerName:boatload-1,Resource:limits.hugepages-1Gi,Divisor:{{1048576 0} {<nil>}  BinarySI},},Mode:nil,},},DefaultMode:nil,} nil nil nil nil nil nil nil nil nil nil nil nil}}} {add /spec/nodeSelector map[jetlag:true node-role.kubernetes.io/master:]}] for pod boatload-1/
      I0302 16:53:44.817270       1 webhook.go:422] sending response to the Kubernetes API server
      

      The following also appears in the multus-admission-controller pod logs:

      2023-03-02T16:53:04.610972543Z I0302 16:53:04.610898       1 webhook.go:117] validating network config spec: { "cniVersion":"0.3.1", "name":"network-boatload-1","type":"sriov","vlan":100,"spoofchk":"on","trust":"off","vlanQoS":0,"ipam":{} }
      2023-03-02T16:53:04.611200606Z I0302 16:53:04.611170       1 webhook.go:141] spec is not a valid network config list: error parsing configuration list: no 'plugins' key - trying to parse into standalone config
      2023-03-02T16:53:04.611290278Z I0302 16:53:04.611265       1 webhook.go:154] AdmissionReview request allowed: Network Attachment Definition '{"cniVersion":"0.3.1","ipam":{},"name":"network-boatload-1","spoofchk":"on","trust":"off","type":"sriov","vlan":100,"vlanQoS":0}' is valid
      

      Version-Release number of selected component (if applicable):
      4.11.25

      How reproducible:

      Steps to Reproduce:

      1. Create a SriovNetworkNodePolicy CR.
      2. Create a SriovNetwork CR.
      3. Wait a few hours, then create the target namespaces.
      4. Create a pod in a target namespace that attaches to the SR-IOV network.
      5. Watch the status of the network-attachment-definitions.

      Actual results:
      network-attachment-definitions are created only after a significant delay following creation of the target namespaces (when there is no cached entry).

      Expected results:
      network-attachment-definitions should be created immediately after the target namespaces are created.
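      The fix in the release note gets closer to this expected behavior by making the SriovNetwork reconcile loop react to namespace creation. A minimal sketch of the mapping step, assuming a trimmed-down hypothetical `SriovNetwork` type with only `Name` and `NetworkNamespace`: when a Namespace is created, find every SriovNetwork that targets it and enqueue it for reconciliation right away instead of waiting for its backoff to expire. In controller-runtime such a mapping is typically registered by watching Namespace objects with `handler.EnqueueRequestsFromMapFunc`; the actual operator change may be structured differently.

```go
package main

import (
	"fmt"
	"sort"
)

// SriovNetwork is a hypothetical, trimmed-down stand-in for the CR, keeping
// only the fields needed to decide which networks target a namespace.
type SriovNetwork struct {
	Name             string
	NetworkNamespace string
}

// namespaceCreatedRequests (hypothetical) maps a Namespace creation event to
// the names of the SriovNetworks whose target namespace matches, so they can
// be reconciled immediately. The result is sorted for deterministic output.
func namespaceCreatedRequests(ns string, networks []SriovNetwork) []string {
	var reqs []string
	for _, n := range networks {
		if n.NetworkNamespace == ns {
			reqs = append(reqs, n.Name)
		}
	}
	sort.Strings(reqs)
	return reqs
}

func main() {
	networks := []SriovNetwork{
		{Name: "network-boatload-1", NetworkNamespace: "boatload-1"},
		{Name: "network-boatload-2", NetworkNamespace: "boatload-2"},
	}
	// Simulate the "boatload-1" namespace being created.
	fmt.Println(namespaceCreatedRequests("boatload-1", networks))
}
```

      Mapping events to requests, rather than shortening the backoff, keeps retries cheap for namespaces that never appear while still reacting as soon as one does.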

      Additional info:
      Attached: the SriovNetwork and SriovNetworkNodePolicy CRs, the namespace creation script, and the pod specs used, plus a must-gather.

              Andrea Panattoni (apanatto@redhat.com)
              OpenShift Prow Bot (openshift-crt-jira-prow)
              Ying Wang