  1. OpenShift Bugs
  2. OCPBUGS-18981

multus pod failed to access host file with os.Stat()


    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical
    • None
    • 4.14
    • Networking / multus
    • None
    • Critical
    • No
    • Approved
    • False

      Description of problem:

      When debugging OCPBUGS-18389, we found many errors related to files in mounted volumes, such as:
      1. 2023-09-12T06:19:29.972173907Z 2023-09-12T06:19:29Z [error] Multus: [node-density/node-density-2787/ee2d38a7-d1bb-439f-84da-8827f95a6ce6]: have you checked that your default network is ready? still waiting for readinessindicatorfile @ /host/run/multus/cni/net.d/80-openshift-network.conf. pollimmediate error: timed out waiting for the condition
      2. 2023-09-12T06:19:31.447112932Z E0912 06:19:31.447093    1994 token_source.go:180] Unable to rotate token: failed to read token file "/var/run/secrets/kubernetes.io/serviceaccount/token": open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
      
      These errors cause CNI ADD to fail for pods, which increases pod creation latency. The symptom occurs only on clusters that use OpenShiftSDN as the CNI plugin.
      
      Please find the must-gather below:
      OCP Version: 4.14.0-0.nightly-2023-09-02-132842
      Flexy Id: 231558
      Scale Ci Job: 291
      Grafana URL: 62404e34-672e-4168-b4cc-0bd575768aad
      Cloud: aws
      Arch Type: amd64
      Network Type: SDN
      Worker Count: 24
      PODS_PER_NODE: 245
      Avg Pod Ready (ms): 58725
      P99 Pod Ready (ms): 294279
      Must-gather: https://drive.google.com/file/d/1BbVeNrWzVdogFhYihNfv-99_q8oj6eCN/view?usp=drive_link

      Version-Release number of selected component (if applicable):

      4.14.0-0.nightly-2023-09-02-132842

      How reproducible:

      This issue occurred while running a pod density test, but the symptom can also be seen when deploying many (>50) pods to a single node.
      
      

      Steps to Reproduce:

      1. Create an OCP cluster.
      2. Cordon all worker nodes except one.
      3. kubectl create deployment my-dep --image=quay.io/jitesoft/nginx --replicas=50
      4. Check the log of the multus pod on that node.

      Actual results:

      The log contains many errors such as:
      1. 2023-09-12T06:19:29.972173907Z 2023-09-12T06:19:29Z [error] Multus: [node-density/node-density-2787/ee2d38a7-d1bb-439f-84da-8827f95a6ce6]: have you checked that your default network is ready? still waiting for readinessindicatorfile @ /host/run/multus/cni/net.d/80-openshift-network.conf. pollimmediate error: timed out waiting for the condition
      2. 2023-09-12T06:19:31.447112932Z E0912 06:19:31.447093 1994 token_source.go:180] Unable to rotate token: failed to read token file "/var/run/secrets/kubernetes.io/serviceaccount/token": open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory 

      Expected results:

       

      Additional info:

       

            pliurh Peng Liu
            pliurh Peng Liu
            Sunil Choudhary Sunil Choudhary
