OCPBUGS-25876

[ARO] Pod Latency is very high at 252 Nodes



      Description of problem:

      When running cluster-density-v2 with 2268 iterations on a 252-node ARO 4.13 cluster, the readinessProbe against SERVICE_ENDPOINT/ROUTE_ENDPOINT starts fluctuating, which inflates the overall podLatency. The increase in podLatency begins after roughly 1800 iterations. The behavior is reproducible.

      ARO Instance details:
      Master Type: Standard_D32s_v5
      Worker Type: Standard_D8s_v5
      Infra Type:  Standard_E16s_v5

      Some of our analysis logs are enclosed below.

      After removing the curl to SERVICE_ENDPOINT from the readinessProbe, the probe still fails on the curl to ROUTE_ENDPOINT:
      
      [root@vkommadi ~]# oc describe po client-1-68cb885d6c-6zr6m 
      Name:             client-1-68cb885d6c-6zr6m
      Namespace:        cluster-density-v3-206
      Priority:         0
      Service Account:  default
      Node:             krishvoor-scale-2hfcr-worker-eastus2-5rn5d/10.0.2.184
      Start Time:       Thu, 21 Dec 2023 13:53:19 +0000
      Labels:           app=client
                        kube-burner-index=3
                        kube-burner-job=cluster-density-v3
                        kube-burner-runid=d49e025b-c618-404b-810d-443842200ad0
                        kube-burner-uuid=b60a4471-dfdc-4335-a21c-d728678cde4b
                        name=client-1
                        pod-template-hash=68cb885d6c
      Annotations:      k8s.ovn.org/pod-networks:
                          {"default":{"ip_addresses":["10.131.128.247/23"],"mac_address":"0a:58:0a:83:80:f7","gateway_ips":["10.131.128.1"],"ip_address":"10.131.128...
                        k8s.v1.cni.cncf.io/network-status:
                          [{
                              "name": "ovn-kubernetes",
                              "interface": "eth0",
                              "ips": [
                                  "10.131.128.247"
                              ],
                              "mac": "0a:58:0a:83:80:f7",
                              "default": true,
                              "dns": {}
                          }]
                        openshift.io/scc: restricted-v2
                        seccomp.security.alpha.kubernetes.io/pod: runtime/default
      Status:           Running
      SeccompProfile:   RuntimeDefault
      IP:               10.131.128.247
      IPs:
        IP:           10.131.128.247
      Controlled By:  ReplicaSet/client-1-68cb885d6c
      Containers:
        client-app:
          Container ID:  cri-o://67918f0700f7cb4677868e6554e4e3ddf3e05c58319b5da5144c92056621e920
          Image:         quay.io/cloud-bulldozer/curl:latest
          Image ID:      quay.io/cloud-bulldozer/curl@sha256:4311823d3576c0b7330beccbe09896ff0378c9c1c6f6974ff9064af803fed766
          Port:          <none>
          Host Port:     <none>
          Command:
            sleep
            inf
          State:          Running
            Started:      Thu, 21 Dec 2023 13:53:25 +0000
          Ready:          False
          Restart Count:  0
          Requests:
            cpu:      10m
            memory:   10Mi
          Readiness:  exec [/bin/sh -c curl --fail -sSk ${ROUTE_ENDPOINT} -o /dev/null] delay=0s timeout=5s period=10s #success=1 #failure=3
          Environment:
            ENVVAR1:           2nVT83kJY9jjZOhU5TtN29WkVGlnvKNzQpQxpvTYXSBYbjvrDqvasvBMNNJOfObavXP1btEnA6zeyVQCe4gUs0xaPWfswedWo8Vvbl4Osc6oUreJQzKwcC71Kdat2UCU3o3biVjP5I4HjB2xKxQ6uuckBaM9Hqr964oX5sGZIyFSZZn4MR49vXClOue0IZhJ0cTHO1uMHScStNtZPCG996M9rVjbRPAuEKaZitEEANQzU7DJEitmBi5Wzc
            ENVVAR2:           Jyfu0e9VV9Z5VxxnX27ysLkmPqLCuqZqTXKS4ShegYSRtYW8sjZcyQXG6lWETlNoLs4mjdAxm9zbTKBbzlkTmD6uZgi6X1B8nULQIdhjChWhDFiNcNnlpmxjCETxweqy4pOWlyuYoUT789yfjMmtT8GqmZNN6rkjZuAL2ufn77giG8dG88XXS2xvKPR0cmd37EPHeUHnitV6vnpQmAEG1AXjxXuZoBTBWkOXcUV7RdwuOd5eIKqy5MuxSs
            ENVVAR3:           u6FOE96YFomlojCYDuyuWt8YNR295FkCCfQ0jpafvoSUHOG5XoABw0iHjKh0Yi0125gWz75MRQGK9lVWFm8Twjx6VPi5xaZ6EzuDpBtwhEBqJzsBkoUnrdwPk3mFf6Ezx5maaDLjaozTXjmFbpMOqmhKdQ44cP74mylnUX3qA09T4k0DQKH8h3aSygU2xclisJ7itEH16Z5UlG5b8NZHLSTlZR9kAGBkDlWq5HrYwgwXVlYslMdJ0a9e81
            ENVVAR4:           QJDNduucelZhgOxnONZbnIMsHQuETQ1JbLOqho5wUIPDUiw0tSTxO8YUcl2hO32tIGoD25b5f3shExWNZ6JGknOXhDZkxkKZdwVaT0bnyo9CHd68foUuaorQ8PzQutNIkm7dEwjlOURpMBVK7cy1VBLw5YvNWqw1f8G3k7EBhsAZXrQ1YTQI9QQTVncp5PosMJLVpreOdqMugcr2WpUwJTna8IVcUcEmLen9y6pdE3OgxPV30rZ7GTr7bh
            ROUTE_ENDPOINT:    https://cluster-density-1-cluster-density-v2-206.apps.w1i1fsrv.eastus.aroapp.io/256.html
            SERVICE_ENDPOINT:  http://cluster-density-1/256.html
          Mounts:
            /configmap1 from configmap-1 (rw)
            /configmap2 from configmap-2 (rw)
            /configmap3 from configmap-3 (rw)
            /configmap4 from configmap-4 (rw)
            /etc/podlabels from podinfo (rw)
            /secret1 from secret-1 (rw)
            /secret2 from secret-2 (rw)
            /secret3 from secret-3 (rw)
            /secret4 from secret-4 (rw)
            /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-h9lrh (ro)
      Conditions:
        Type              Status
        Initialized       True 
        Ready             False 
        ContainersReady   False 
        PodScheduled      True 
      Volumes:
        secret-1:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  cluster-density-v3-1
          Optional:    false
        secret-2:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  cluster-density-v3-2
          Optional:    false
        secret-3:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  cluster-density-v3-3
          Optional:    false
        secret-4:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  cluster-density-v3-4
          Optional:    false
        configmap-1:
          Type:      ConfigMap (a volume populated by a ConfigMap)
          Name:      cluster-density-v3-1
          Optional:  false
        configmap-2:
          Type:      ConfigMap (a volume populated by a ConfigMap)
          Name:      cluster-density-v3-2
          Optional:  false
        configmap-3:
          Type:      ConfigMap (a volume populated by a ConfigMap)
          Name:      cluster-density-v3-3
          Optional:  false
        configmap-4:
          Type:      ConfigMap (a volume populated by a ConfigMap)
          Name:      cluster-density-v3-4
          Optional:  false
        podinfo:
          Type:  DownwardAPI (a volume populated by information about the pod)
          Items:
            metadata.labels -> labels
        kube-api-access-h9lrh:
          Type:                     Projected (a volume that contains injected data from multiple sources)
          TokenExpirationSeconds:   3607
          ConfigMapName:            kube-root-ca.crt
          ConfigMapOptional:        <nil>
          DownwardAPI:              true
          ConfigMapName:            openshift-service-ca.crt
          ConfigMapOptional:        <nil>
      QoS Class:                    Burstable
      Node-Selectors:               <none>
      Tolerations:                  node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                                    node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                                    node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
      Topology Spread Constraints:  kubernetes.io/hostname:ScheduleAnyway when max skew 1 is exceeded for selector app=client
      Events:
        Type     Reason          Age                From               Message
        ----     ------          ----               ----               -------
        Normal   Scheduled       15m                default-scheduler  Successfully assigned cluster-density-v3-206/client-1-68cb885d6c-6zr6m to krishvoor-scale-2hfcr-worker-eastus2-5rn5d
        Warning  FailedMount     15m (x2 over 15m)  kubelet            MountVolume.SetUp failed for volume "configmap-2" : configmap "cluster-density-v3-2" not found
        Warning  FailedMount     15m                kubelet            MountVolume.SetUp failed for volume "secret-1" : secret "cluster-density-v3-1" not found
        Warning  FailedMount     15m (x2 over 15m)  kubelet            MountVolume.SetUp failed for volume "configmap-4" : configmap "cluster-density-v3-4" not found
        Warning  FailedMount     15m                kubelet            MountVolume.SetUp failed for volume "secret-3" : secret "cluster-density-v3-3" not found
        Warning  FailedMount     15m (x2 over 15m)  kubelet            MountVolume.SetUp failed for volume "configmap-3" : configmap "cluster-density-v3-3" not found
        Warning  FailedMount     15m                kubelet            MountVolume.SetUp failed for volume "secret-2" : secret "cluster-density-v3-2" not found
        Warning  FailedMount     15m (x2 over 15m)  kubelet            MountVolume.SetUp failed for volume "configmap-1" : configmap "cluster-density-v3-1" not found
        Warning  FailedMount     15m                kubelet            MountVolume.SetUp failed for volume "secret-4" : secret "cluster-density-v3-4" not found
        Normal   AddedInterface  15m                multus             Add eth0 [10.131.128.247/23] from ovn-kubernetes
        Normal   Pulled          15m                kubelet            Container image "quay.io/cloud-bulldozer/curl:latest" already present on machine
        Normal   Created         15m                kubelet            Created container client-app
        Normal   Started         15m                kubelet            Started container client-app
        Warning  Unhealthy       13m (x8 over 14m)  kubelet            Readiness probe failed: curl: (22) The requested URL returned error: 503
        Warning  Unhealthy       1s (x55 over 14m)  kubelet            Readiness probe failed: command timed out
      [root@vkommadi ~]# curl --fail -sSk https://cluster-density-1-cluster-density-v2-206.apps.w1i1fsrv.eastus.aroapp.io/256.html
      curl: (22) The requested URL returned error: 503 Service Unavailable
      [root@vkommadi ~]# 
      [root@vkommadi ~]# oc exec -it client-1-68cb885d6c-6zr6m bash
      kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
      ERRO[0000] exec failed: unable to start container process: exec: "bash": executable file not found in $PATH 
      command terminated with exit code 255
      [root@vkommadi ~]# oc exec -it client-1-68cb885d6c-6zr6m sh
      kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
      ~ $ time curl --fail -sSk https://cluster-density-1-cluster-density-v2-206.apps.w1i1fsrv.eastus.aroapp.io/256.html
      curl: (22) The requested URL returned error: 503
      Command exited with non-zero status 22
      real    0m 0.20s
      user    0m 0.00s
      sys    0m 0.00s
      ~ $ https://cluster-density-1-cluster-density-v2-206.apps.w1i1fsrv.eastus.aroapp.io/256.html
      curl: (22) The requested URL returned error: 503
      ~ $ 
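
The Events table above mixes two failure modes for the readiness probe: HTTP 503 responses and probe timeouts. A minimal Python sketch for tallying them from saved `oc describe pod` output follows. The function name `tally_probe_failures` and the regular expression are illustrative assumptions, not part of any OpenShift tooling, and the sample rows are copied from the output above.

```python
import re
from collections import Counter

# Matches kubelet "Unhealthy" event rows as printed by `oc describe pod`.
# The "(xN over T)" repeat counter is optional; a row without it counts once.
EVENT_RE = re.compile(
    r"Warning\s+Unhealthy\s+\S+(?:\s+\(x(\d+) over [^)]+\))?\s+kubelet\s+"
    r"Readiness probe failed: (.+)$"
)


def tally_probe_failures(describe_text: str) -> Counter:
    """Sum readiness probe failures per failure message."""
    totals = Counter()
    for line in describe_text.splitlines():
        match = EVENT_RE.search(line.strip())
        if match:
            count = int(match.group(1)) if match.group(1) else 1
            totals[match.group(2).strip()] += count
    return totals


# Two event rows taken from the describe output in this report.
SAMPLE = """\
Warning  Unhealthy       13m (x8 over 14m)  kubelet            Readiness probe failed: curl: (22) The requested URL returned error: 503
Warning  Unhealthy       1s (x55 over 14m)  kubelet            Readiness probe failed: command timed out
"""

if __name__ == "__main__":
    for reason, count in tally_probe_failures(SAMPLE).most_common():
        print(f"{count:4d}  {reason}")
```

For this pod the tally is dominated by timeouts (55) rather than 503 responses (8), which suggests the route became slow or unreachable after initially returning errors.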

      Version-Release number of selected component (if applicable):

      OCP Version: 4.13.22
      
      HAProxy Version:
      sh-4.4$ /usr/sbin/haproxy -v                      
      HA-Proxy version 2.2.24-26b8015 2022/05/13 - https://haproxy.org/
      Status: long-term supported branch - will stop receiving fixes around Q2 2025.
      Known bugs: http://www.haproxy.org/bugs/bugs-2.2.24.html
      Running on: Linux 5.14.0-284.40.1.el9_2.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Nov 1 10:30:09 EDT 2023 x86_64
      sh-4.4$ 
      
      Router Version:
      sh-4.4$ /usr/bin/openshift-router version
      openshift-router
      majorFromGit: 
      minorFromGit: 
      commitFromGit: 431a6e667025931c68b4f747a224af29edf356f6
      versionFromGit: 4.0.0-429-g431a6e66
      gitTreeState: clean
      buildDate: 2023-11-03T11:43:21Z
      sh-4.4$

      How reproducible:

      Easily reproducible

      Steps to Reproduce:

          1. Run cluster-density-v2 with 2268 iterations (one namespace per iteration) on a 252-node cluster
          2. Check whether all pods in the cluster-density-v2 namespaces are Running and Ready
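
Step 2 can be checked mechanically. The sketch below is one possible approach, assuming the pod list has been exported as JSON (for example with `oc get pods -A -o json`) and that the workload namespaces share a common prefix. The function name `not_ready_pods`, the default prefix, and the second sample pod are hypothetical; only the `items`, `metadata`, and `status.conditions` fields come from the standard Kubernetes PodList schema.

```python
def not_ready_pods(pod_list: dict, namespace_prefix: str = "cluster-density-") -> list:
    """Return "namespace/name" for every matching pod whose Ready condition is not True."""
    result = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        namespace = meta.get("namespace", "")
        if not namespace.startswith(namespace_prefix):
            continue  # ignore pods outside the benchmark namespaces
        conditions = pod.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            result.append(f"{namespace}/{meta.get('name', '')}")
    return result


# Hand-written sample: the first pod mirrors the one in this report,
# the second pod is hypothetical and only there to show a Ready pod.
SAMPLE_POD_LIST = {
    "items": [
        {
            "metadata": {"namespace": "cluster-density-v3-206",
                         "name": "client-1-68cb885d6c-6zr6m"},
            "status": {"conditions": [{"type": "Ready", "status": "False"}]},
        },
        {
            "metadata": {"namespace": "cluster-density-v3-206",
                         "name": "hypothetical-ready-pod"},
            "status": {"conditions": [{"type": "Ready", "status": "True"}]},
        },
    ]
}

if __name__ == "__main__":
    for pod in not_ready_pods(SAMPLE_POD_LIST):
        print(pod)
```

Checking the Ready condition rather than the pod phase matters here, because the affected pods report `Status: Running` while `Ready` stays `False`.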

      Actual results:

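      The readinessProbe fails (HTTP 503 responses and command timeouts), so the affected pods stay Running but never become Ready, which inflates podLatency.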

      Expected results:

      All pods should be Running and Ready.

      Additional info:

      OVN Tracing:
      CNI ADD finished about 4 seconds after ConfigureOVS (13:53:21.189 to 13:53:25.018):
      [root@vkommadi ~]# oc logs -n openshift-ovn-kubernetes ovnkube-node-pknwj -c ovnkube-node | grep "68cb885d6c-6zr6m"
      I1221 13:53:21.187500    3188 cni.go:265] [cluster-density-v3-206/client-1-68cb885d6c-6zr6m e1fbb5e9164721427064a79cd8b3630ff50e2097f75d24cd52ad553f851462fc] ADD starting CNI request [cluster-density-v3-206/client-1-68cb885d6c-6zr6m e1fbb5e9164721427064a79cd8b3630ff50e2097f75d24cd52ad553f851462fc]
      I1221 13:53:21.189502    3188 helper_linux.go:363] ConfigureOVS: namespace: cluster-density-v3-206, podName: client-1-68cb885d6c-6zr6m, network: default, NAD default, SandboxID: "e1fbb5e9164721427064a79cd8b3630ff50e2097f75d24cd52ad553f851462fc", UID: "d36f3b2b-0d15-4fa6-9fbd-acdf9088fd40", MAC: 0a:58:0a:83:80:f7, IPs: [10.131.128.247/23]
      I1221 13:53:25.018680    3188 cni.go:286] [cluster-density-v3-206/client-1-68cb885d6c-6zr6m e1fbb5e9164721427064a79cd8b3630ff50e2097f75d24cd52ad553f851462fc] ADD finished CNI request [cluster-density-v3-206/client-1-68cb885d6c-6zr6m e1fbb5e9164721427064a79cd8b3630ff50e2097f75d24cd52ad553f851462fc], result "{\"interfaces\":[{\"name\":\"e1fbb5e91647214\",\"mac\":\"3e:97:b8:40:6c:6e\"},{\"name\":\"eth0\",\"mac\":\"0a:58:0a:83:80:f7\",\"sandbox\":\"/var/run/netns/b212efc5-3235-4f09-9661-d0faf583e930\"}],\"ips\":[{\"interface\":1,\"address\":\"10.131.128.247/23\",\"gateway\":\"10.131.128.1\"}],\"dns\":{}}", err <nil>
      [root@vkommadi ~]#
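
To avoid reading the CNI timing by eye, the elapsed time between the `ADD starting` and `ADD finished` messages can be computed from the klog timestamps. The following is a rough sketch, assuming both lines fall on the same calendar day. The helper names are illustrative, and the sample lines are abbreviated copies of the log lines above (sandbox IDs and the result payload are omitted).

```python
import re
from datetime import datetime

# klog header: severity letter + MMDD, a space, then HH:MM:SS.ffffff.
KLOG_RE = re.compile(r"^[IWEF]\d{4} (\d{2}:\d{2}:\d{2}\.\d{6})\s")


def klog_time(line: str) -> datetime:
    """Parse the time-of-day from a klog line (date is ignored: same-day assumption)."""
    match = KLOG_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a klog line: {line[:40]!r}")
    return datetime.strptime(match.group(1), "%H:%M:%S.%f")


def cni_add_duration(log_text: str) -> float:
    """Seconds between 'ADD starting CNI request' and 'ADD finished CNI request'."""
    start = end = None
    for line in log_text.splitlines():
        if "ADD starting CNI request" in line:
            start = klog_time(line)
        elif "ADD finished CNI request" in line:
            end = klog_time(line)
    if start is None or end is None:
        raise ValueError("log does not contain both ADD starting and ADD finished")
    return (end - start).total_seconds()


# Abbreviated from the ovnkube-node log in this report.
SAMPLE = """\
I1221 13:53:21.187500    3188 cni.go:265] [cluster-density-v3-206/client-1-68cb885d6c-6zr6m] ADD starting CNI request
I1221 13:53:25.018680    3188 cni.go:286] [cluster-density-v3-206/client-1-68cb885d6c-6zr6m] ADD finished CNI request
"""

if __name__ == "__main__":
    print(f"CNI ADD took {cni_add_duration(SAMPLE):.3f} s")
```

For the pod in this report the CNI ADD request takes roughly 3.8 seconds end to end.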
      
      =========================================================
      =========================================================
      
      [root@vkommadi ~]# oc logs -n openshift-ovn-kubernetes ovnkube-node-pknwj -c ovn-controller | grep "68cb885d6c-6zr6m"
      2023-12-21T13:53:24.848Z|01966|binding|INFO|Claiming lport cluster-density-v3-206_client-1-68cb885d6c-6zr6m for this chassis.
      2023-12-21T13:53:24.848Z|01967|binding|INFO|cluster-density-v3-206_client-1-68cb885d6c-6zr6m: Claiming 0a:58:0a:83:80:f7 10.131.128.247
      2023-12-21T13:53:24.954Z|01968|binding|INFO|Setting lport cluster-density-v3-206_client-1-68cb885d6c-6zr6m ovn-installed in OVS
      2023-12-21T13:53:24.954Z|01969|binding|INFO|Setting lport cluster-density-v3-206_client-1-68cb885d6c-6zr6m up in Southbound
      [root@vkommadi ~]#
      
      =========================================================
      =========================================================
      
      krvoora ~/Desktop/Git-Stuff/benchmark-operator $ oc logs router-default-9555f79fc-6br7r | grep v3
      E1221 13:43:34.545752       1 plugin.go:283] unable to find service cluster-density-v3-206/cluster-density-2: Service "cluster-density-2" not found
      E1221 13:43:34.545930       1 plugin.go:283] unable to find service cluster-density-v3-206/cluster-density-2: Service "cluster-density-2" not found
      E1221 13:43:34.575981       1 plugin.go:283] unable to find service cluster-density-v3-206/cluster-density-2: Service "cluster-density-2" not found
      E1221 13:43:34.577266       1 plugin.go:283] unable to find service cluster-density-v3-206/cluster-density-2: Service "cluster-density-2" not found
      E1221 13:44:04.585319       1 plugin.go:283] unable to find service cluster-density-v3-35/cluster-density-1: Service "cluster-density-1" not found
      E1221 13:44:04.585456       1 plugin.go:283] unable to find service cluster-density-v3-35/cluster-density-1: Service "cluster-density-1" not found
      krvoora ~/Desktop/Git-Stuff/benchmark-operator $
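
The router errors above repeat per namespace and service, so grouping them makes it easier to see which routes the router could not resolve to a Service. The sketch below is a hedged illustration: the function name `missing_services` and the regular expression are assumptions based only on the message format visible in this log excerpt, and the sample lines are copied from it.

```python
import re
from collections import Counter

# Captures "<namespace>/<service>" from the router's "unable to find service" errors.
MISSING_SVC_RE = re.compile(r"unable to find service (\S+?)/(\S+?):")


def missing_services(router_log: str) -> Counter:
    """Count 'unable to find service' errors per (namespace, service) pair."""
    counts = Counter()
    for line in router_log.splitlines():
        match = MISSING_SVC_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Lines copied from the router-default log in this report.
SAMPLE = """\
E1221 13:43:34.545752       1 plugin.go:283] unable to find service cluster-density-v3-206/cluster-density-2: Service "cluster-density-2" not found
E1221 13:43:34.545930       1 plugin.go:283] unable to find service cluster-density-v3-206/cluster-density-2: Service "cluster-density-2" not found
E1221 13:43:34.575981       1 plugin.go:283] unable to find service cluster-density-v3-206/cluster-density-2: Service "cluster-density-2" not found
E1221 13:43:34.577266       1 plugin.go:283] unable to find service cluster-density-v3-206/cluster-density-2: Service "cluster-density-2" not found
E1221 13:44:04.585319       1 plugin.go:283] unable to find service cluster-density-v3-35/cluster-density-1: Service "cluster-density-1" not found
E1221 13:44:04.585456       1 plugin.go:283] unable to find service cluster-density-v3-35/cluster-density-1: Service "cluster-density-1" not found
"""

if __name__ == "__main__":
    for (namespace, service), count in missing_services(SAMPLE).most_common():
        print(f"{count:3d}  {namespace}/{service}")
```

Note that these errors are logged around 13:43 to 13:44, about ten minutes before the client pod in this report started at 13:53, so they may reflect routes being admitted before their Services existed rather than the later 503 responses.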
      
      ============================================================================
      
      

       

              bbennett@redhat.com Ben Bennett
              rh-ee-krvoora Krishna Harsha Voora
              Anurag Saxena Anurag Saxena