OpenShift Bugs / OCPBUGS-26530

[ARO] Pod latencies are impacted during cluster-density-v2 at 252 nodes


      Severity: Important
      Sprints: SDN Sprint 247, SDN Sprint 248, SDN Sprint 249, SDN Sprint 250

      Description of problem:

      When running cluster-density-v2 with 2268 iterations on a 252-node 4.13 ARO cluster, the overall podLatency times are extremely high.
      The cluster was created on ARO with the following instance types:

      Master Type:  Standard_D32s_v5
      Worker Type:  Standard_D8s_v5
      Infra Type:   Standard_E16s_v5
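
      The node shapes can be confirmed from the cluster itself (a quick sketch; the instance type is exposed through the standard node.kubernetes.io/instance-type node label):

      $ oc get nodes -L node.kubernetes.io/instance-type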

      Version-Release number of selected component (if applicable):

          4.13.22

      How reproducible:

          This is reproducible, and is only seen at scale.

      Steps to Reproduce:

          1. git clone https://github.com/cloud-bulldozer/e2e-benchmarking; cd e2e-benchmarking/workloads/kube-burner-ocp-wrapper
          2. Run cluster-density-v2 with 2268 iterations (one namespace per iteration) against the 252-node cluster:
             ITERATIONS=2268 WORKLOAD=cluster-density-v2 ./run.sh
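
          While the workload runs, the number of running-but-never-Ready client pods gives a quick view of the blast radius (a sketch; the jq filter assumes the standard pod .status.conditions layout):

          $ oc get pods -A -l kube-burner-job=cluster-density-v2 -o json \
              | jq '[.items[] | select(any(.status.conditions[]?; .type=="Ready" and .status=="False"))] | length'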

      Actual results:

          Readiness probes fail at scale (connection refusals, HTTP 503s, and probe command timeouts), so pods stay Running but never become Ready.

      Expected results:

          All pods should reach the Running state and pass their readiness probes (Ready=True).
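
          The failing probes can also be surveyed cluster-wide from the events stream (a sketch; reason=Unhealthy is the standard kubelet event reason for probe failures):

          $ oc get events -A --field-selector reason=Unhealthy --sort-by=.lastTimestamp | tail -n 20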

      Additional info:

      $ oc describe po client-1-c7d4c6df6-rth25  -n cluster-density-v2-596 
      Name:             client-1-c7d4c6df6-rth25
      Namespace:        cluster-density-v2-596
      Priority:         0
      Service Account:  default
      Node:             krishvoor-scale-2hfcr-worker-eastus1-dhpmh/10.0.2.168
      Start Time:       Wed, 27 Dec 2023 12:28:13 +0530
      Labels:           app=client
                        kube-burner-index=3
                        kube-burner-job=cluster-density-v2
                        kube-burner-runid=a5fa9b9b-8a3d-4986-9025-e4b03efcdb85
                        kube-burner-uuid=a24d848b-ba4e-4ee7-8b41-472f6ff881a2
                        name=client-1
                        pod-template-hash=c7d4c6df6
      Annotations:      k8s.ovn.org/pod-networks:
                          {"default":{"ip_addresses":["10.130.34.110/23"],"mac_address":"0a:58:0a:82:22:6e","gateway_ips":["10.130.34.1"],"ip_address":"10.130.34.11...
                        k8s.v1.cni.cncf.io/network-status:
                          [{
                              "name": "ovn-kubernetes",
                              "interface": "eth0",
                              "ips": [
                                  "10.130.34.110"
                              ],
                              "mac": "0a:58:0a:82:22:6e",
                              "default": true,
                              "dns": {}
                          }]
                        openshift.io/scc: restricted-v2
                        seccomp.security.alpha.kubernetes.io/pod: runtime/default
      Status:           Running
      IP:               10.130.34.110
      IPs:
        IP:           10.130.34.110
      Controlled By:  ReplicaSet/client-1-c7d4c6df6
      Containers:
        client-app:
          Container ID:  cri-o://4241e52f372ebc1ed9eeed4e865fa9b6a995079d0ee4b9138d65b16773a5273d
          Image:         quay.io/cloud-bulldozer/curl:latest
          Image ID:      quay.io/cloud-bulldozer/curl@sha256:4311823d3576c0b7330beccbe09896ff0378c9c1c6f6974ff9064af803fed766
          Port:          <none>
          Host Port:     <none>
          Command:
            sleep
            inf
          State:          Running
            Started:      Wed, 27 Dec 2023 12:28:21 +0530
          Ready:          False
          Restart Count:  0
          Requests:
            cpu:      10m
            memory:   10Mi
          Readiness:  exec [/bin/sh -c curl --fail -sS ${SERVICE_ENDPOINT} -o /dev/null && curl --fail -sSk ${ROUTE_ENDPOINT} -o /dev/null] delay=0s timeout=5s period=10s #success=1 #failure=3
          Environment:
            ENVVAR1:           kc6ykh5F7XP7fVbKHSQJJnSA3lgfsXR0ue0AskN8id0W4JkMF8iozCwPvIHamgaHUwnq4xgKKdjc9Xdw4M2AEaHCQUVwwyxtxI5yopzUsQ91iF5DH6qM5sCLWSG0qqjatzh42AjGAUR1qxojcse9X927umXkCO1pwIUOQIBvDINtSrPAcSDKrVJIEUA6tzpFCfxgYW3kKv2CPaXsMtcAeDB5ZGgjyrx5CZ0jRLPXuwP4HCUWP3srfPPfQK
            ENVVAR2:           xqX8MhvWT3Acp2WYUHgwsK2N9fgialvYDbWjghDQcVVinlz32l5ygQI2d9PjhCLDHWiwrYiNaikConbuicwRhDv4hhjF4YTQbqNg2y0Rlt6EoGc0AUE1PPzkd3mJJe5X9IHnc3gdFk7hIrA7p1aS8fhoOchzk7oxnBOJ0iAPtKkVupWAeC9zzmDYjEpPMfK49Ll0E9CTfc7cz5uEvN5cqwYQvE4NpAJLrQjhz7JmOBeF1bYNXtOnvsZSRx
            ENVVAR3:           1iBE9wYFnMwCWeg9CcSHCrPtL5CJ2WcJyS6jPrrXHWf860Gr5jpyaEk0OuctkVkym3KtncUNMgjfC8iLls49x6DOxktqxDCuqc6Mea5p9gzRcRXOlTfnm4Yd3ILy9paYKsxCs8Kl2ipYAfpxWVRvjKic0hcPWrQWjXY4jEE8cg4PVJYjFrXskjBgptlV1B9W2gmGaB3GFuzDwwBtHpEW8EXjeEEKyJStKzzdeLkyAUoRS2YeEYEcTO6y6S
            ENVVAR4:           UFVArwbf7cfLH1CPtcKlNWKaoVTwW0ZG2Q78sVKTz75VpG4oBxItbnIwEKkbhwbxweWVxF2qfIwcyYTqyg2FefBRcxWwPs4Yxrheqs0uUAeewo2dGoOSW6iQTaMTKXmGDpFB17p2hWWXymwtxwedhLR56XBSW3Uyaqb3p7vnWCnYu6UVdF81ztBIyq4zA4hnWwIBcQ5HL43WxGmKr4iUF4U1Wj7OyTqD5YyzByhervnyMHOR5myFThtSPD
            ROUTE_ENDPOINT:    https://cluster-density-1-cluster-density-v2-596.apps.w1i1fsrv.eastus.aroapp.io/256.html
            SERVICE_ENDPOINT:  http://cluster-density-3/256.html
          Mounts:
            /configmap1 from configmap-1 (rw)
            /configmap2 from configmap-2 (rw)
            /configmap3 from configmap-3 (rw)
            /configmap4 from configmap-4 (rw)
            /etc/podlabels from podinfo (rw)
            /secret1 from secret-1 (rw)
            /secret2 from secret-2 (rw)
            /secret3 from secret-3 (rw)
            /secret4 from secret-4 (rw)
            /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6q7tk (ro)
      Conditions:
        Type              Status
        Initialized       True 
        Ready             False 
        ContainersReady   False 
        PodScheduled      True 
      Volumes:
        secret-1:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  cluster-density-v2-1
          Optional:    false
        secret-2:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  cluster-density-v2-2
          Optional:    false
        secret-3:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  cluster-density-v2-3
          Optional:    false
        secret-4:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  cluster-density-v2-4
          Optional:    false
        configmap-1:
          Type:      ConfigMap (a volume populated by a ConfigMap)
          Name:      cluster-density-v2-1
          Optional:  false
        configmap-2:
          Type:      ConfigMap (a volume populated by a ConfigMap)
          Name:      cluster-density-v2-2
          Optional:  false
        configmap-3:
          Type:      ConfigMap (a volume populated by a ConfigMap)
          Name:      cluster-density-v2-3
          Optional:  false
        configmap-4:
          Type:      ConfigMap (a volume populated by a ConfigMap)
          Name:      cluster-density-v2-4
          Optional:  false
        podinfo:
          Type:  DownwardAPI (a volume populated by information about the pod)
          Items:
            metadata.labels -> labels
        kube-api-access-6q7tk:
          Type:                     Projected (a volume that contains injected data from multiple sources)
          TokenExpirationSeconds:   3607
          ConfigMapName:            kube-root-ca.crt
          ConfigMapOptional:        <nil>
          DownwardAPI:              true
          ConfigMapName:            openshift-service-ca.crt
          ConfigMapOptional:        <nil>
      QoS Class:                    Burstable
      Node-Selectors:               <none>
      Tolerations:                  node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                                    node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                                    node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
      Topology Spread Constraints:  kubernetes.io/hostname:ScheduleAnyway when max skew 1 is exceeded for selector app=client
      Events:
        Type     Reason          Age                   From               Message
        ----     ------          ----                  ----               -------
        Normal   Scheduled       10m                   default-scheduler  Successfully assigned cluster-density-v2-596/client-1-c7d4c6df6-rth25 to krishvoor-scale-2hfcr-worker-eastus1-dhpmh
        Normal   AddedInterface  10m                   multus             Add eth0 [10.130.34.110/23] from ovn-kubernetes
        Normal   Pulled          10m                   kubelet            Container image "quay.io/cloud-bulldozer/curl:latest" already present on machine
        Normal   Created         10m                   kubelet            Created container client-app
        Normal   Started         10m                   kubelet            Started container client-app
        Warning  Unhealthy       10m                   kubelet            Readiness probe failed: curl: (7) Failed to connect to cluster-density-3 port 80 after 2 ms: Couldn't connect to server
        Warning  Unhealthy       10m                   kubelet            Readiness probe failed: curl: (7) Failed to connect to cluster-density-3 port 80 after 0 ms: Couldn't connect to server
        Warning  Unhealthy       10m                   kubelet            Readiness probe failed: curl: (7) Failed to connect to cluster-density-3 port 80 after 1 ms: Couldn't connect to server
        Warning  Unhealthy       10m                   kubelet            Readiness probe failed: curl: (22) The requested URL returned error: 503
        Warning  Unhealthy       13s (x36 over 9m53s)  kubelet            Readiness probe failed: command timed out
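
      The probe's failure mode can be reproduced by hand from inside the affected pod, using the same endpoints the readiness probe hits (a sketch built from the pod spec above; curl's -w timing variables are standard):

      $ oc exec -n cluster-density-v2-596 client-1-c7d4c6df6-rth25 -c client-app -- \
          /bin/sh -c 'curl --fail -sS -o /dev/null -w "service: %{http_code} %{time_total}s\n" ${SERVICE_ENDPOINT}; \
                      curl --fail -sSk -o /dev/null -w "route: %{http_code} %{time_total}s\n" ${ROUTE_ENDPOINT}'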

      Ping test to an infra node, showing heavy packet loss:

      $ oc debug node/krishvoor-scale-2hfcr-infra-aro-machinesets-eastus-2-tbb7b
      Starting pod/krishvoor-scale-2hfcr-infra-aro-machinesets-eastus-2-tbb7b-debug ...
      To use host binaries, run `chroot /host`
      Pod IP: 10.0.2.11
      If you don't see a command prompt, try pressing enter.
      sh-4.4# ping 10.0.2.149
      PING 10.0.2.149 (10.0.2.149) 56(84) bytes of data.
      64 bytes from 10.0.2.149: icmp_seq=3 ttl=64 time=5.77 ms
      64 bytes from 10.0.2.149: icmp_seq=4 ttl=64 time=8.90 ms
      64 bytes from 10.0.2.149: icmp_seq=9 ttl=64 time=4.38 ms
      64 bytes from 10.0.2.149: icmp_seq=12 ttl=64 time=2.78 ms
      ^C
      --- 10.0.2.149 ping statistics ---
      12 packets transmitted, 4 received, 66.6667% packet loss, time 11218ms
      rtt min/avg/max/mdev = 2.783/5.458/8.896/2.250 ms
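
      A fixed-count, quiet ping gives a cleaner loss figure for comparing node pairs (a sketch; standard iputils flags, run from the same debug shell):

      sh-4.4# ping -q -c 100 -i 0.2 10.0.2.149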

      ==================

      Between workers the ping test was successful:

      $ oc debug node/krishvoor-scale-2hfcr-worker-eastus3-xj2tj
      Starting pod/krishvoor-scale-2hfcr-worker-eastus3-xj2tj-debug ...
      To use host binaries, run `chroot /host`
      Pod IP: 10.0.2.112
      If you don't see a command prompt, try pressing enter.
      sh-4.4# ping 10.0.2.159
      PING 10.0.2.159 (10.0.2.159) 56(84) bytes of data.
      64 bytes from 10.0.2.159: icmp_seq=1 ttl=64 time=2.82 ms
      64 bytes from 10.0.2.159: icmp_seq=2 ttl=64 time=7.95 ms
      64 bytes from 10.0.2.159: icmp_seq=3 ttl=64 time=9.74 ms
      64 bytes from 10.0.2.159: icmp_seq=4 ttl=64 time=7.65 ms
      64 bytes from 10.0.2.159: icmp_seq=5 ttl=64 time=0.450 ms
      64 bytes from 10.0.2.159: icmp_seq=6 ttl=64 time=5.92 ms
      64 bytes from 10.0.2.159: icmp_seq=7 ttl=64 time=0.538 ms
      64 bytes from 10.0.2.159: icmp_seq=8 ttl=64 time=0.692 ms
      64 bytes from 10.0.2.159: icmp_seq=9 ttl=64 time=0.875 ms
      ^C
      --- 10.0.2.159 ping statistics ---
      9 packets transmitted, 9 received, 0% packet loss, time 8097ms
      rtt min/avg/max/mdev = 0.450/4.069/9.736/3.529 ms
      $
      
      

      must-gather: http://perf1.perf.lab.eng.bos.redhat.com/pub/mukrishn/OCPBUGS-26530/ 

              Assignee: Nadia Pinaeva (npinaeva@redhat.com)
              Reporter: Krishna Harsha Voora (rh-ee-krvoora)
              QA Contact: Anurag Saxena