-
Bug
-
Resolution: Won't Do
-
Normal
-
None
-
4.13
-
Important
-
No
-
SDN Sprint 247, SDN Sprint 248, SDN Sprint 249, SDN Sprint 250
-
4
-
False
-
Description of problem:
When running cluster-density-v2 with 2268 iterations on a 252-node 4.13 ARO cluster, the overall podLatency times are extremely high.
The OCP cluster was created on ARO with the following instance types:
Master Type: Standard_D32s_v5
Worker Type: Standard_D8s_v5
Infra Type: Standard_E16s_v5
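As a quick sanity check (not part of the original report), the node count per instance type can be confirmed before the run; node.kubernetes.io/instance-type is the standard well-known node label, and the pipeline below just summarizes it:
$ oc get nodes -L node.kubernetes.io/instance-type --no-headers | awk '{print $NF}' | sort | uniq -c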
Version-Release number of selected component (if applicable):
4.13.22
How reproducible:
This is reproducible, and is only seen at scale.
Steps to Reproduce:
1. Run cluster-density-v2 with 2268 namespaces on 252 nodes
2. git clone https://github.com/cloud-bulldozer/e2e-benchmarking; cd e2e-benchmarking/workloads/kube-burner-ocp-wrapper
3. ITERATIONS=2268 WORKLOAD=cluster-density-v2 ./run.sh
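After the run, a quick way to see the symptom is to count pods that are Running but not Ready; this is a sketch (not part of the workload itself), assuming the kube-burner-job label shown in the pod description further below:
$ oc get pods -A -l kube-burner-job=cluster-density-v2 --no-headers | awk '$4 == "Running" && $3 != "1/1"' | wc -l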
Actual results:
The pods' readiness probes fail (connection failures, 503s and timeouts against the service and route endpoints), so the pods never reach the Ready state.
Expected results:
All the pods should be in the Running state and pass their readiness probes.
Additional info:
$ oc describe po client-1-c7d4c6df6-rth25 -n cluster-density-v2-596
Name:             client-1-c7d4c6df6-rth25
Namespace:        cluster-density-v2-596
Priority:         0
Service Account:  default
Node:             krishvoor-scale-2hfcr-worker-eastus1-dhpmh/10.0.2.168
Start Time:       Wed, 27 Dec 2023 12:28:13 +0530
Labels:           app=client
                  kube-burner-index=3
                  kube-burner-job=cluster-density-v2
                  kube-burner-runid=a5fa9b9b-8a3d-4986-9025-e4b03efcdb85
                  kube-burner-uuid=a24d848b-ba4e-4ee7-8b41-472f6ff881a2
                  name=client-1
                  pod-template-hash=c7d4c6df6
Annotations:      k8s.ovn.org/pod-networks: {"default":{"ip_addresses":["10.130.34.110/23"],"mac_address":"0a:58:0a:82:22:6e","gateway_ips":["10.130.34.1"],"ip_address":"10.130.34.11...
                  k8s.v1.cni.cncf.io/network-status: [{ "name": "ovn-kubernetes", "interface": "eth0", "ips": [ "10.130.34.110" ], "mac": "0a:58:0a:82:22:6e", "default": true, "dns": {} }]
                  openshift.io/scc: restricted-v2
                  seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status:           Running
IP:               10.130.34.110
IPs:
  IP:             10.130.34.110
Controlled By:    ReplicaSet/client-1-c7d4c6df6
Containers:
  client-app:
    Container ID:   cri-o://4241e52f372ebc1ed9eeed4e865fa9b6a995079d0ee4b9138d65b16773a5273d
    Image:          quay.io/cloud-bulldozer/curl:latest
    Image ID:       quay.io/cloud-bulldozer/curl@sha256:4311823d3576c0b7330beccbe09896ff0378c9c1c6f6974ff9064af803fed766
    Port:           <none>
    Host Port:      <none>
    Command:
      sleep
      inf
    State:          Running
      Started:      Wed, 27 Dec 2023 12:28:21 +0530
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:        10m
      memory:     10Mi
    Readiness:    exec [/bin/sh -c curl --fail -sS ${SERVICE_ENDPOINT} -o /dev/null && curl --fail -sSk ${ROUTE_ENDPOINT} -o /dev/null] delay=0s timeout=5s period=10s #success=1 #failure=3
    Environment:
      ENVVAR1:           kc6ykh5F7XP7fVbKHSQJJnSA3lgfsXR0ue0AskN8id0W4JkMF8iozCwPvIHamgaHUwnq4xgKKdjc9Xdw4M2AEaHCQUVwwyxtxI5yopzUsQ91iF5DH6qM5sCLWSG0qqjatzh42AjGAUR1qxojcse9X927umXkCO1pwIUOQIBvDINtSrPAcSDKrVJIEUA6tzpFCfxgYW3kKv2CPaXsMtcAeDB5ZGgjyrx5CZ0jRLPXuwP4HCUWP3srfPPfQK
      ENVVAR2:           xqX8MhvWT3Acp2WYUHgwsK2N9fgialvYDbWjghDQcVVinlz32l5ygQI2d9PjhCLDHWiwrYiNaikConbuicwRhDv4hhjF4YTQbqNg2y0Rlt6EoGc0AUE1PPzkd3mJJe5X9IHnc3gdFk7hIrA7p1aS8fhoOchzk7oxnBOJ0iAPtKkVupWAeC9zzmDYjEpPMfK49Ll0E9CTfc7cz5uEvN5cqwYQvE4NpAJLrQjhz7JmOBeF1bYNXtOnvsZSRx
      ENVVAR3:           1iBE9wYFnMwCWeg9CcSHCrPtL5CJ2WcJyS6jPrrXHWf860Gr5jpyaEk0OuctkVkym3KtncUNMgjfC8iLls49x6DOxktqxDCuqc6Mea5p9gzRcRXOlTfnm4Yd3ILy9paYKsxCs8Kl2ipYAfpxWVRvjKic0hcPWrQWjXY4jEE8cg4PVJYjFrXskjBgptlV1B9W2gmGaB3GFuzDwwBtHpEW8EXjeEEKyJStKzzdeLkyAUoRS2YeEYEcTO6y6S
      ENVVAR4:           UFVArwbf7cfLH1CPtcKlNWKaoVTwW0ZG2Q78sVKTz75VpG4oBxItbnIwEKkbhwbxweWVxF2qfIwcyYTqyg2FefBRcxWwPs4Yxrheqs0uUAeewo2dGoOSW6iQTaMTKXmGDpFB17p2hWWXymwtxwedhLR56XBSW3Uyaqb3p7vnWCnYu6UVdF81ztBIyq4zA4hnWwIBcQ5HL43WxGmKr4iUF4U1Wj7OyTqD5YyzByhervnyMHOR5myFThtSPD
      ROUTE_ENDPOINT:    https://cluster-density-1-cluster-density-v2-596.apps.w1i1fsrv.eastus.aroapp.io/256.html
      SERVICE_ENDPOINT:  http://cluster-density-3/256.html
    Mounts:
      /configmap1 from configmap-1 (rw)
      /configmap2 from configmap-2 (rw)
      /configmap3 from configmap-3 (rw)
      /configmap4 from configmap-4 (rw)
      /etc/podlabels from podinfo (rw)
      /secret1 from secret-1 (rw)
      /secret2 from secret-2 (rw)
      /secret3 from secret-3 (rw)
      /secret4 from secret-4 (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6q7tk (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  secret-1:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cluster-density-v2-1
    Optional:    false
  secret-2:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cluster-density-v2-2
    Optional:    false
  secret-3:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cluster-density-v2-3
    Optional:    false
  secret-4:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cluster-density-v2-4
    Optional:    false
  configmap-1:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      cluster-density-v2-1
    Optional:  false
  configmap-2:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      cluster-density-v2-2
    Optional:  false
  configmap-3:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      cluster-density-v2-3
    Optional:  false
  configmap-4:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      cluster-density-v2-4
    Optional:  false
  podinfo:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.labels -> labels
  kube-api-access-6q7tk:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Topology Spread Constraints: kubernetes.io/hostname:ScheduleAnyway when max skew 1 is exceeded for selector app=client
Events:
  Type     Reason          Age                   From               Message
  ----     ------          ----                  ----               -------
  Normal   Scheduled       10m                   default-scheduler  Successfully assigned cluster-density-v2-596/client-1-c7d4c6df6-rth25 to krishvoor-scale-2hfcr-worker-eastus1-dhpmh
  Normal   AddedInterface  10m                   multus             Add eth0 [10.130.34.110/23] from ovn-kubernetes
  Normal   Pulled          10m                   kubelet            Container image "quay.io/cloud-bulldozer/curl:latest" already present on machine
  Normal   Created         10m                   kubelet            Created container client-app
  Normal   Started         10m                   kubelet            Started container client-app
  Warning  Unhealthy       10m                   kubelet            Readiness probe failed: curl: (7) Failed to connect to cluster-density-3 port 80 after 2 ms: Couldn't connect to server
  Warning  Unhealthy       10m                   kubelet            Readiness probe failed: curl: (7) Failed to connect to cluster-density-3 port 80 after 0 ms: Couldn't connect to server
  Warning  Unhealthy       10m                   kubelet            Readiness probe failed: curl: (7) Failed to connect to cluster-density-3 port 80 after 1 ms: Couldn't connect to server
  Warning  Unhealthy       10m                   kubelet            Readiness probe failed: curl: (22) The requested URL returned error: 503
  Warning  Unhealthy       13s (x36 over 9m53s)  kubelet            Readiness probe failed: command timed out
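To narrow down which half of the probe fails (service path vs. route path), the same probe command can be replayed by hand from inside an affected pod. This is only a sketch of a possible follow-up check, not output captured in the must-gather; the -w format strings are plain curl write-out variables:
$ oc exec -n cluster-density-v2-596 client-1-c7d4c6df6-rth25 -- /bin/sh -c 'curl --fail -sS -w "service: %{http_code} %{time_total}s\n" ${SERVICE_ENDPOINT} -o /dev/null; curl --fail -sSk -w "route: %{http_code} %{time_total}s\n" ${ROUTE_ENDPOINT} -o /dev/null'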
Ping test to an Infra Node:
$ oc debug node/krishvoor-scale-2hfcr-infra-aro-machinesets-eastus-2-tbb7b
Starting pod/krishvoor-scale-2hfcr-infra-aro-machinesets-eastus-2-tbb7b-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.2.11
If you don't see a command prompt, try pressing enter.
sh-4.4# ping 10.0.2.149
PING 10.0.2.149 (10.0.2.149) 56(84) bytes of data.
64 bytes from 10.0.2.149: icmp_seq=3 ttl=64 time=5.77 ms
64 bytes from 10.0.2.149: icmp_seq=4 ttl=64 time=8.90 ms
64 bytes from 10.0.2.149: icmp_seq=9 ttl=64 time=4.38 ms
64 bytes from 10.0.2.149: icmp_seq=12 ttl=64 time=2.78 ms
^C
--- 10.0.2.149 ping statistics ---
12 packets transmitted, 4 received, 66.6667% packet loss, time 11218ms
rtt min/avg/max/mdev = 2.783/5.458/8.896/2.250 ms
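A larger, fixed-size sample would make the loss rate easier to trust than an interrupted ping; a possible follow-up (not run as part of this report) from the same debug shell:
sh-4.4# ping -c 100 -i 0.2 10.0.2.149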
==================
Between workers the ping test was successful:
$ oc debug node/krishvoor-scale-2hfcr-worker-eastus3-xj2tj
Starting pod/krishvoor-scale-2hfcr-worker-eastus3-xj2tj-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.2.112
If you don't see a command prompt, try pressing enter.
sh-4.4# ping 10.0.2.159
PING 10.0.2.159 (10.0.2.159) 56(84) bytes of data.
64 bytes from 10.0.2.159: icmp_seq=1 ttl=64 time=2.82 ms
64 bytes from 10.0.2.159: icmp_seq=2 ttl=64 time=7.95 ms
64 bytes from 10.0.2.159: icmp_seq=3 ttl=64 time=9.74 ms
64 bytes from 10.0.2.159: icmp_seq=4 ttl=64 time=7.65 ms
64 bytes from 10.0.2.159: icmp_seq=5 ttl=64 time=0.450 ms
64 bytes from 10.0.2.159: icmp_seq=6 ttl=64 time=5.92 ms
64 bytes from 10.0.2.159: icmp_seq=7 ttl=64 time=0.538 ms
64 bytes from 10.0.2.159: icmp_seq=8 ttl=64 time=0.692 ms
64 bytes from 10.0.2.159: icmp_seq=9 ttl=64 time=0.875 ms
^C
--- 10.0.2.159 ping statistics ---
9 packets transmitted, 9 received, 0% packet loss, time 8097ms
rtt min/avg/max/mdev = 0.450/4.069/9.736/3.529 ms
$
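If the default ingress router pods were moved onto the infra nodes (a common setup when infra machinesets exist, but an assumption here, not something shown in this report), the packet loss toward the infra node would also be consistent with the 503s and timeouts seen on the route endpoint. Where the routers actually run can be confirmed with:
$ oc -n openshift-ingress get pods -o wide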
must-gather: http://perf1.perf.lab.eng.bos.redhat.com/pub/mukrishn/OCPBUGS-26530/
is duplicated by: OCPBUGS-25876 [ARO] Pod Latency is very high at 252 Nodes (Closed)