Type: Bug
Resolution: Won't Do
Priority: Normal
Affects Version: 4.13
Impact: Quality / Stability / Reliability
Severity: Important
Sprint: SDN Sprint 247, SDN Sprint 248, SDN Sprint 249, SDN Sprint 250
Description of problem:
When running cluster-density-v2 with 2268 iterations on a 252-node 4.13 ARO cluster, the overall podLatency times are extremely high.
The OCP cluster was created on ARO with the following instance types:
Master: Standard_D32s_v5
Worker: Standard_D8s_v5
Infra: Standard_E16s_v5
Version-Release number of selected component (if applicable):
4.13.22
How reproducible:
This is reproducible, and is only seen at scale.
Steps to Reproduce:
1. On a 252-node cluster, run cluster-density-v2 with 2268 namespaces (steps 2-3):
2. git clone https://github.com/cloud-bulldozer/e2e-benchmarking; cd e2e-benchmarking/workloads/kube-burner-ocp-wrapper
3. ITERATIONS=2268 WORKLOAD=cluster-density-v2 ./run.sh
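Once the workload finishes, the pods that came up but never passed their readiness probe can be listed with a standard oc one-liner. A minimal sketch, assuming the kube-burner-job=cluster-density-v2 pod label shown in the describe output under "Additional info":
# With -A the columns are NAMESPACE NAME READY STATUS RESTARTS AGE,
# so $3 is READY and $4 is STATUS; print pods that are Running but only 0/1 Ready
oc get pods -A -l kube-burner-job=cluster-density-v2 --no-headers | awk '$4=="Running" && $3=="0/1"'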
Actual results:
The pods' readiness probes fail, so the pods stay in the Running state but never become Ready.
Expected results:
All the pods should reach the Running state and become Ready.
Additional info:
$ oc describe po client-1-c7d4c6df6-rth25 -n cluster-density-v2-596
Name: client-1-c7d4c6df6-rth25
Namespace: cluster-density-v2-596
Priority: 0
Service Account: default
Node: krishvoor-scale-2hfcr-worker-eastus1-dhpmh/10.0.2.168
Start Time: Wed, 27 Dec 2023 12:28:13 +0530
Labels: app=client
kube-burner-index=3
kube-burner-job=cluster-density-v2
kube-burner-runid=a5fa9b9b-8a3d-4986-9025-e4b03efcdb85
kube-burner-uuid=a24d848b-ba4e-4ee7-8b41-472f6ff881a2
name=client-1
pod-template-hash=c7d4c6df6
Annotations: k8s.ovn.org/pod-networks:
{"default":{"ip_addresses":["10.130.34.110/23"],"mac_address":"0a:58:0a:82:22:6e","gateway_ips":["10.130.34.1"],"ip_address":"10.130.34.11...
k8s.v1.cni.cncf.io/network-status:
[{
"name": "ovn-kubernetes",
"interface": "eth0",
"ips": [
"10.130.34.110"
],
"mac": "0a:58:0a:82:22:6e",
"default": true,
"dns": {}
}]
openshift.io/scc: restricted-v2
seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status: Running
IP: 10.130.34.110
IPs:
IP: 10.130.34.110
Controlled By: ReplicaSet/client-1-c7d4c6df6
Containers:
client-app:
Container ID: cri-o://4241e52f372ebc1ed9eeed4e865fa9b6a995079d0ee4b9138d65b16773a5273d
Image: quay.io/cloud-bulldozer/curl:latest
Image ID: quay.io/cloud-bulldozer/curl@sha256:4311823d3576c0b7330beccbe09896ff0378c9c1c6f6974ff9064af803fed766
Port: <none>
Host Port: <none>
Command:
sleep
inf
State: Running
Started: Wed, 27 Dec 2023 12:28:21 +0530
Ready: False
Restart Count: 0
Requests:
cpu: 10m
memory: 10Mi
Readiness: exec [/bin/sh -c curl --fail -sS ${SERVICE_ENDPOINT} -o /dev/null && curl --fail -sSk ${ROUTE_ENDPOINT} -o /dev/null] delay=0s timeout=5s period=10s #success=1 #failure=3
Environment:
ENVVAR1: kc6ykh5F7XP7fVbKHSQJJnSA3lgfsXR0ue0AskN8id0W4JkMF8iozCwPvIHamgaHUwnq4xgKKdjc9Xdw4M2AEaHCQUVwwyxtxI5yopzUsQ91iF5DH6qM5sCLWSG0qqjatzh42AjGAUR1qxojcse9X927umXkCO1pwIUOQIBvDINtSrPAcSDKrVJIEUA6tzpFCfxgYW3kKv2CPaXsMtcAeDB5ZGgjyrx5CZ0jRLPXuwP4HCUWP3srfPPfQK
ENVVAR2: xqX8MhvWT3Acp2WYUHgwsK2N9fgialvYDbWjghDQcVVinlz32l5ygQI2d9PjhCLDHWiwrYiNaikConbuicwRhDv4hhjF4YTQbqNg2y0Rlt6EoGc0AUE1PPzkd3mJJe5X9IHnc3gdFk7hIrA7p1aS8fhoOchzk7oxnBOJ0iAPtKkVupWAeC9zzmDYjEpPMfK49Ll0E9CTfc7cz5uEvN5cqwYQvE4NpAJLrQjhz7JmOBeF1bYNXtOnvsZSRx
ENVVAR3: 1iBE9wYFnMwCWeg9CcSHCrPtL5CJ2WcJyS6jPrrXHWf860Gr5jpyaEk0OuctkVkym3KtncUNMgjfC8iLls49x6DOxktqxDCuqc6Mea5p9gzRcRXOlTfnm4Yd3ILy9paYKsxCs8Kl2ipYAfpxWVRvjKic0hcPWrQWjXY4jEE8cg4PVJYjFrXskjBgptlV1B9W2gmGaB3GFuzDwwBtHpEW8EXjeEEKyJStKzzdeLkyAUoRS2YeEYEcTO6y6S
ENVVAR4: UFVArwbf7cfLH1CPtcKlNWKaoVTwW0ZG2Q78sVKTz75VpG4oBxItbnIwEKkbhwbxweWVxF2qfIwcyYTqyg2FefBRcxWwPs4Yxrheqs0uUAeewo2dGoOSW6iQTaMTKXmGDpFB17p2hWWXymwtxwedhLR56XBSW3Uyaqb3p7vnWCnYu6UVdF81ztBIyq4zA4hnWwIBcQ5HL43WxGmKr4iUF4U1Wj7OyTqD5YyzByhervnyMHOR5myFThtSPD
ROUTE_ENDPOINT: https://cluster-density-1-cluster-density-v2-596.apps.w1i1fsrv.eastus.aroapp.io/256.html
SERVICE_ENDPOINT: http://cluster-density-3/256.html
Mounts:
/configmap1 from configmap-1 (rw)
/configmap2 from configmap-2 (rw)
/configmap3 from configmap-3 (rw)
/configmap4 from configmap-4 (rw)
/etc/podlabels from podinfo (rw)
/secret1 from secret-1 (rw)
/secret2 from secret-2 (rw)
/secret3 from secret-3 (rw)
/secret4 from secret-4 (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6q7tk (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
secret-1:
Type: Secret (a volume populated by a Secret)
SecretName: cluster-density-v2-1
Optional: false
secret-2:
Type: Secret (a volume populated by a Secret)
SecretName: cluster-density-v2-2
Optional: false
secret-3:
Type: Secret (a volume populated by a Secret)
SecretName: cluster-density-v2-3
Optional: false
secret-4:
Type: Secret (a volume populated by a Secret)
SecretName: cluster-density-v2-4
Optional: false
configmap-1:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: cluster-density-v2-1
Optional: false
configmap-2:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: cluster-density-v2-2
Optional: false
configmap-3:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: cluster-density-v2-3
Optional: false
configmap-4:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: cluster-density-v2-4
Optional: false
podinfo:
Type: DownwardAPI (a volume populated by information about the pod)
Items:
metadata.labels -> labels
kube-api-access-6q7tk:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
ConfigMapName: openshift-service-ca.crt
ConfigMapOptional: <nil>
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Topology Spread Constraints: kubernetes.io/hostname:ScheduleAnyway when max skew 1 is exceeded for selector app=client
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 10m default-scheduler Successfully assigned cluster-density-v2-596/client-1-c7d4c6df6-rth25 to krishvoor-scale-2hfcr-worker-eastus1-dhpmh
Normal AddedInterface 10m multus Add eth0 [10.130.34.110/23] from ovn-kubernetes
Normal Pulled 10m kubelet Container image "quay.io/cloud-bulldozer/curl:latest" already present on machine
Normal Created 10m kubelet Created container client-app
Normal Started 10m kubelet Started container client-app
Warning Unhealthy 10m kubelet Readiness probe failed: curl: (7) Failed to connect to cluster-density-3 port 80 after 2 ms: Couldn't connect to server
Warning Unhealthy 10m kubelet Readiness probe failed: curl: (7) Failed to connect to cluster-density-3 port 80 after 0 ms: Couldn't connect to server
Warning Unhealthy 10m kubelet Readiness probe failed: curl: (7) Failed to connect to cluster-density-3 port 80 after 1 ms: Couldn't connect to server
Warning Unhealthy 10m kubelet Readiness probe failed: curl: (22) The requested URL returned error: 503
Warning Unhealthy 13s (x36 over 9m53s) kubelet Readiness probe failed: command timed out
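Given the probe failures above, the same checks can be re-run by hand from inside the affected pod to tell service-path failures apart from route-path failures. A sketch, reusing the pod name and the SERVICE_ENDPOINT/ROUTE_ENDPOINT environment variables from the describe output above:
# Print the HTTP status seen on each path (the probe's curl --fail is dropped
# so the status code is printed even on errors such as 503)
oc exec -n cluster-density-v2-596 client-1-c7d4c6df6-rth25 -- /bin/sh -c \
  'curl -sSk -o /dev/null -w "service: %{http_code}\n" ${SERVICE_ENDPOINT}; \
   curl -sSk -o /dev/null -w "route: %{http_code}\n" ${ROUTE_ENDPOINT}'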
Ping test to an infra node:
$ oc debug node/krishvoor-scale-2hfcr-infra-aro-machinesets-eastus-2-tbb7b
Starting pod/krishvoor-scale-2hfcr-infra-aro-machinesets-eastus-2-tbb7b-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.2.11
If you don't see a command prompt, try pressing enter.
sh-4.4# ping 10.0.2.149
PING 10.0.2.149 (10.0.2.149) 56(84) bytes of data.
64 bytes from 10.0.2.149: icmp_seq=3 ttl=64 time=5.77 ms
64 bytes from 10.0.2.149: icmp_seq=4 ttl=64 time=8.90 ms
64 bytes from 10.0.2.149: icmp_seq=9 ttl=64 time=4.38 ms
64 bytes from 10.0.2.149: icmp_seq=12 ttl=64 time=2.78 ms
^C
--- 10.0.2.149 ping statistics ---
12 packets transmitted, 4 received, 66.6667% packet loss, time 11218ms
rtt min/avg/max/mdev = 2.783/5.458/8.896/2.250 ms
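The ~66% loss above can be confirmed with a longer fixed-count sample from the same debug shell. A sketch, reusing the infra node IP (10.0.2.149) from the test above; ping -q prints only the summary statistics:
# 200 probes at 5 packets/second; only the loss/rtt summary is printed
sh-4.4# ping -q -c 200 -i 0.2 10.0.2.149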
==================
Between worker nodes, the ping test was successful:
$ oc debug node/krishvoor-scale-2hfcr-worker-eastus3-xj2tj
Starting pod/krishvoor-scale-2hfcr-worker-eastus3-xj2tj-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.2.112
If you don't see a command prompt, try pressing enter.
sh-4.4# ping 10.0.2.159
PING 10.0.2.159 (10.0.2.159) 56(84) bytes of data.
64 bytes from 10.0.2.159: icmp_seq=1 ttl=64 time=2.82 ms
64 bytes from 10.0.2.159: icmp_seq=2 ttl=64 time=7.95 ms
64 bytes from 10.0.2.159: icmp_seq=3 ttl=64 time=9.74 ms
64 bytes from 10.0.2.159: icmp_seq=4 ttl=64 time=7.65 ms
64 bytes from 10.0.2.159: icmp_seq=5 ttl=64 time=0.450 ms
64 bytes from 10.0.2.159: icmp_seq=6 ttl=64 time=5.92 ms
64 bytes from 10.0.2.159: icmp_seq=7 ttl=64 time=0.538 ms
64 bytes from 10.0.2.159: icmp_seq=8 ttl=64 time=0.692 ms
64 bytes from 10.0.2.159: icmp_seq=9 ttl=64 time=0.875 ms
^C
--- 10.0.2.159 ping statistics ---
9 packets transmitted, 9 received, 0% packet loss, time 8097ms
rtt min/avg/max/mdev = 0.450/4.069/9.736/3.529 ms
$
must-gather: http://perf1.perf.lab.eng.bos.redhat.com/pub/mukrishn/OCPBUGS-26530/
Is duplicated by: OCPBUGS-25876 [ARO] Pod Latency is very high at 252 Nodes (Closed)