-
Bug
-
Resolution: Duplicate
-
Major
-
None
-
4.13.z
-
Important
-
No
-
False
-
Description of problem:
When running cluster-density-v2 with 2268 iterations on a 252-node 4.13 ARO setup, the readinessProbe that curls SERVICE_ENDPOINT/ROUTE_ENDPOINT starts fluctuating, which drives up the overall podLatency. The podLatency increase shows up after roughly 1800 iterations. This observation is reproducible.
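For context, a run like this is typically kicked off with the kube-burner OCP wrapper. The sketch below is only an approximation of how such a run is launched, not the exact command used for this run; the binary name and flags vary between kube-burner releases.
{code:none}
# Rough sketch of launching the workload (assumes the kube-burner "ocp" wrapper;
# flags and defaults differ between kube-burner releases)
export KUBECONFIG=/path/to/aro/kubeconfig
kube-burner ocp cluster-density-v2 --iterations=2268
{code}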
ARO Instance details:
Master Type: Standard_D32s_v5
Worker Type: Standard_D8s_v5
Infra Type: Standard_E16s_v5
Attaching some of our analysis logs:
I removed the curl to SERVICE_ENDPOINT and it still fails on the curl to ROUTE_ENDPOINT:
{code:none}
[root@vkommadi ~]# oc describe po client-1-68cb885d6c-6zr6m
Name:             client-1-68cb885d6c-6zr6m
Namespace:        cluster-density-v3-206
Priority:         0
Service Account:  default
Node:             krishvoor-scale-2hfcr-worker-eastus2-5rn5d/10.0.2.184
Start Time:       Thu, 21 Dec 2023 13:53:19 +0000
Labels:           app=client
                  kube-burner-index=3
                  kube-burner-job=cluster-density-v3
                  kube-burner-runid=d49e025b-c618-404b-810d-443842200ad0
                  kube-burner-uuid=b60a4471-dfdc-4335-a21c-d728678cde4b
                  name=client-1
                  pod-template-hash=68cb885d6c
Annotations:      k8s.ovn.org/pod-networks: {"default":{"ip_addresses":["10.131.128.247/23"],"mac_address":"0a:58:0a:83:80:f7","gateway_ips":["10.131.128.1"],"ip_address":"10.131.128...
                  k8s.v1.cni.cncf.io/network-status: [{ "name": "ovn-kubernetes", "interface": "eth0", "ips": [ "10.131.128.247" ], "mac": "0a:58:0a:83:80:f7", "default": true, "dns": {} }]
                  openshift.io/scc: restricted-v2
                  seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status:           Running
SeccompProfile:   RuntimeDefault
IP:               10.131.128.247
IPs:
  IP:  10.131.128.247
Controlled By:  ReplicaSet/client-1-68cb885d6c
Containers:
  client-app:
    Container ID:  cri-o://67918f0700f7cb4677868e6554e4e3ddf3e05c58319b5da5144c92056621e920
    Image:         quay.io/cloud-bulldozer/curl:latest
    Image ID:      quay.io/cloud-bulldozer/curl@sha256:4311823d3576c0b7330beccbe09896ff0378c9c1c6f6974ff9064af803fed766
    Port:          <none>
    Host Port:     <none>
    Command:
      sleep
      inf
    State:          Running
      Started:      Thu, 21 Dec 2023 13:53:25 +0000
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:     10m
      memory:  10Mi
    Readiness:  exec [/bin/sh -c curl --fail -sSk ${ROUTE_ENDPOINT} -o /dev/null] delay=0s timeout=5s period=10s #success=1 #failure=3
    Environment:
      ENVVAR1:           2nVT83kJY9jjZOhU5TtN29WkVGlnvKNzQpQxpvTYXSBYbjvrDqvasvBMNNJOfObavXP1btEnA6zeyVQCe4gUs0xaPWfswedWo8Vvbl4Osc6oUreJQzKwcC71Kdat2UCU3o3biVjP5I4HjB2xKxQ6uuckBaM9Hqr964oX5sGZIyFSZZn4MR49vXClOue0IZhJ0cTHO1uMHScStNtZPCG996M9rVjbRPAuEKaZitEEANQzU7DJEitmBi5Wzc
      ENVVAR2:           Jyfu0e9VV9Z5VxxnX27ysLkmPqLCuqZqTXKS4ShegYSRtYW8sjZcyQXG6lWETlNoLs4mjdAxm9zbTKBbzlkTmD6uZgi6X1B8nULQIdhjChWhDFiNcNnlpmxjCETxweqy4pOWlyuYoUT789yfjMmtT8GqmZNN6rkjZuAL2ufn77giG8dG88XXS2xvKPR0cmd37EPHeUHnitV6vnpQmAEG1AXjxXuZoBTBWkOXcUV7RdwuOd5eIKqy5MuxSs
      ENVVAR3:           u6FOE96YFomlojCYDuyuWt8YNR295FkCCfQ0jpafvoSUHOG5XoABw0iHjKh0Yi0125gWz75MRQGK9lVWFm8Twjx6VPi5xaZ6EzuDpBtwhEBqJzsBkoUnrdwPk3mFf6Ezx5maaDLjaozTXjmFbpMOqmhKdQ44cP74mylnUX3qA09T4k0DQKH8h3aSygU2xclisJ7itEH16Z5UlG5b8NZHLSTlZR9kAGBkDlWq5HrYwgwXVlYslMdJ0a9e81
      ENVVAR4:           QJDNduucelZhgOxnONZbnIMsHQuETQ1JbLOqho5wUIPDUiw0tSTxO8YUcl2hO32tIGoD25b5f3shExWNZ6JGknOXhDZkxkKZdwVaT0bnyo9CHd68foUuaorQ8PzQutNIkm7dEwjlOURpMBVK7cy1VBLw5YvNWqw1f8G3k7EBhsAZXrQ1YTQI9QQTVncp5PosMJLVpreOdqMugcr2WpUwJTna8IVcUcEmLen9y6pdE3OgxPV30rZ7GTr7bh
      ROUTE_ENDPOINT:    https://cluster-density-1-cluster-density-v2-206.apps.w1i1fsrv.eastus.aroapp.io/256.html
      SERVICE_ENDPOINT:  http://cluster-density-1/256.html
    Mounts:
      /configmap1 from configmap-1 (rw)
      /configmap2 from configmap-2 (rw)
      /configmap3 from configmap-3 (rw)
      /configmap4 from configmap-4 (rw)
      /etc/podlabels from podinfo (rw)
      /secret1 from secret-1 (rw)
      /secret2 from secret-2 (rw)
      /secret3 from secret-3 (rw)
      /secret4 from secret-4 (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-h9lrh (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  secret-1:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cluster-density-v3-1
    Optional:    false
  secret-2:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cluster-density-v3-2
    Optional:    false
  secret-3:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cluster-density-v3-3
    Optional:    false
  secret-4:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cluster-density-v3-4
    Optional:    false
  configmap-1:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      cluster-density-v3-1
    Optional:  false
  configmap-2:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      cluster-density-v3-2
    Optional:  false
  configmap-3:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      cluster-density-v3-3
    Optional:  false
  configmap-4:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      cluster-density-v3-4
    Optional:  false
  podinfo:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.labels -> labels
  kube-api-access-h9lrh:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Topology Spread Constraints:  kubernetes.io/hostname:ScheduleAnyway when max skew 1 is exceeded for selector app=client
Events:
  Type     Reason          Age                From               Message
  ----     ------          ----               ----               -------
  Normal   Scheduled       15m                default-scheduler  Successfully assigned cluster-density-v3-206/client-1-68cb885d6c-6zr6m to krishvoor-scale-2hfcr-worker-eastus2-5rn5d
  Warning  FailedMount     15m (x2 over 15m)  kubelet            MountVolume.SetUp failed for volume "configmap-2" : configmap "cluster-density-v3-2" not found
  Warning  FailedMount     15m                kubelet            MountVolume.SetUp failed for volume "secret-1" : secret "cluster-density-v3-1" not found
  Warning  FailedMount     15m (x2 over 15m)  kubelet            MountVolume.SetUp failed for volume "configmap-4" : configmap "cluster-density-v3-4" not found
  Warning  FailedMount     15m                kubelet            MountVolume.SetUp failed for volume "secret-3" : secret "cluster-density-v3-3" not found
  Warning  FailedMount     15m (x2 over 15m)  kubelet            MountVolume.SetUp failed for volume "configmap-3" : configmap "cluster-density-v3-3" not found
  Warning  FailedMount     15m                kubelet            MountVolume.SetUp failed for volume "secret-2" : secret "cluster-density-v3-2" not found
  Warning  FailedMount     15m (x2 over 15m)  kubelet            MountVolume.SetUp failed for volume "configmap-1" : configmap "cluster-density-v3-1" not found
  Warning  FailedMount     15m                kubelet            MountVolume.SetUp failed for volume "secret-4" : secret "cluster-density-v3-4" not found
  Normal   AddedInterface  15m                multus             Add eth0 [10.131.128.247/23] from ovn-kubernetes
  Normal   Pulled          15m                kubelet            Container image "quay.io/cloud-bulldozer/curl:latest" already present on machine
  Normal   Created         15m                kubelet            Created container client-app
  Normal   Started         15m                kubelet            Started container client-app
  Warning  Unhealthy       13m (x8 over 14m)  kubelet            Readiness probe failed: curl: (22) The requested URL returned error: 503
  Warning  Unhealthy       1s (x55 over 14m)  kubelet            Readiness probe failed: command timed out

[root@vkommadi ~]# curl --fail -sSk https://cluster-density-1-cluster-density-v2-206.apps.w1i1fsrv.eastus.aroapp.io/256.html
curl: (22) The requested URL returned error: 503 Service Unavailable
[root@vkommadi ~]#
[root@vkommadi ~]# oc exec -it client-1-68cb885d6c-6zr6m bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
ERRO[0000] exec failed: unable to start container process: exec: "bash": executable file not found in $PATH
command terminated with exit code 255
[root@vkommadi ~]# oc exec -it client-1-68cb885d6c-6zr6m sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
~ $ time curl --fail -sSk https://cluster-density-1-cluster-density-v2-206.apps.w1i1fsrv.eastus.aroapp.io/256.html
curl: (22) The requested URL returned error: 503
Command exited with non-zero status 22
real    0m 0.20s
user    0m 0.00s
sys     0m 0.00s
~ $ https://cluster-density-1-cluster-density-v2-206.apps.w1i1fsrv.eastus.aroapp.io/256.html
curl: (22) The requested URL returned error: 503
~ $
{code}
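Since the route returns 503 both from the probe and from outside the cluster, a quick way to narrow down whether the router or the backend is at fault is to look at the route, its service, and the service endpoints directly. This is only a sketch; the namespace and object names are taken from the pod above and will differ per iteration.
{code:none}
# Check whether the service behind the failing route exists and has ready endpoints
# (names taken from the describe output above; adjust per namespace/iteration)
oc -n cluster-density-v3-206 get route
oc -n cluster-density-v3-206 get svc cluster-density-1
oc -n cluster-density-v3-206 get endpoints cluster-density-1 -o wide
{code}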
Version-Release number of selected component (if applicable):
OCP Version: 4.13.22

HAProxy Version:
{code:none}
sh-4.4$ /usr/sbin/haproxy -v
HA-Proxy version 2.2.24-26b8015 2022/05/13 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2025.
Known bugs: http://www.haproxy.org/bugs/bugs-2.2.24.html
Running on: Linux 5.14.0-284.40.1.el9_2.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Nov 1 10:30:09 EDT 2023 x86_64
sh-4.4$
{code}

Router Version:
{code:none}
sh-4.4$ /usr/bin/openshift-router version
openshift-router
majorFromGit:
minorFromGit:
commitFromGit: 431a6e667025931c68b4f747a224af29edf356f6
versionFromGit: 4.0.0-429-g431a6e66
gitTreeState: clean
buildDate: 2023-11-03T11:43:21Z
sh-4.4$
{code}

How reproducible:
Easily reproducible
Steps to Reproduce:
1. Run cluster-density-v2 with 2268 namespaces on 252 nodes.
2. Check whether all the pods created by cluster-density-v2 are up (a quick check is sketched below).
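One way to perform step 2 is sketched below. The kube-burner job label value (cluster-density-v3 in this particular run, as seen in the pod labels above) is an assumption that may differ between runs.
{code:none}
# List pods created by the kube-burner job that are not fully Ready
# (READY is the third column of "oc get pods -A" output, e.g. "0/1")
oc get pods -A -l kube-burner-job=cluster-density-v3 --no-headers \
  | awk '{ split($3, r, "/"); if (r[1] != r[2]) print $1, $2, $3, $4 }'
{code}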
Actual results:
The pods' readinessProbe keeps failing (HTTP 503 responses and command timeouts), so the pods never become Ready and podLatency is inflated.
Expected results:
All the pods should be Running and Ready.
Additional info:
OVN tracing: CNI Add started after 4 mins of ConfigureOVS.
{code:none}
[root@vkommadi ~]# oc logs -n openshift-ovn-kubernetes ovnkube-node-pknwj -c ovnkube-node | grep "68cb885d6c-6zr6m"
I1221 13:53:21.187500 3188 cni.go:265] [cluster-density-v3-206/client-1-68cb885d6c-6zr6m e1fbb5e9164721427064a79cd8b3630ff50e2097f75d24cd52ad553f851462fc] ADD starting CNI request [cluster-density-v3-206/client-1-68cb885d6c-6zr6m e1fbb5e9164721427064a79cd8b3630ff50e2097f75d24cd52ad553f851462fc]
I1221 13:53:21.189502 3188 helper_linux.go:363] ConfigureOVS: namespace: cluster-density-v3-206, podName: client-1-68cb885d6c-6zr6m, network: default, NAD default, SandboxID: "e1fbb5e9164721427064a79cd8b3630ff50e2097f75d24cd52ad553f851462fc", UID: "d36f3b2b-0d15-4fa6-9fbd-acdf9088fd40", MAC: 0a:58:0a:83:80:f7, IPs: [10.131.128.247/23]
I1221 13:53:25.018680 3188 cni.go:286] [cluster-density-v3-206/client-1-68cb885d6c-6zr6m e1fbb5e9164721427064a79cd8b3630ff50e2097f75d24cd52ad553f851462fc] ADD finished CNI request [cluster-density-v3-206/client-1-68cb885d6c-6zr6m e1fbb5e9164721427064a79cd8b3630ff50e2097f75d24cd52ad553f851462fc], result "{\"interfaces\":[{\"name\":\"e1fbb5e91647214\",\"mac\":\"3e:97:b8:40:6c:6e\"},{\"name\":\"eth0\",\"mac\":\"0a:58:0a:83:80:f7\",\"sandbox\":\"/var/run/netns/b212efc5-3235-4f09-9661-d0faf583e930\"}],\"ips\":[{\"interface\":1,\"address\":\"10.131.128.247/23\",\"gateway\":\"10.131.128.1\"}],\"dns\":{}}", err <nil>
[root@vkommadi ~]#
{code}
{code:none}
[root@vkommadi ~]# oc logs -n openshift-ovn-kubernetes ovnkube-node-pknwj -c ovn-controller | grep "68cb885d6c-6zr6m"
2023-12-21T13:53:24.848Z|01966|binding|INFO|Claiming lport cluster-density-v3-206_client-1-68cb885d6c-6zr6m for this chassis.
2023-12-21T13:53:24.848Z|01967|binding|INFO|cluster-density-v3-206_client-1-68cb885d6c-6zr6m: Claiming 0a:58:0a:83:80:f7 10.131.128.247
2023-12-21T13:53:24.954Z|01968|binding|INFO|Setting lport cluster-density-v3-206_client-1-68cb885d6c-6zr6m ovn-installed in OVS
2023-12-21T13:53:24.954Z|01969|binding|INFO|Setting lport cluster-density-v3-206_client-1-68cb885d6c-6zr6m up in Southbound
[root@vkommadi ~]#
{code}
Router logs:
{code:none}
$ oc logs router-default-9555f79fc-6br7r | grep v3
E1221 13:43:34.545752 1 plugin.go:283] unable to find service cluster-density-v3-206/cluster-density-2: Service "cluster-density-2" not found
E1221 13:43:34.545930 1 plugin.go:283] unable to find service cluster-density-v3-206/cluster-density-2: Service "cluster-density-2" not found
E1221 13:43:34.575981 1 plugin.go:283] unable to find service cluster-density-v3-206/cluster-density-2: Service "cluster-density-2" not found
E1221 13:43:34.577266 1 plugin.go:283] unable to find service cluster-density-v3-206/cluster-density-2: Service "cluster-density-2" not found
E1221 13:44:04.585319 1 plugin.go:283] unable to find service cluster-density-v3-35/cluster-density-1: Service "cluster-density-1" not found
E1221 13:44:04.585456 1 plugin.go:283] unable to find service cluster-density-v3-35/cluster-density-1: Service "cluster-density-1" not found
{code}
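Given the "unable to find service" errors from the router above, it may also be worth checking whether the generated HAProxy configuration on the router pods ever picked up a backend for the affected namespace, and how often the router reloaded during the run. This is a hedged sketch: the router pod name is taken from the log snippet above, and the haproxy.config path and the "router reloaded" log message are the usual OpenShift defaults, which may differ in other versions.
{code:none}
# Does the generated HAProxy config contain a backend for the affected namespace?
oc -n openshift-ingress rsh router-default-9555f79fc-6br7r \
  grep -c 'cluster-density-v3-206' /var/lib/haproxy/conf/haproxy.config

# How many times did the router reload while the workload was running?
oc -n openshift-ingress logs router-default-9555f79fc-6br7r | grep -c 'router reloaded'
{code}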
- duplicates
-
OCPBUGS-26530 [ARO] PodLatencies are impacted during c.d.v2 at 252 Nodes
- Closed