Bug
Resolution: Duplicate
Major
4.13.z
Quality / Stability / Reliability
Important
Description of problem:
When running cluster-density-v2 with 2268 iterations on a 252-node 4.13 ARO cluster, the readinessProbe against SERVICE_ENDPOINT / ROUTE_ENDPOINT starts failing intermittently, which inflates the overall podLatency times. podLatency starts to increase after roughly 1800 iterations. This observation is reproducible.
ARO Instance details:
Master Type: Standard_D32s_v5
Worker Type: Standard_D8s_v5
Infra Type: Standard_E16s_v5
Some of our analysis logs are enclosed below.
After removing the curl to SERVICE_ENDPOINT from the readiness probe, the probe still fails on the curl to ROUTE_ENDPOINT:
[root@vkommadi ~]# oc describe po client-1-68cb885d6c-6zr6m
Name: client-1-68cb885d6c-6zr6m
Namespace: cluster-density-v3-206
Priority: 0
Service Account: default
Node: krishvoor-scale-2hfcr-worker-eastus2-5rn5d/10.0.2.184
Start Time: Thu, 21 Dec 2023 13:53:19 +0000
Labels: app=client
kube-burner-index=3
kube-burner-job=cluster-density-v3
kube-burner-runid=d49e025b-c618-404b-810d-443842200ad0
kube-burner-uuid=b60a4471-dfdc-4335-a21c-d728678cde4b
name=client-1
pod-template-hash=68cb885d6c
Annotations: k8s.ovn.org/pod-networks:
{"default":{"ip_addresses":["10.131.128.247/23"],"mac_address":"0a:58:0a:83:80:f7","gateway_ips":["10.131.128.1"],"ip_address":"10.131.128...
k8s.v1.cni.cncf.io/network-status:
[{
"name": "ovn-kubernetes",
"interface": "eth0",
"ips": [
"10.131.128.247"
],
"mac": "0a:58:0a:83:80:f7",
"default": true,
"dns": {}
}]
openshift.io/scc: restricted-v2
seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status: Running
SeccompProfile: RuntimeDefault
IP: 10.131.128.247
IPs:
IP: 10.131.128.247
Controlled By: ReplicaSet/client-1-68cb885d6c
Containers:
client-app:
Container ID: cri-o://67918f0700f7cb4677868e6554e4e3ddf3e05c58319b5da5144c92056621e920
Image: quay.io/cloud-bulldozer/curl:latest
Image ID: quay.io/cloud-bulldozer/curl@sha256:4311823d3576c0b7330beccbe09896ff0378c9c1c6f6974ff9064af803fed766
Port: <none>
Host Port: <none>
Command:
sleep
inf
State: Running
Started: Thu, 21 Dec 2023 13:53:25 +0000
Ready: False
Restart Count: 0
Requests:
cpu: 10m
memory: 10Mi
Readiness: exec [/bin/sh -c curl --fail -sSk ${ROUTE_ENDPOINT} -o /dev/null] delay=0s timeout=5s period=10s #success=1 #failure=3
Environment:
ENVVAR1: 2nVT83kJY9jjZOhU5TtN29WkVGlnvKNzQpQxpvTYXSBYbjvrDqvasvBMNNJOfObavXP1btEnA6zeyVQCe4gUs0xaPWfswedWo8Vvbl4Osc6oUreJQzKwcC71Kdat2UCU3o3biVjP5I4HjB2xKxQ6uuckBaM9Hqr964oX5sGZIyFSZZn4MR49vXClOue0IZhJ0cTHO1uMHScStNtZPCG996M9rVjbRPAuEKaZitEEANQzU7DJEitmBi5Wzc
ENVVAR2: Jyfu0e9VV9Z5VxxnX27ysLkmPqLCuqZqTXKS4ShegYSRtYW8sjZcyQXG6lWETlNoLs4mjdAxm9zbTKBbzlkTmD6uZgi6X1B8nULQIdhjChWhDFiNcNnlpmxjCETxweqy4pOWlyuYoUT789yfjMmtT8GqmZNN6rkjZuAL2ufn77giG8dG88XXS2xvKPR0cmd37EPHeUHnitV6vnpQmAEG1AXjxXuZoBTBWkOXcUV7RdwuOd5eIKqy5MuxSs
ENVVAR3: u6FOE96YFomlojCYDuyuWt8YNR295FkCCfQ0jpafvoSUHOG5XoABw0iHjKh0Yi0125gWz75MRQGK9lVWFm8Twjx6VPi5xaZ6EzuDpBtwhEBqJzsBkoUnrdwPk3mFf6Ezx5maaDLjaozTXjmFbpMOqmhKdQ44cP74mylnUX3qA09T4k0DQKH8h3aSygU2xclisJ7itEH16Z5UlG5b8NZHLSTlZR9kAGBkDlWq5HrYwgwXVlYslMdJ0a9e81
ENVVAR4: QJDNduucelZhgOxnONZbnIMsHQuETQ1JbLOqho5wUIPDUiw0tSTxO8YUcl2hO32tIGoD25b5f3shExWNZ6JGknOXhDZkxkKZdwVaT0bnyo9CHd68foUuaorQ8PzQutNIkm7dEwjlOURpMBVK7cy1VBLw5YvNWqw1f8G3k7EBhsAZXrQ1YTQI9QQTVncp5PosMJLVpreOdqMugcr2WpUwJTna8IVcUcEmLen9y6pdE3OgxPV30rZ7GTr7bh
ROUTE_ENDPOINT: https://cluster-density-1-cluster-density-v2-206.apps.w1i1fsrv.eastus.aroapp.io/256.html
SERVICE_ENDPOINT: http://cluster-density-1/256.html
Mounts:
/configmap1 from configmap-1 (rw)
/configmap2 from configmap-2 (rw)
/configmap3 from configmap-3 (rw)
/configmap4 from configmap-4 (rw)
/etc/podlabels from podinfo (rw)
/secret1 from secret-1 (rw)
/secret2 from secret-2 (rw)
/secret3 from secret-3 (rw)
/secret4 from secret-4 (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-h9lrh (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
secret-1:
Type: Secret (a volume populated by a Secret)
SecretName: cluster-density-v3-1
Optional: false
secret-2:
Type: Secret (a volume populated by a Secret)
SecretName: cluster-density-v3-2
Optional: false
secret-3:
Type: Secret (a volume populated by a Secret)
SecretName: cluster-density-v3-3
Optional: false
secret-4:
Type: Secret (a volume populated by a Secret)
SecretName: cluster-density-v3-4
Optional: false
configmap-1:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: cluster-density-v3-1
Optional: false
configmap-2:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: cluster-density-v3-2
Optional: false
configmap-3:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: cluster-density-v3-3
Optional: false
configmap-4:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: cluster-density-v3-4
Optional: false
podinfo:
Type: DownwardAPI (a volume populated by information about the pod)
Items:
metadata.labels -> labels
kube-api-access-h9lrh:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
ConfigMapName: openshift-service-ca.crt
ConfigMapOptional: <nil>
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Topology Spread Constraints: kubernetes.io/hostname:ScheduleAnyway when max skew 1 is exceeded for selector app=client
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 15m default-scheduler Successfully assigned cluster-density-v3-206/client-1-68cb885d6c-6zr6m to krishvoor-scale-2hfcr-worker-eastus2-5rn5d
Warning FailedMount 15m (x2 over 15m) kubelet MountVolume.SetUp failed for volume "configmap-2" : configmap "cluster-density-v3-2" not found
Warning FailedMount 15m kubelet MountVolume.SetUp failed for volume "secret-1" : secret "cluster-density-v3-1" not found
Warning FailedMount 15m (x2 over 15m) kubelet MountVolume.SetUp failed for volume "configmap-4" : configmap "cluster-density-v3-4" not found
Warning FailedMount 15m kubelet MountVolume.SetUp failed for volume "secret-3" : secret "cluster-density-v3-3" not found
Warning FailedMount 15m (x2 over 15m) kubelet MountVolume.SetUp failed for volume "configmap-3" : configmap "cluster-density-v3-3" not found
Warning FailedMount 15m kubelet MountVolume.SetUp failed for volume "secret-2" : secret "cluster-density-v3-2" not found
Warning FailedMount 15m (x2 over 15m) kubelet MountVolume.SetUp failed for volume "configmap-1" : configmap "cluster-density-v3-1" not found
Warning FailedMount 15m kubelet MountVolume.SetUp failed for volume "secret-4" : secret "cluster-density-v3-4" not found
Normal AddedInterface 15m multus Add eth0 [10.131.128.247/23] from ovn-kubernetes
Normal Pulled 15m kubelet Container image "quay.io/cloud-bulldozer/curl:latest" already present on machine
Normal Created 15m kubelet Created container client-app
Normal Started 15m kubelet Started container client-app
Warning Unhealthy 13m (x8 over 14m) kubelet Readiness probe failed: curl: (22) The requested URL returned error: 503
Warning Unhealthy 1s (x55 over 14m) kubelet Readiness probe failed: command timed out
[root@vkommadi ~]# curl --fail -sSk https://cluster-density-1-cluster-density-v2-206.apps.w1i1fsrv.eastus.aroapp.io/256.html
curl: (22) The requested URL returned error: 503 Service Unavailable
[root@vkommadi ~]#
[root@vkommadi ~]# oc exec -it client-1-68cb885d6c-6zr6m bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
ERRO[0000] exec failed: unable to start container process: exec: "bash": executable file not found in $PATH
command terminated with exit code 255
[root@vkommadi ~]# oc exec -it client-1-68cb885d6c-6zr6m sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
~ $ time curl --fail -sSk https://cluster-density-1-cluster-density-v2-206.apps.w1i1fsrv.eastus.aroapp.io/256.html
curl: (22) The requested URL returned error: 503
Command exited with non-zero status 22
real 0m 0.20s
user 0m 0.00s
sys 0m 0.00s
~ $ curl --fail -sSk https://cluster-density-1-cluster-density-v2-206.apps.w1i1fsrv.eastus.aroapp.io/256.html
curl: (22) The requested URL returned error: 503
~ $
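The Events table above mixes two different readiness-probe failure modes: an HTTP 503 returned by the route and a probe command that exceeds its 5s timeout. The following is a minimal Python sketch, not part of the original analysis, for tallying those two modes from saved `oc describe po` output. The function name and the reason labels (`http_503`, `timeout`, `other`) are assumptions made for illustration; the two sample lines are copied from the events above.

```python
import re
from collections import Counter

# Matches "Unhealthy" event rows as printed by `oc describe po`, with an
# optional "(xN over T)" repeat counter before the "kubelet" source column.
PROBE_EVENT = re.compile(
    r"Unhealthy\s+\S+\s+(?:\(x(\d+) over \S+\)\s+)?kubelet\s+"
    r"Readiness probe failed: (.*)"
)


def tally_probe_failures(describe_output):
    """Count readiness-probe failures per failure mode (hypothetical helper)."""
    counts = Counter()
    for line in describe_output.splitlines():
        match = PROBE_EVENT.search(line)
        if not match:
            continue
        # Rows without a repeat counter represent a single occurrence.
        repeats = int(match.group(1) or 1)
        message = match.group(2).strip()
        if "timed out" in message:
            reason = "timeout"
        elif "error: 503" in message:
            reason = "http_503"
        else:
            reason = "other"
        counts[reason] += repeats
    return counts


# The two Unhealthy rows from the describe output in this report.
SAMPLE_EVENTS = """\
Warning Unhealthy 13m (x8 over 14m) kubelet Readiness probe failed: curl: (22) The requested URL returned error: 503
Warning Unhealthy 1s (x55 over 14m) kubelet Readiness probe failed: command timed out
"""

if __name__ == "__main__":
    for reason, count in tally_probe_failures(SAMPLE_EVENTS).items():
        print(reason, count)
```

For this pod the tally is 8 HTTP 503 failures against 55 timeouts, so most failed probes did not get an HTTP response within the probe timeout.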
Version-Release number of selected component (if applicable):
OCP Version: 4.13.22
HAProxy Version:
sh-4.4$ /usr/sbin/haproxy -v
HA-Proxy version 2.2.24-26b8015 2022/05/13 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2025.
Known bugs: http://www.haproxy.org/bugs/bugs-2.2.24.html
Running on: Linux 5.14.0-284.40.1.el9_2.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Nov 1 10:30:09 EDT 2023 x86_64
sh-4.4$
Router Version:
sh-4.4$ /usr/bin/openshift-router version
openshift-router
majorFromGit:
minorFromGit:
commitFromGit: 431a6e667025931c68b4f747a224af29edf356f6
versionFromGit: 4.0.0-429-g431a6e66
gitTreeState: clean
buildDate: 2023-11-03T11:43:21Z
sh-4.4$
How reproducible:
Easily reproducible
Steps to Reproduce:
1. Run cluster-density-v2 with 2268 namespaces on 252 nodes
2. Check whether all pods in the cluster-density-v2 namespaces are up
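Step 2 can be checked mechanically. The sketch below is one possible approach, not the method used in this report: it takes the JSON printed by `oc get pods -A -l kube-burner-job=cluster-density-v3 -o json` (the label is taken from the pod description above; whether it selects every workload pod is an assumption) and lists pods that are in phase Running but whose Ready condition is not True. The function name and the second pod in the sample data are hypothetical.

```python
import json


def running_but_not_ready(pod_list_json):
    """Return namespace/name of pods that are Running but not Ready."""
    stuck = []
    for pod in json.loads(pod_list_json).get("items", []):
        status = pod.get("status", {})
        if status.get("phase") != "Running":
            continue
        # Pod conditions are a list of {"type": ..., "status": "True"/"False"}.
        conditions = {c["type"]: c["status"] for c in status.get("conditions", [])}
        if conditions.get("Ready") != "True":
            meta = pod["metadata"]
            stuck.append(meta["namespace"] + "/" + meta["name"])
    return stuck


# Minimal hand-written sample. The first pod mirrors the one described in
# this report; the second pod name is made up for illustration.
SAMPLE_POD_LIST = json.dumps({
    "items": [
        {
            "metadata": {"namespace": "cluster-density-v3-206",
                         "name": "client-1-68cb885d6c-6zr6m"},
            "status": {"phase": "Running",
                       "conditions": [{"type": "Ready", "status": "False"},
                                      {"type": "PodScheduled", "status": "True"}]},
        },
        {
            "metadata": {"namespace": "cluster-density-v3-206",
                         "name": "example-ready-pod"},
            "status": {"phase": "Running",
                       "conditions": [{"type": "Ready", "status": "True"}]},
        },
    ]
})

if __name__ == "__main__":
    for name in running_but_not_ready(SAMPLE_POD_LIST):
        print(name)
```

Filtering on the Ready condition rather than on the phase matters here, because the affected pod reports Status: Running while Ready is False.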
Actual results:
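Readiness probes fail intermittently (HTTP 503 from the route and probe command timeouts), so affected pods stay in Running state but are not Ready.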
Expected results:
All pods should be in Running state and Ready
Additional info:
OVN Tracing:
CNI ADD finished about 3.8 seconds after ConfigureOVS (13:53:21.189 to 13:53:25.018):
[root@vkommadi ~]# oc logs -n openshift-ovn-kubernetes ovnkube-node-pknwj -c ovnkube-node | grep "68cb885d6c-6zr6m"
I1221 13:53:21.187500 3188 cni.go:265] [cluster-density-v3-206/client-1-68cb885d6c-6zr6m e1fbb5e9164721427064a79cd8b3630ff50e2097f75d24cd52ad553f851462fc] ADD starting CNI request [cluster-density-v3-206/client-1-68cb885d6c-6zr6m e1fbb5e9164721427064a79cd8b3630ff50e2097f75d24cd52ad553f851462fc]
I1221 13:53:21.189502 3188 helper_linux.go:363] ConfigureOVS: namespace: cluster-density-v3-206, podName: client-1-68cb885d6c-6zr6m, network: default, NAD default, SandboxID: "e1fbb5e9164721427064a79cd8b3630ff50e2097f75d24cd52ad553f851462fc", UID: "d36f3b2b-0d15-4fa6-9fbd-acdf9088fd40", MAC: 0a:58:0a:83:80:f7, IPs: [10.131.128.247/23]
I1221 13:53:25.018680 3188 cni.go:286] [cluster-density-v3-206/client-1-68cb885d6c-6zr6m e1fbb5e9164721427064a79cd8b3630ff50e2097f75d24cd52ad553f851462fc] ADD finished CNI request [cluster-density-v3-206/client-1-68cb885d6c-6zr6m e1fbb5e9164721427064a79cd8b3630ff50e2097f75d24cd52ad553f851462fc], result "{\"interfaces\":[{\"name\":\"e1fbb5e91647214\",\"mac\":\"3e:97:b8:40:6c:6e\"},{\"name\":\"eth0\",\"mac\":\"0a:58:0a:83:80:f7\",\"sandbox\":\"/var/run/netns/b212efc5-3235-4f09-9661-d0faf583e930\"}],\"ips\":[{\"interface\":1,\"address\":\"10.131.128.247/23\",\"gateway\":\"10.131.128.1\"}],\"dns\":{}}", err <nil>
[root@vkommadi ~]#
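To make the timing in the ovnkube-node log explicit, the following Python sketch computes the gaps between the three klog entries above. It is an illustration only: the helper names are hypothetical, the log lines are abbreviated to their timestamp prefix and source location, and the year (not present in klog timestamps) is assumed to be 2023 based on the pod Start Time.

```python
from datetime import datetime


def klog_time(line, year=2023):
    """Parse the leading 'I1221 13:53:21.187500' stamp of a klog line.

    klog omits the year, so it is supplied by the caller (assumed 2023 here).
    """
    level_and_date, clock = line.split()[:2]
    # level_and_date looks like 'I1221': severity letter followed by MMDD.
    stamp = "{}{} {}".format(year, level_and_date[1:], clock)
    return datetime.strptime(stamp, "%Y%m%d %H:%M:%S.%f")


def elapsed_seconds(first_line, second_line):
    """Seconds between two klog lines (second minus first)."""
    return (klog_time(second_line) - klog_time(first_line)).total_seconds()


# Timestamp prefixes copied from the log above; message bodies are abbreviated.
ADD_STARTING = "I1221 13:53:21.187500 3188 cni.go:265] ADD starting CNI request"
CONFIGURE_OVS = "I1221 13:53:21.189502 3188 helper_linux.go:363] ConfigureOVS"
ADD_FINISHED = "I1221 13:53:25.018680 3188 cni.go:286] ADD finished CNI request"

if __name__ == "__main__":
    print("ADD starting -> ConfigureOVS:", elapsed_seconds(ADD_STARTING, CONFIGURE_OVS))
    print("ConfigureOVS -> ADD finished:", elapsed_seconds(CONFIGURE_OVS, ADD_FINISHED))
```

With these timestamps, ConfigureOVS is logged about 2 ms after the ADD request starts, and the ADD request finishes roughly 3.83 s after that.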
=========================================================
=========================================================
[root@vkommadi ~]# oc logs -n openshift-ovn-kubernetes ovnkube-node-pknwj -c ovn-controller | grep "68cb885d6c-6zr6m"
2023-12-21T13:53:24.848Z|01966|binding|INFO|Claiming lport cluster-density-v3-206_client-1-68cb885d6c-6zr6m for this chassis.
2023-12-21T13:53:24.848Z|01967|binding|INFO|cluster-density-v3-206_client-1-68cb885d6c-6zr6m: Claiming 0a:58:0a:83:80:f7 10.131.128.247
2023-12-21T13:53:24.954Z|01968|binding|INFO|Setting lport cluster-density-v3-206_client-1-68cb885d6c-6zr6m ovn-installed in OVS
2023-12-21T13:53:24.954Z|01969|binding|INFO|Setting lport cluster-density-v3-206_client-1-68cb885d6c-6zr6m up in Southbound
[root@vkommadi ~]#
=========================================================
=========================================================
krvoora ~ Desktop Git-Stuff benchmark-operator oc logs router-default-9555f79fc-6br7r | grep v3
E1221 13:43:34.545752 1 plugin.go:283] unable to find service cluster-density-v3-206/cluster-density-2: Service "cluster-density-2" not found
E1221 13:43:34.545930 1 plugin.go:283] unable to find service cluster-density-v3-206/cluster-density-2: Service "cluster-density-2" not found
E1221 13:43:34.575981 1 plugin.go:283] unable to find service cluster-density-v3-206/cluster-density-2: Service "cluster-density-2" not found
E1221 13:43:34.577266 1 plugin.go:283] unable to find service cluster-density-v3-206/cluster-density-2: Service "cluster-density-2" not found
E1221 13:44:04.585319 1 plugin.go:283] unable to find service cluster-density-v3-35/cluster-density-1: Service "cluster-density-1" not found
E1221 13:44:04.585456 1 plugin.go:283] unable to find service cluster-density-v3-35/cluster-density-1: Service "cluster-density-1" not found
krvoora ~ Desktop Git-Stuff benchmark-operator
============================================================================
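The router log excerpt above repeats the same "unable to find service" error for a small number of namespace/service pairs. As a rough aid for larger logs, the Python sketch below groups those errors by namespace and service. It is an assumption-laden illustration: the function name is hypothetical, and the regular expression is written against the message format shown in this excerpt only. The sample lines are copied from the excerpt.

```python
import re
from collections import Counter

# Matches the router plugin error as it appears in the excerpt above and
# captures the namespace and the service name.
MISSING_SERVICE = re.compile(r"unable to find service ([^/\s]+)/([^:\s]+):")


def missing_services(router_log):
    """Count 'unable to find service' errors per (namespace, service)."""
    counts = Counter()
    for line in router_log.splitlines():
        match = MISSING_SERVICE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


SAMPLE_ROUTER_LOG = """\
E1221 13:43:34.545752 1 plugin.go:283] unable to find service cluster-density-v3-206/cluster-density-2: Service "cluster-density-2" not found
E1221 13:43:34.545930 1 plugin.go:283] unable to find service cluster-density-v3-206/cluster-density-2: Service "cluster-density-2" not found
E1221 13:43:34.575981 1 plugin.go:283] unable to find service cluster-density-v3-206/cluster-density-2: Service "cluster-density-2" not found
E1221 13:43:34.577266 1 plugin.go:283] unable to find service cluster-density-v3-206/cluster-density-2: Service "cluster-density-2" not found
E1221 13:44:04.585319 1 plugin.go:283] unable to find service cluster-density-v3-35/cluster-density-1: Service "cluster-density-1" not found
E1221 13:44:04.585456 1 plugin.go:283] unable to find service cluster-density-v3-35/cluster-density-1: Service "cluster-density-1" not found
"""

if __name__ == "__main__":
    for (namespace, service), count in missing_services(SAMPLE_ROUTER_LOG).items():
        print(namespace, service, count)
```

Note that these router errors are timestamped around 13:43 to 13:44, about ten minutes before the pod above was scheduled at 13:53, so any link between them and the probe failures would need to be confirmed separately.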
Duplicates: OCPBUGS-26530 [ARO] PodLatencies are impacted during c.d.v2 at 252 Nodes (Closed)