OCPBUGS-2557: router-perf routes can't be accessed after scaling up cluster on AWS and GCP


      Description of problem:

      On an AWS SDN cluster with 27 worker nodes and the router pods on separate INFRA nodes, run the router-perf test to create 2k pods/services/routes and roll out the routers; afterwards none of the routes can be reached.

      Version-Release number of selected component (if applicable):

      4.12.0-0.nightly-2022-10-18-192348
      HA-Proxy version 2.2.24-26b8015 2022/05/13 - https://haproxy.org/

      How reproducible:

      Seen for the first time in 4.12 testing; I will update after more runs.
      A similar issue was reported against 4.10: https://bugzilla.redhat.com/show_bug.cgi?id=2035481
      Not reproducible on an AWS cluster (SDN network) with 3 masters and 3 workers of type m6i.xlarge and 400 pods/services/routes.
      Successful job: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/scale-ci/job/e2e-benchmarking-multibranch-pipeline/job/router-perf/710/console

      Steps to Reproduce:

      1. Install an AWS cluster (SDN network) with vm_type_masters: m5.4xlarge and vm_type_workers: m5.2xlarge.
      Install job: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/147668/
      2. Scale up the worker machinesets to 27 nodes, and create INFRA machinesets with 3 nodes and a WORKLOAD machineset with 1 node. The INFRA nodes host the router pods; the WORKLOAD node runs the test workload as the client. (A minimal scaling sketch follows these steps.)
      Scale-up and INFRA/WORKLOAD machineset job: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/scale-ci/job/e2e-benchmarking-multibranch-pipeline/job/cluster-workers-scaling/1570/
      3. Run router-perf to load the cluster with 2k pods/services/routes.
      Test job: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/scale-ci/job/e2e-benchmarking-multibranch-pipeline/job/router-perf/713/console
      4. Test the routes after the test resources have been created successfully.
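      # A minimal sketch of the scaling in step 2, assuming it is done directly through the
      # Machine API; the machineset name and replica count below are illustrative placeholders,
      # the actual scaling was performed by the Jenkins job linked above.
      % oc get machinesets -n openshift-machine-api
      % oc scale machineset <worker-machineset> --replicas=9 -n openshift-machine-api
      % oc get nodes -l node-role.kubernetes.io/worker --no-headers | wc -l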

      Actual results:

      After the router-perf test created the 2k pods/services/routes, the routes can no longer be accessed. This includes cluster routes such as the console.

      Expected results:

      All routes should be accessible

      Additional info:

      Must-gather: http://file.nay.redhat.com/~qili/OCPBUGS-2557/must-gather.local.233924781796810649.tar.gz
      
      Router-perf test: https://github.com/cloud-bulldozer/e2e-benchmarking/blob/master/workloads/router-perf-v2/ingress-performance.sh
         
      # Check after the test finished creating 2k pods/services/routes and rolled out the ingress pods
      
      
      #All resources are successfully created
      
      
      % for termination in http edge passthrough reencrypt; do echo pods in http-scale-${termination}; oc get pods -n http-scale-${termination}| grep Running| wc -l; echo services in http-scale-${termination}; oc get services --no-headers -n http-scale-${termination} | wc -l; echo endpoints in http-scale-${termination}; oc get endpoints --no-headers -n http-scale-${termination} | wc -l; done
      pods in http-scale-http
           500
      services in http-scale-http
           500
      endpoints in http-scale-http
           500
      pods in http-scale-edge
           500
      services in http-scale-edge
           500
      endpoints in http-scale-edge
           500
      pods in http-scale-passthrough
           500
      services in http-scale-passthrough
           500
      endpoints in http-scale-passthrough
           500
      pods in http-scale-reencrypt
           500
      services in http-scale-reencrypt
           500
      endpoints in http-scale-reencrypt
           500
      
      
      # Test routes are not working
      
      
      % oc get routes -n http-scale-http | head -n 2 
      NAME            HOST/PORT                                                                    PATH   SERVICES        PORT   TERMINATION   WILDCARD
      http-perf-1     http-perf-1-http-scale-http.apps.qili-awsbig.qe.devcluster.openshift.com            http-perf-1     http                 None
      
      
      % curl -I http://http-perf-1-http-scale-http.apps.qili-awsbig.qe.devcluster.openshift.com -v
      *   Trying 3.136.63.97:80...
      * Connected to http-perf-1-http-scale-http.apps.qili-awsbig.qe.devcluster.openshift.com (3.136.63.97) port 80 (#0)
      > HEAD / HTTP/1.1
      > Host: http-perf-1-http-scale-http.apps.qili-awsbig.qe.devcluster.openshift.com
      > User-Agent: curl/7.79.1
      > Accept: */*
      > 
      * Empty reply from server
      * Closing connection 0
      curl: (52) Empty reply from server
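      # To rule out the AWS load balancer in front of the routers, one could also check
      # (sketch, not captured in this run) where the wildcard record resolves and which
      # service exposes the router pods:
      % dig +short http-perf-1-http-scale-http.apps.qili-awsbig.qe.devcluster.openshift.com
      % oc get svc -n openshift-ingress router-default -o wide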
      
      
      # Cluster routes such as oauth and console are not working either
      
      
      % oc get co --no-headers| grep -v 'True.*False.*False'
      authentication                             4.12.0-0.nightly-2022-10-18-192348   False   False   True    122m    OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.qili-awsbig.qe.devcluster.openshift.com/healthz": EOF
      console                                    4.12.0-0.nightly-2022-10-18-192348   False   False   False   122m    RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.qili-awsbig.qe.devcluster.openshift.com): Get "https://console-openshift-console.apps.qili-awsbig.qe.devcluster.openshift.com": EOF
      ==================================
      # Ingress pods info
      
      
      % oc get po -n openshift-ingress -o wide
      NAME                             READY   STATUS    RESTARTS   AGE    IP             NODE                                         NOMINATED NODE   READINESS GATES
      router-default-f6646c495-d2tc7   1/1     Running   0          153m   10.131.14.9    ip-10-0-204-218.us-east-2.compute.internal   <none>           <none>
      router-default-f6646c495-ks64z   1/1     Running   0          153m   10.129.16.10   ip-10-0-174-104.us-east-2.compute.internal   <none>           <none>
      
      
      # curl the route from inside the ingress pod fails
      
      
      % oc exec -it -n openshift-ingress router-default-f6646c495-ks64z -- curl --resolve http-perf-1-http-scale-http.apps.qili-awsbig.qe.devcluster.openshift.com:80:localhost http://http-perf-1-http-scale-http.apps.qili-awsbig.qe.devcluster.openshift.com -I
      curl: (52) Empty reply from server
      command terminated with exit code 52
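      # A follow-up check (sketch, not run here) would be to curl the backend endpoint directly
      # from the router pod, bypassing HAProxy; 10.128.12.7:8080 is the endpoint shown in the
      # haproxy.config excerpt and service description below.
      % oc exec -n openshift-ingress router-default-f6646c495-ks64z -- curl -s -o /dev/null -w '%{http_code}\n' http://10.128.12.7:8080/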
      
      
      # Found the route info in /var/lib/haproxy/conf/haproxy.config
      oc exec -it -n openshift-ingress router-default-f6646c495-ks64z -- bash
      cat /var/lib/haproxy/conf/haproxy.config | grep ...
      
      
      # Plain http backend or backend with TLS terminated at the edge or a
      # secure backend with re-encryption.
      backend be_http:http-scale-http:http-perf-1
        mode http
        option redispatch
        option forwardfor
        balance random
      
      
        timeout check 5000ms
        http-request add-header X-Forwarded-Host %[req.hdr(host)]
        http-request add-header X-Forwarded-Port %[dst_port]
        http-request add-header X-Forwarded-Proto http if !{ ssl_fc }
        http-request add-header X-Forwarded-Proto https if { ssl_fc }
        http-request add-header X-Forwarded-Proto-Version h2 if { ssl_fc_alpn -i h2 }
        http-request add-header Forwarded for=%[src];host=%[req.hdr(host)];proto=%[req.hdr(X-Forwarded-Proto)]
        cookie 8707abd198467b89ecf3f7e5402fb688 insert indirect nocache httponly
        server pod:http-perf-1-78575cc4bb-j8qhn:http-perf-1:http:10.128.12.7:8080 10.128.12.7:8080 cookie fff446f90347f39d23317bbcf76dc8a1 weight 1
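      # A possible next step (sketch): query the HAProxy runtime socket for the state of this
      # backend. This assumes the admin socket is at /var/lib/haproxy/run/haproxy.sock; note
      # that the ss output further down only shows *.sock.56.tmp paths, which may itself be a clue.
      % oc exec -n openshift-ingress router-default-f6646c495-ks64z -- bash -c 'echo "show servers state be_http:http-scale-http:http-perf-1" | socat stdio /var/lib/haproxy/run/haproxy.sock'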
      
      
      # Describe the service
      % oc describe service http-perf-1  -n http-scale-http 
      Name:                     http-perf-1
      Namespace:                http-scale-http
      Labels:                   app=http-perf
                                kube-burner-index=1
                                kube-burner-job=http-scale-http
                                kube-burner-uuid=f89fbbae-0c23-4a78-a67b-b056423e6456
      Annotations:              <none>
      Selector:                 app=nginx-1
      Type:                     NodePort
      IP Family Policy:         SingleStack
      IP Families:              IPv4
      IP:                       172.30.35.211
      IPs:                      172.30.35.211
      Port:                     http  8080/TCP
      TargetPort:               8080/TCP
      NodePort:                 http  32577/TCP
      Endpoints:                10.128.12.7:8080
      Session Affinity:         None
      External Traffic Policy:  Cluster
      Events:                   <none>
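      # Cross-check (sketch, not captured above) that the endpoint IP really belongs to the
      # expected pod and that the pod is Ready:
      % oc get pods -n http-scale-http -o wide | grep 10.128.12.7
      % oc get endpoints http-perf-1 -n http-scale-http -o yaml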
      
      
      
      
      ======================
      # Debug inside the router pod
      
      
      # Ports 80 and 443 on the router pod do not respond
      
      
      sh-4.4$ curl 10.129.16.10
      curl: (56) Recv failure: Connection reset by peer
      bash-4.4$ curl http://10.131.14.11:80
      curl: (56) Recv failure: Connection reset by peer
      sh-4.4$ curl https://10.129.16.10:443
      curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 10.129.16.10:443 
      
      
      # Yet port 1936 responds:
      
      
      sh-4.4$ curl 10.129.16.10:1936
      Forbidden: 
      
      
      # We appear to be listening on port 443 and 80:
      
      
      sh-4.4$ ss -l -n 
      Netid        State         Recv-Q        Send-Q                                                 Local Address:Port                 Peer Address:Port       Process       
      nl           UNCONN        0             0                                                                  0:0                                *                         
      nl           UNCONN        0             0                                                                  4:0                                *                         
      nl           UNCONN        0             0                                                                  6:0                                *                         
      nl           UNCONN        0             0                                                                  9:0                                *                         
      nl           UNCONN        0             0                                                                 10:0                                *                         
      nl           UNCONN        0             0                                                                 12:0                                *                         
      nl           UNCONN        0             0                                                                 15:0                                *                         
      nl           UNCONN        0             0                                                                 16:0                                *                         
      u_str        LISTEN        0             0                           /var/lib/haproxy/run/haproxy.sock.56.tmp 200473                          * 0                        
      u_str        LISTEN        0             0                       /var/lib/haproxy/run/haproxy-sni.sock.56.tmp 200478                          * 0                        
      u_str        LISTEN        0             0                    /var/lib/haproxy/run/haproxy-no-sni.sock.56.tmp 200479                          * 0                        
      u_dgr        UNCONN        0             0                                                             @00014 154486                          * 0                        
      u_dgr        UNCONN        0             0                                                                  * 741881                          * 0                        
      u_dgr        UNCONN        0             0                                                                  * 741882                          * 0                        
      tcp          LISTEN        0             0                                                            0.0.0.0:10081                     0.0.0.0:*                        
      tcp          LISTEN        0             0                                                            0.0.0.0:80                        0.0.0.0:*                        
      tcp          LISTEN        0             0                                                            0.0.0.0:443                       0.0.0.0:*                        
      tcp          LISTEN        0             0                                                                  *:1936                            *:*    
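      # To confirm which process actually owns the 80/443 listeners, one could add -p to ss
      # (sketch; showing the owning process may require matching user or extra privileges,
      # in which case run it via oc debug node/... instead):
      sh-4.4$ ss -ltnp | grep -E ':(80|443)\s'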
      
      
      # If I run the following to stand up another HTTP server:
      
      
      % oc rsh -n openshift-ingress router-default-f6646c495-ks64z socat -T 1 -d -d tcp-l:10081,reuseaddr,fork,crlf system:"echo -e \"\\\"HTTP/1.0 200 OK\\\nDocumentType: text/html\\\n\\\n<html>date: \$\(date\)<br>server:\$SOCAT_SOCKADDR:\$SOCAT_SOCKPORT<br>client: \$SOCAT_PEERADDR:\$SOCAT_PEERPORT\\\n<pre>\\\"\"; cat; echo -e \"\\\"\\\n</pre></html>\\\"\""
      2022/10/19 07:01:49 socat[129] W ioctl(5, IOCTL_VM_SOCKETS_GET_LOCAL_CID, ...): Inappropriate ioctl for device
      2022/10/19 07:01:49 socat[129] N listening on AF=2 0.0.0.0:10081
      
      
      # And curl that new endpoint:
      
      
      % oc rsh -n openshift-ingress router-default-f6646c495-ks64z
      sh-4.4$ curl http://10.129.16.10:10081
      HTTP/1.0 200 OK
      DocumentType: text/html
      
      
      # Let's bind to port 80 instead of 10081
      
      
      % oc rsh -n openshift-ingress router-default-f6646c495-ks64z socat -T 1 -d -d tcp-l:80,reuseaddr,fork,crlf system:"echo -e \"\\\"HTTP/1.0 200 OK\\\nDocumentType: text/html\\\n\\\n<html>date: \$\(date\)<br>server:\$SOCAT_SOCKADDR:\$SOCAT_SOCKPORT<br>client: \$SOCAT_PEERADDR:\$SOCAT_PEERPORT\\\n<pre>\\\"\"; cat; echo -e \"\\\"\\\n</pre></html>\\\"\""
      2022/10/19 07:11:42 socat[149] W ioctl(5, IOCTL_VM_SOCKETS_GET_LOCAL_CID, ...): Inappropriate ioctl for device
      2022/10/19 07:11:42 socat[149] E bind(5, {AF=2 0.0.0.0:80}, 16): Permission denied
      2022/10/19 07:11:42 socat[149] N exit(1)
      command terminated with exit code 1
      
      
      # Was really expecting "port already in use".
      
      
      # Trying again on port 81
      
      
      % oc rsh -n openshift-ingress router-default-f6646c495-ks64z socat -T 1 -d -d tcp-l:81,reuseaddr,fork,crlf system:"echo -e \"\\\"HTTP/1.0 200 OK\\\nDocumentType: text/html\\\n\\\n<html>date: \$\(date\)<br>server:\$SOCAT_SOCKADDR:\$SOCAT_SOCKPORT<br>client: \$SOCAT_PEERADDR:\$SOCAT_PEERPORT\\\n<pre>\\\"\"; cat; echo -e \"\\\"\\\n</pre></html>\\\"\""
      2022/10/19 07:12:33 socat[155] W ioctl(5, IOCTL_VM_SOCKETS_GET_LOCAL_CID, ...): Inappropriate ioctl for device
      2022/10/19 07:12:33 socat[155] E bind(5, {AF=2 0.0.0.0:81}, 16): Permission denied
      2022/10/19 07:12:33 socat[155] N exit(1)
      command terminated with exit code 1
      
      
      # Cannot bind to low port numbers
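      # The bind failures on 80 and 81 suggest the exec'd shell simply lacks
      # CAP_NET_BIND_SERVICE, while the router process presumably gets it another way.
      # A sketch of how to confirm, reading the capability sets straight from /proc:
      sh-4.4$ grep Cap /proc/self/status
      sh-4.4$ grep Cap /proc/1/status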
      
      
      # Logs from pods
      
      
      % oc get pods -n openshift-ingress
      NAME                             READY   STATUS    RESTARTS   AGE
      router-default-f6646c495-d2tc7   1/1     Running   0          162m
      router-default-f6646c495-ks64z   1/1     Running   0          162m
      
      
      % oc logs -f -n openshift-ingress router-default-f6646c495-d2tc7 
      I1019 04:31:27.847704       1 template.go:437] router "msg"="starting router" "version"="majorFromGit: \nminorFromGit: \ncommitFromGit: 0493edbf2c0eec739be014ce6032f6b1941b8487\nversionFromGit: 4.0.0-402-g0493edbf\ngitTreeState: clean\nbuildDate: 2022-10-14T21:32:53Z\n"
      I1019 04:31:27.848805       1 metrics.go:156] metrics "msg"="router health and metrics port listening on HTTP and HTTPS" "address"="0.0.0.0:1936"
      I1019 04:31:27.852172       1 router.go:191] template "msg"="creating a new template router" "writeDir"="/var/lib/haproxy"
      I1019 04:31:27.852214       1 router.go:273] template "msg"="router will coalesce reloads within an interval of each other" "interval"="5s"
      I1019 04:31:27.852427       1 router.go:343] template "msg"="watching for changes" "path"="/etc/pki/tls/private"
      I1019 04:31:27.852476       1 router.go:269] router "msg"="router is including routes in all namespaces" 
      E1019 04:31:28.620828       1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: no such file or directory
      I1019 04:31:28.735869       1 healthz.go:257] backend-proxy-http check failed: healthz
      [-]backend-proxy-http failed: dial tcp [::1]:80: connect: connection refused
      I1019 04:31:29.081209       1 router.go:618] template "msg"="router reloaded" "output"=" - Checking http://localhost:80 using PROXY protocol ...\n - Health check ok : 0 retry attempt(s).\n"
      I1019 04:31:34.220802       1 router.go:618] template "msg"="router reloaded" "output"=" - Checking http://localhost:80 using PROXY protocol ...\n - Health check ok : 0 retry attempt(s).\n"
      ^C
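      # Another data point worth collecting (sketch; ps may not be in the router image, so this
      # enumerates /proc instead): check whether old and new haproxy processes are both still
      # alive after the reload.
      % oc exec -n openshift-ingress router-default-f6646c495-d2tc7 -- bash -c 'for c in /proc/[0-9]*/comm; do cat "$c"; done | sort | uniq -c'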
      
      
      % oc logs -f -n openshift-ingress router-default-f6646c495-ks64z 
      I1019 04:30:18.686510       1 template.go:437] router "msg"="starting router" "version"="majorFromGit: \nminorFromGit: \ncommitFromGit: 0493edbf2c0eec739be014ce6032f6b1941b8487\nversionFromGit: 4.0.0-402-g0493edbf\ngitTreeState: clean\nbuildDate: 2022-10-14T21:32:53Z\n"
      I1019 04:30:18.688721       1 metrics.go:156] metrics "msg"="router health and metrics port listening on HTTP and HTTPS" "address"="0.0.0.0:1936"
      I1019 04:30:18.692149       1 router.go:191] template "msg"="creating a new template router" "writeDir"="/var/lib/haproxy"
      I1019 04:30:18.692193       1 router.go:273] template "msg"="router will coalesce reloads within an interval of each other" "interval"="5s"
      I1019 04:30:18.692415       1 router.go:343] template "msg"="watching for changes" "path"="/etc/pki/tls/private"
      I1019 04:30:18.692455       1 router.go:269] router "msg"="router is including routes in all namespaces" 
      E1019 04:30:19.469011       1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: no such file or directory
      I1019 04:30:19.469163       1 healthz.go:257] backend-proxy-http check failed: healthz
      [-]backend-proxy-http failed: dial tcp [::1]:80: connect: connection refused
      I1019 04:30:19.948598       1 router.go:618] template "msg"="router reloaded" "output"=" - Checking http://localhost:80 using PROXY protocol ...\n - Health check ok : 0 retry attempt(s).\n"
      I1019 04:30:24.958710       1 router.go:618] template "msg"="router reloaded" "output"=" - Checking http://localhost:80 using PROXY protocol ...\n - Health check ok : 0 retry attempt(s).\n"
      2022/10/19 07:05:38 http: TLS handshake error from 10.129.16.10:53964: local error: tls: bad record MAC
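      # The "bad record MAC" handshake error could be inspected further with openssl
      # (sketch; openssl may not be present in the router image, in which case run it
      # from the workload node instead):
      % oc exec -n openshift-ingress router-default-f6646c495-ks64z -- openssl s_client -connect 10.129.16.10:443 -servername console-openshift-console.apps.qili-awsbig.qe.devcluster.openshift.com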
      
      
      
      
      =================
      # The problem has persisted for more than 3 hours
      
      
      % oc get co
      NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.12.0-0.nightly-2022-10-18-192348   False       False         True       3h2m    OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.qili-awsbig.qe.devcluster.openshift.com/healthz": EOF
      baremetal                                  4.12.0-0.nightly-2022-10-18-192348   True        False         False      5h44m   
      cloud-controller-manager                   4.12.0-0.nightly-2022-10-18-192348   True        False         False      5h46m   
      cloud-credential                           4.12.0-0.nightly-2022-10-18-192348   True        False         False      5h47m   
      cluster-autoscaler                         4.12.0-0.nightly-2022-10-18-192348   True        False         False      5h44m   
      config-operator                            4.12.0-0.nightly-2022-10-18-192348   True        False         False      5h45m   
      console                                    4.12.0-0.nightly-2022-10-18-192348   False       False         False      3h2m    RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.qili-awsbig.qe.devcluster.openshift.com): Get "https://console-openshift-console.apps.qili-awsbig.qe.devcluster.openshift.com": EOF
      control-plane-machine-set                  4.12.0-0.nightly-2022-10-18-192348   True        False         False      5h44m   
      csi-snapshot-controller                    4.12.0-0.nightly-2022-10-18-192348   True        False         False      5h45m   
      dns                                        4.12.0-0.nightly-2022-10-18-192348   True        False         False      5h45m   
      etcd                                       4.12.0-0.nightly-2022-10-18-192348   True        False         False      5h43m   
      image-registry                             4.12.0-0.nightly-2022-10-18-192348   True        False         False      5h36m   
      ingress                                    4.12.0-0.nightly-2022-10-18-192348   True        False         True       5h36m   The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)
      ....
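      # The ingress operator reports failing canary checks; the canary route can be probed
      # directly as well (sketch; object names are the defaults created by the ingress
      # operator in openshift-ingress-canary):
      % oc get route -n openshift-ingress-canary
      % curl -kI "https://$(oc get route canary -n openshift-ingress-canary -o jsonpath='{.spec.host}')"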
      
      
      # Delete the ingress pods
      
      
      % oc get po -n openshift-ingress
      NAME                             READY   STATUS    RESTARTS   AGE
      router-default-f6646c495-d2tc7   1/1     Running   0          176m
      router-default-f6646c495-ks64z   1/1     Running   0          176m
      qili@qili-mac Oct19 % oc delete po -n openshift-ingress router-default-f6646c495-d2tc7 
      pod "router-default-f6646c495-d2tc7" deleted
      
      
       % oc get po -n openshift-ingress
      NAME                             READY   STATUS    RESTARTS   AGE
      router-default-f6646c495-5l2kp   1/1     Running   0          112s
      router-default-f6646c495-llvxx   1/1     Running   0          22s
      
      
      # Recreating the router pods did not help
      
      
      % curl -I http://http-perf-1-http-scale-http.apps.qili-awsbig.qe.devcluster.openshift.com -v
      *   Trying 18.223.67.40:80...
      * Connected to http-perf-1-http-scale-http.apps.qili-awsbig.qe.devcluster.openshift.com (18.223.67.40) port 80 (#0)
      > HEAD / HTTP/1.1
      > Host: http-perf-1-http-scale-http.apps.qili-awsbig.qe.devcluster.openshift.com
      > User-Agent: curl/7.79.1
      > Accept: */*
      > 
      * Empty reply from server
      * Closing connection 0
      curl: (52) Empty reply from server
      
      
      
      
      # Delete all test resources
      % for termination in http edge passthrough reencrypt; do oc delete ns http-scale-${termination}; done
      namespace "http-scale-http" deleted
      namespace "http-scale-edge" deleted
      namespace "http-scale-passthrough" deleted
      namespace "http-scale-reencrypt" deleted
      
      
      # Deleting the test resources did not help either
      
      
      % curl -I http://http-perf-1-http-scale-http.apps.qili-awsbig.qe.devcluster.openshift.com -v
      *   Trying 3.136.63.97:80...
      * Connected to http-perf-1-http-scale-http.apps.qili-awsbig.qe.devcluster.openshift.com (3.136.63.97) port 80 (#0)
      > HEAD / HTTP/1.1
      > Host: http-perf-1-http-scale-http.apps.qili-awsbig.qe.devcluster.openshift.com
      > User-Agent: curl/7.79.1
      > Accept: */*
      > 
      * Empty reply from server
      * Closing connection 0
      curl: (52) Empty reply from server
      
      
      ==================
      # Other general info
      # Nodes are good
      % oc get nodes
      NAME                                         STATUS   ROLES                  AGE     VERSION
      ip-10-0-129-166.us-east-2.compute.internal   Ready    workload               3h11m   v1.25.2+5bf2e1f
      ip-10-0-133-87.us-east-2.compute.internal    Ready    worker                 4h58m   v1.25.2+5bf2e1f
      ip-10-0-136-149.us-east-2.compute.internal   Ready    infra                  3h11m   v1.25.2+5bf2e1f
      ip-10-0-140-227.us-east-2.compute.internal   Ready    worker                 4h58m   v1.25.2+5bf2e1f
      ip-10-0-147-123.us-east-2.compute.internal   Ready    worker                 4h59m   v1.25.2+5bf2e1f
      ip-10-0-147-236.us-east-2.compute.internal   Ready    worker                 4h58m   v1.25.2+5bf2e1f
      ip-10-0-150-216.us-east-2.compute.internal   Ready    worker                 4h58m   v1.25.2+5bf2e1f
      ip-10-0-153-12.us-east-2.compute.internal    Ready    control-plane,master   5h51m   v1.25.2+5bf2e1f
      ip-10-0-155-172.us-east-2.compute.internal   Ready    worker                 4h59m   v1.25.2+5bf2e1f
      ip-10-0-156-117.us-east-2.compute.internal   Ready    worker                 4h59m   v1.25.2+5bf2e1f
      ip-10-0-156-64.us-east-2.compute.internal    Ready    worker                 5h46m   v1.25.2+5bf2e1f
      ip-10-0-159-84.us-east-2.compute.internal    Ready    worker                 4h58m   v1.25.2+5bf2e1f
      ip-10-0-162-10.us-east-2.compute.internal    Ready    control-plane,master   5h52m   v1.25.2+5bf2e1f
      ip-10-0-164-52.us-east-2.compute.internal    Ready    worker                 4h58m   v1.25.2+5bf2e1f
      ip-10-0-169-86.us-east-2.compute.internal    Ready    worker                 4h58m   v1.25.2+5bf2e1f
      ip-10-0-174-104.us-east-2.compute.internal   Ready    infra                  3h10m   v1.25.2+5bf2e1f
      ip-10-0-175-143.us-east-2.compute.internal   Ready    worker                 4h58m   v1.25.2+5bf2e1f
      ip-10-0-176-254.us-east-2.compute.internal   Ready    worker                 4h58m   v1.25.2+5bf2e1f
      ip-10-0-181-242.us-east-2.compute.internal   Ready    worker                 4h58m   v1.25.2+5bf2e1f
      ip-10-0-181-79.us-east-2.compute.internal    Ready    worker                 4h58m   v1.25.2+5bf2e1f
      ip-10-0-183-145.us-east-2.compute.internal   Ready    worker                 4h58m   v1.25.2+5bf2e1f
      ip-10-0-184-49.us-east-2.compute.internal    Ready    worker                 5h41m   v1.25.2+5bf2e1f
      ip-10-0-190-1.us-east-2.compute.internal     Ready    worker                 4h58m   v1.25.2+5bf2e1f
      ip-10-0-198-193.us-east-2.compute.internal   Ready    worker                 4h58m   v1.25.2+5bf2e1f
      ip-10-0-198-255.us-east-2.compute.internal   Ready    worker                 4h58m   v1.25.2+5bf2e1f
      ip-10-0-198-41.us-east-2.compute.internal    Ready    worker                 4h58m   v1.25.2+5bf2e1f
      ip-10-0-199-41.us-east-2.compute.internal    Ready    worker                 4h58m   v1.25.2+5bf2e1f
      ip-10-0-199-69.us-east-2.compute.internal    Ready    worker                 5h47m   v1.25.2+5bf2e1f
      ip-10-0-204-114.us-east-2.compute.internal   Ready    control-plane,master   5h52m   v1.25.2+5bf2e1f
      ip-10-0-204-218.us-east-2.compute.internal   Ready    infra                  3h11m   v1.25.2+5bf2e1f
      ip-10-0-206-24.us-east-2.compute.internal    Ready    worker                 4h58m   v1.25.2+5bf2e1f
      ip-10-0-207-187.us-east-2.compute.internal   Ready    worker                 4h58m   v1.25.2+5bf2e1f
      ip-10-0-219-135.us-east-2.compute.internal   Ready    worker                 4h58m   v1.25.2+5bf2e1f
      ip-10-0-220-118.us-east-2.compute.internal   Ready    worker                 4h58m   v1.25.2+5bf2e1f
      
      
      # Pods are good - the filter below returns nothing, i.e. every pod is Running or Completed
      % oc get pods --no-headers -A| egrep -v 'Running|Completed'
      
      
      # Node resource usage
      % oc adm top nodes
      NAME                                         CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
      ip-10-0-129-166.us-east-2.compute.internal   94m          0%     3136Mi          2%        
      ip-10-0-133-87.us-east-2.compute.internal    112m         1%     3307Mi          10%       
      ip-10-0-136-149.us-east-2.compute.internal   183m         0%     4127Mi          2%        
      ip-10-0-140-227.us-east-2.compute.internal   155m         2%     3216Mi          10%       
      ip-10-0-147-123.us-east-2.compute.internal   115m         1%     2929Mi          9%        
      ip-10-0-147-236.us-east-2.compute.internal   348m         4%     9452Mi          30%       
      ip-10-0-150-216.us-east-2.compute.internal   207m         2%     4210Mi          13%       
      ip-10-0-153-12.us-east-2.compute.internal    612m         3%     10205Mi         16%       
      ip-10-0-155-172.us-east-2.compute.internal   161m         2%     3095Mi          10%       
      ip-10-0-156-117.us-east-2.compute.internal   162m         2%     3737Mi          12%       
      ip-10-0-156-64.us-east-2.compute.internal    191m         2%     5533Mi          18%       
      ip-10-0-159-84.us-east-2.compute.internal    138m         1%     4180Mi          13%       
      ip-10-0-162-10.us-east-2.compute.internal    495m         3%     9895Mi          15%       
      ip-10-0-164-52.us-east-2.compute.internal    196m         2%     3801Mi          12%       
      ip-10-0-169-86.us-east-2.compute.internal    170m         2%     4221Mi          13%       
      ip-10-0-174-104.us-east-2.compute.internal   199m         0%     3908Mi          2%        
      ip-10-0-175-143.us-east-2.compute.internal   141m         1%     2849Mi          9%        
      ip-10-0-176-254.us-east-2.compute.internal   252m         3%     4730Mi          15%       
      ip-10-0-181-242.us-east-2.compute.internal   142m         1%     3085Mi          10%       
      ip-10-0-181-79.us-east-2.compute.internal    174m         2%     4136Mi          13%       
      ip-10-0-183-145.us-east-2.compute.internal   116m         1%     3731Mi          12%       
      ip-10-0-184-49.us-east-2.compute.internal    236m         3%     5455Mi          17%       
      ip-10-0-190-1.us-east-2.compute.internal     193m         2%     3609Mi          11%       
      ip-10-0-198-193.us-east-2.compute.internal   146m         1%     3262Mi          10%       
      ip-10-0-198-255.us-east-2.compute.internal   180m         2%     4799Mi          15%       
      ip-10-0-198-41.us-east-2.compute.internal    163m         2%     5300Mi          17%       
      ip-10-0-199-41.us-east-2.compute.internal    224m         2%     4702Mi          15%       
      ip-10-0-199-69.us-east-2.compute.internal    269m         3%     5568Mi          18%       
      ip-10-0-204-114.us-east-2.compute.internal   853m         5%     12709Mi         20%       
      ip-10-0-204-218.us-east-2.compute.internal   167m         0%     4124Mi          2%        
      ip-10-0-206-24.us-east-2.compute.internal    288m         3%     5223Mi          17%       
      ip-10-0-207-187.us-east-2.compute.internal   181m         2%     4565Mi          14%       
      ip-10-0-219-135.us-east-2.compute.internal   595m         7%     9145Mi          29%       
      ip-10-0-220-118.us-east-2.compute.internal   166m         2%     3214Mi          10%    
      
      
      # Machine info
      % oc get machines -A
      NAMESPACE               NAME                                          PHASE     TYPE          REGION      ZONE         AGE
      openshift-machine-api   qili-awsbig-v9z42-infra-us-east-2a-6j2s7      Running   m5.12xlarge   us-east-2   us-east-2a   3h27m
      openshift-machine-api   qili-awsbig-v9z42-infra-us-east-2b-r4wpn      Running   m5.12xlarge   us-east-2   us-east-2b   3h27m
      openshift-machine-api   qili-awsbig-v9z42-infra-us-east-2c-sm5fj      Running   m5.12xlarge   us-east-2   us-east-2c   3h27m
      openshift-machine-api   qili-awsbig-v9z42-master-0                    Running   m5.4xlarge    us-east-2   us-east-2a   6h6m
      openshift-machine-api   qili-awsbig-v9z42-master-1                    Running   m5.4xlarge    us-east-2   us-east-2b   6h6m
      openshift-machine-api   qili-awsbig-v9z42-master-2                    Running   m5.4xlarge    us-east-2   us-east-2c   6h6m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2a-48b64     Running   m5.2xlarge    us-east-2   us-east-2a   6h3m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2a-794nw     Running   m5.2xlarge    us-east-2   us-east-2a   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2a-dfphs     Running   m5.2xlarge    us-east-2   us-east-2a   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2a-gdx2l     Running   m5.2xlarge    us-east-2   us-east-2a   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2a-m9p9h     Running   m5.2xlarge    us-east-2   us-east-2a   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2a-rm5ds     Running   m5.2xlarge    us-east-2   us-east-2a   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2a-sjwwn     Running   m5.2xlarge    us-east-2   us-east-2a   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2a-swxdh     Running   m5.2xlarge    us-east-2   us-east-2a   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2a-v25f8     Running   m5.2xlarge    us-east-2   us-east-2a   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2b-6nxvc     Running   m5.2xlarge    us-east-2   us-east-2b   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2b-7x92l     Running   m5.2xlarge    us-east-2   us-east-2b   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2b-cl82s     Running   m5.2xlarge    us-east-2   us-east-2b   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2b-g8jtj     Running   m5.2xlarge    us-east-2   us-east-2b   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2b-jpc7z     Running   m5.2xlarge    us-east-2   us-east-2b   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2b-prn8w     Running   m5.2xlarge    us-east-2   us-east-2b   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2b-shnnx     Running   m5.2xlarge    us-east-2   us-east-2b   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2b-tg2z8     Running   m5.2xlarge    us-east-2   us-east-2b   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2b-vdstq     Running   m5.2xlarge    us-east-2   us-east-2b   6h3m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2c-4nr6v     Running   m5.2xlarge    us-east-2   us-east-2c   6h3m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2c-54l9w     Running   m5.2xlarge    us-east-2   us-east-2c   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2c-6vt7l     Running   m5.2xlarge    us-east-2   us-east-2c   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2c-7lsmn     Running   m5.2xlarge    us-east-2   us-east-2c   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2c-8xxf7     Running   m5.2xlarge    us-east-2   us-east-2c   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2c-dc5sc     Running   m5.2xlarge    us-east-2   us-east-2c   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2c-dszxk     Running   m5.2xlarge    us-east-2   us-east-2c   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2c-st728     Running   m5.2xlarge    us-east-2   us-east-2c   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2c-vdqm9     Running   m5.2xlarge    us-east-2   us-east-2c   5h15m
      openshift-machine-api   qili-awsbig-v9z42-workload-us-east-2a-8f4zj   Running   m5.8xlarge    us-east-2   us-east-2a   3h27m
      
