  OpenShift Bugs / OCPBUGS-2557

router-perf routes can't be accessed after scaling up cluster on AWS and GCP


      Description of problem:

      On an AWS SDN cluster with 27 worker nodes and the router pods on dedicated INFRA nodes, run the router-perf test to create 2k pods/services/routes and roll out the routers; after that, none of the routes can be reached.

      Version-Release number of selected component (if applicable):

      4.12.0-0.nightly-2022-10-18-192348
      HA-Proxy version 2.2.24-26b8015 2022/05/13 - https://haproxy.org/

      How reproducible:

      This is the first time we have seen this in 4.12 testing; I will update after running more tests.
      A similar issue was reported in 4.10: https://bugzilla.redhat.com/show_bug.cgi?id=2035481
      Not reproducible on an AWS cluster (SDN network) with 3 masters and 3 workers of type m6i.xlarge and 400 pods/services/routes.
      Successful job: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/scale-ci/job/e2e-benchmarking-multibranch-pipeline/job/router-perf/710/console

      Steps to Reproduce:

      1. Install an AWS cluster (SDN network) with vm_type_masters: m5.4xlarge
      vm_type_workers: m5.2xlarge
      Install job: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/147668/
      2. Scale up the worker machinesets to 27 nodes, and install INFRA machinesets with 3 nodes and a WORKLOAD machineset with 1 node. The INFRA nodes host the router pods; the WORKLOAD node runs the test workload as the client.
      Scale up and install INFRA and WORKLOAD machinesets job: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/scale-ci/job/e2e-benchmarking-multibranch-pipeline/job/cluster-workers-scaling/1570/
      3. Run router-perf to load the cluster with 2k pods/services/routes.
      Test job: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/scale-ci/job/e2e-benchmarking-multibranch-pipeline/job/router-perf/713/console
      4. Test the routes after the test resources are created successfully (one way to spot-check them is sketched below).
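
      For step 4, a minimal spot-check could look like the sketch below. It is illustrative only: the apps domain is the one from this cluster, and the host names assume the default <route>-<namespace>.<apps domain> pattern that the test routes use.

      # Sketch only: probe one route per termination type and print the HTTP status
      APPS_DOMAIN=apps.qili-awsbig.qe.devcluster.openshift.com
      for termination in http edge passthrough reencrypt; do
        scheme=http; [ "$termination" != "http" ] && scheme=https
        curl -skI -o /dev/null -w "http-scale-${termination}: %{http_code}\n" \
          "${scheme}://http-perf-1-http-scale-${termination}.${APPS_DOMAIN}"
      done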

      Actual results:

      After the router-perf test created the 2k pods/services/routes, none of the routes can be accessed, including the console route.

      Expected results:

      All routes should be accessible

      Additional info:

      Must-gather: http://file.nay.redhat.com/~qili/OCPBUGS-2557/must-gather.local.233924781796810649.tar.gz
      
      Router-perf test: https://github.com/cloud-bulldozer/e2e-benchmarking/blob/master/workloads/router-perf-v2/ingress-performance.sh
         
      # Checks run after the test finished creating 2k pods/services/routes and rolled out the ingress pods


      # All resources were created successfully
      
      
      % for termination in http edge passthrough reencrypt; do echo pods in http-scale-${termination}; oc get pods -n http-scale-${termination}| grep Running| wc -l; echo services in http-scale-${termination}; oc get services --no-headers -n http-scale-${termination} | wc -l; echo endpoints in http-scale-${termination}; oc get endpoints --no-headers -n http-scale-${termination} | wc -l; done
      pods in http-scale-http
           500
      services in http-scale-http
           500
      endpoints in http-scale-http
           500
      pods in http-scale-edge
           500
      services in http-scale-edge
           500
      endpoints in http-scale-edge
           500
      pods in http-scale-passthrough
           500
      services in http-scale-passthrough
           500
      endpoints in http-scale-passthrough
           500
      pods in http-scale-reencrypt
           500
      services in http-scale-reencrypt
           500
      endpoints in http-scale-reencrypt
           500
      
      
      # Test routes are not working
      
      
      % oc get routes -n http-scale-http | head -n 2 
      NAME            HOST/PORT                                                                    PATH   SERVICES        PORT   TERMINATION   WILDCARD
      http-perf-1     http-perf-1-http-scale-http.apps.qili-awsbig.qe.devcluster.openshift.com            http-perf-1     http                 None
      
      
      % curl -I http://http-perf-1-http-scale-http.apps.qili-awsbig.qe.devcluster.openshift.com -v
      *   Trying 3.136.63.97:80...
      * Connected to http-perf-1-http-scale-http.apps.qili-awsbig.qe.devcluster.openshift.com (3.136.63.97) port 80 (#0)
      > HEAD / HTTP/1.1
      > Host: http-perf-1-http-scale-http.apps.qili-awsbig.qe.devcluster.openshift.com
      > User-Agent: curl/7.79.1
      > Accept: */*
      > 
      * Empty reply from server
      * Closing connection 0
      curl: (52) Empty reply from server
      
      
      # Other routes, including oauth and console, are not working either
      
      
      % oc get co --no-headers| grep -v 'True.*False.*False'
      authentication                             4.12.0-0.nightly-2022-10-18-192348   False   False   True    122m    OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.qili-awsbig.qe.devcluster.openshift.com/healthz": EOF
      console                                    4.12.0-0.nightly-2022-10-18-192348   False   False   False   122m    RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.qili-awsbig.qe.devcluster.openshift.com): Get "https://console-openshift-console.apps.qili-awsbig.qe.devcluster.openshift.com": EOF
      ==================================
      # Ingress pods info
      
      
      % oc get po -n openshift-ingress -o wide
      NAME                             READY   STATUS    RESTARTS   AGE    IP             NODE                                         NOMINATED NODE   READINESS GATES
      router-default-f6646c495-d2tc7   1/1     Running   0          153m   10.131.14.9    ip-10-0-204-218.us-east-2.compute.internal   <none>           <none>
      router-default-f6646c495-ks64z   1/1     Running   0          153m   10.129.16.10   ip-10-0-174-104.us-east-2.compute.internal   <none>           <none>
      
      
      # curl'ing the route from inside the ingress pod also fails
      
      
      % oc exec -it -n openshift-ingress router-default-f6646c495-ks64z -- curl --resolve http-perf-1-http-scale-http.apps.qili-awsbig.qe.devcluster.openshift.com:80:localhost http://http-perf-1-http-scale-http.apps.qili-awsbig.qe.devcluster.openshift.com -I
      curl: (52) Empty reply from server
      command terminated with exit code 52
      
      
      # Found the route info in /var/lib/haproxy/conf/haproxy.config
      oc exec -it -n openshift-ingress router-default-f6646c495-ks64z -- bash
      cat /var/lib/haproxy/conf/haproxy.config | grep ...
      
      
      # Plain http backend or backend with TLS terminated at the edge or a
      # secure backend with re-encryption.
      backend be_http:http-scale-http:http-perf-1
        mode http
        option redispatch
        option forwardfor
        balance random
      
      
        timeout check 5000ms
        http-request add-header X-Forwarded-Host %[req.hdr(host)]
        http-request add-header X-Forwarded-Port %[dst_port]
        http-request add-header X-Forwarded-Proto http if !{ ssl_fc }
        http-request add-header X-Forwarded-Proto https if { ssl_fc }
        http-request add-header X-Forwarded-Proto-Version h2 if { ssl_fc_alpn -i h2 }
        http-request add-header Forwarded for=%[src];host=%[req.hdr(host)];proto=%[req.hdr(X-Forwarded-Proto)]
        cookie 8707abd198467b89ecf3f7e5402fb688 insert indirect nocache httponly
        server pod:http-perf-1-78575cc4bb-j8qhn:http-perf-1:http:10.128.12.7:8080 10.128.12.7:8080 cookie fff446f90347f39d23317bbcf76dc8a1 weight 1
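
      Since the backend is present in the generated config, a further check (not captured in this session) would be to query the running HAProxy over its admin socket; socat is available in the router image, as used later in this report. Treat the exact socket path as an assumption taken from the ss output and logs below.

      # Sketch only: ask the running HAProxy whether the backend and its server are known and UP
      sh-4.4$ echo "show info" | socat stdio UNIX-CONNECT:/var/lib/haproxy/run/haproxy.sock
      sh-4.4$ echo "show stat" | socat stdio UNIX-CONNECT:/var/lib/haproxy/run/haproxy.sock | grep http-perf-1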
      
      
      # Describe the service
      % oc describe service http-perf-1  -n http-scale-http 
      Name:                     http-perf-1
      Namespace:                http-scale-http
      Labels:                   app=http-perf
                                kube-burner-index=1
                                kube-burner-job=http-scale-http
                                kube-burner-uuid=f89fbbae-0c23-4a78-a67b-b056423e6456
      Annotations:              <none>
      Selector:                 app=nginx-1
      Type:                     NodePort
      IP Family Policy:         SingleStack
      IP Families:              IPv4
      IP:                       172.30.35.211
      IPs:                      172.30.35.211
      Port:                     http  8080/TCP
      TargetPort:               8080/TCP
      NodePort:                 http  32577/TCP
      Endpoints:                10.128.12.7:8080
      Session Affinity:         None
      External Traffic Policy:  Cluster
      Events:                   <none>
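
      The endpoint reported by the service (10.128.12.7:8080) matches the server line in haproxy.config. A follow-up check that was not run here would be to curl that endpoint directly from the router pod, to confirm the backend pod still answers and isolate the failure to HAProxy's frontend:

      # Sketch only: bypass HAProxy and hit the backend endpoint directly from the router pod
      % oc exec -n openshift-ingress router-default-f6646c495-ks64z -- \
          curl -sI -o /dev/null -w "%{http_code}\n" http://10.128.12.7:8080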
      
      
      
      
      ======================
      # Debug inside the router pod
      
      
      # Ports 80 and 443 on the router pods do not respond
      
      
      sh-4.4$ curl 10.129.16.10
      curl: (56) Recv failure: Connection reset by peer
      bash-4.4$ curl http://10.131.14.11:80
      curl: (56) Recv failure: Connection reset by peer
      sh-4.4$ curl https://10.129.16.10:443
      curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 10.129.16.10:443 
      
      
      # Yet port 1936 responds:
      
      
      sh-4.4$ curl 10.129.16.10:1936
      Forbidden: 
      
      
      # We appear to be listening on ports 80 and 443:
      
      
      sh-4.4$ ss -l -n 
      Netid        State         Recv-Q        Send-Q                                                 Local Address:Port                 Peer Address:Port       Process       
      nl           UNCONN        0             0                                                                  0:0                                *                         
      nl           UNCONN        0             0                                                                  4:0                                *                         
      nl           UNCONN        0             0                                                                  6:0                                *                         
      nl           UNCONN        0             0                                                                  9:0                                *                         
      nl           UNCONN        0             0                                                                 10:0                                *                         
      nl           UNCONN        0             0                                                                 12:0                                *                         
      nl           UNCONN        0             0                                                                 15:0                                *                         
      nl           UNCONN        0             0                                                                 16:0                                *                         
      u_str        LISTEN        0             0                           /var/lib/haproxy/run/haproxy.sock.56.tmp 200473                          * 0                        
      u_str        LISTEN        0             0                       /var/lib/haproxy/run/haproxy-sni.sock.56.tmp 200478                          * 0                        
      u_str        LISTEN        0             0                    /var/lib/haproxy/run/haproxy-no-sni.sock.56.tmp 200479                          * 0                        
      u_dgr        UNCONN        0             0                                                             @00014 154486                          * 0                        
      u_dgr        UNCONN        0             0                                                                  * 741881                          * 0                        
      u_dgr        UNCONN        0             0                                                                  * 741882                          * 0                        
      tcp          LISTEN        0             0                                                            0.0.0.0:10081                     0.0.0.0:*                        
      tcp          LISTEN        0             0                                                            0.0.0.0:80                        0.0.0.0:*                        
      tcp          LISTEN        0             0                                                            0.0.0.0:443                       0.0.0.0:*                        
      tcp          LISTEN        0             0                                                                  *:1936                            *:*    
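
      The ss output above was run without -p, so it does not show which process owns each listener. A follow-up (not captured here) could confirm that haproxy is the process holding 0.0.0.0:80 and 0.0.0.0:443:

      # Sketch only: list TCP listeners together with the owning process
      sh-4.4$ ss -ltnp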
      
      
      # If I run the following to stand up another HTTP server:
      
      
      % oc rsh -n openshift-ingress router-default-f6646c495-ks64z socat -T 1 -d -d tcp-l:10081,reuseaddr,fork,crlf system:"echo -e \"\\\"HTTP/1.0 200 OK\\\nDocumentType: text/html\\\n\\\n<html>date: \$\(date\)<br>server:\$SOCAT_SOCKADDR:\$SOCAT_SOCKPORT<br>client: \$SOCAT_PEERADDR:\$SOCAT_PEERPORT\\\n<pre>\\\"\"; cat; echo -e \"\\\"\\\n</pre></html>\\\"\""
      2022/10/19 07:01:49 socat[129] W ioctl(5, IOCTL_VM_SOCKETS_GET_LOCAL_CID, ...): Inappropriate ioctl for device
      2022/10/19 07:01:49 socat[129] N listening on AF=2 0.0.0.0:10081
      
      
      # And curl that new endpoint:
      
      
      % oc rsh -n openshift-ingress router-default-f6646c495-ks64z
      sh-4.4$ curl http://10.129.16.10:10081
      HTTP/1.0 200 OK
      DocumentType: text/html
      
      
      # Let's bind to port 80 instead of 10081
      
      
      % oc rsh -n openshift-ingress router-default-f6646c495-ks64z socat -T 1 -d -d tcp-l:80,reuseaddr,fork,crlf system:"echo -e \"\\\"HTTP/1.0 200 OK\\\nDocumentType: text/html\\\n\\\n<html>date: \$\(date\)<br>server:\$SOCAT_SOCKADDR:\$SOCAT_SOCKPORT<br>client: \$SOCAT_PEERADDR:\$SOCAT_PEERPORT\\\n<pre>\\\"\"; cat; echo -e \"\\\"\\\n</pre></html>\\\"\""
      2022/10/19 07:11:42 socat[149] W ioctl(5, IOCTL_VM_SOCKETS_GET_LOCAL_CID, ...): Inappropriate ioctl for device
      2022/10/19 07:11:42 socat[149] E bind(5, {AF=2 0.0.0.0:80}, 16): Permission denied
      2022/10/19 07:11:42 socat[149] N exit(1)
      command terminated with exit code 1
      
      
      # Was really expecting "port already in use".
      
      
      # Trying again on port 81
      
      
      % oc rsh -n openshift-ingress router-default-f6646c495-ks64z socat -T 1 -d -d tcp-l:81,reuseaddr,fork,crlf system:"echo -e \"\\\"HTTP/1.0 200 OK\\\nDocumentType: text/html\\\n\\\n<html>date: \$\(date\)<br>server:\$SOCAT_SOCKADDR:\$SOCAT_SOCKPORT<br>client: \$SOCAT_PEERADDR:\$SOCAT_PEERPORT\\\n<pre>\\\"\"; cat; echo -e \"\\\"\\\n</pre></html>\\\"\""
      2022/10/19 07:12:33 socat[155] W ioctl(5, IOCTL_VM_SOCKETS_GET_LOCAL_CID, ...): Inappropriate ioctl for device
      2022/10/19 07:12:33 socat[155] E bind(5, {AF=2 0.0.0.0:81}, 16): Permission denied
      2022/10/19 07:12:33 socat[155] N exit(1)
      command terminated with exit code 1
      
      
      # Cannot bind to low port numbers from the rsh session
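
      One way to dig into the bind failure (not done in this session) is to compare the capability sets of the rsh shell and of the running haproxy process, since binding ports below 1024 normally requires CAP_NET_BIND_SERVICE. This assumes pidof (or an equivalent) is available in the router image.

      # Sketch only: inspect capability bitmasks via /proc
      sh-4.4$ grep Cap /proc/self/status                                    # the rsh shell
      sh-4.4$ grep Cap /proc/$(pidof haproxy | awk '{print $1}')/status     # the running haproxy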
      
      
      # Logs from pods
      
      
      % oc get pods -n openshift-ingress
      NAME                             READY   STATUS    RESTARTS   AGE
      router-default-f6646c495-d2tc7   1/1     Running   0          162m
      router-default-f6646c495-ks64z   1/1     Running   0          162m
      
      
      % oc logs -f -n openshift-ingress router-default-f6646c495-d2tc7 
      I1019 04:31:27.847704       1 template.go:437] router "msg"="starting router" "version"="majorFromGit: \nminorFromGit: \ncommitFromGit: 0493edbf2c0eec739be014ce6032f6b1941b8487\nversionFromGit: 4.0.0-402-g0493edbf\ngitTreeState: clean\nbuildDate: 2022-10-14T21:32:53Z\n"
      I1019 04:31:27.848805       1 metrics.go:156] metrics "msg"="router health and metrics port listening on HTTP and HTTPS" "address"="0.0.0.0:1936"
      I1019 04:31:27.852172       1 router.go:191] template "msg"="creating a new template router" "writeDir"="/var/lib/haproxy"
      I1019 04:31:27.852214       1 router.go:273] template "msg"="router will coalesce reloads within an interval of each other" "interval"="5s"
      I1019 04:31:27.852427       1 router.go:343] template "msg"="watching for changes" "path"="/etc/pki/tls/private"
      I1019 04:31:27.852476       1 router.go:269] router "msg"="router is including routes in all namespaces" 
      E1019 04:31:28.620828       1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: no such file or directory
      I1019 04:31:28.735869       1 healthz.go:257] backend-proxy-http check failed: healthz
      [-]backend-proxy-http failed: dial tcp [::1]:80: connect: connection refused
      I1019 04:31:29.081209       1 router.go:618] template "msg"="router reloaded" "output"=" - Checking http://localhost:80 using PROXY protocol ...\n - Health check ok : 0 retry attempt(s).\n"
      I1019 04:31:34.220802       1 router.go:618] template "msg"="router reloaded" "output"=" - Checking http://localhost:80 using PROXY protocol ...\n - Health check ok : 0 retry attempt(s).\n"
      ^C
      
      
      % oc logs -f -n openshift-ingress router-default-f6646c495-ks64z 
      I1019 04:30:18.686510       1 template.go:437] router "msg"="starting router" "version"="majorFromGit: \nminorFromGit: \ncommitFromGit: 0493edbf2c0eec739be014ce6032f6b1941b8487\nversionFromGit: 4.0.0-402-g0493edbf\ngitTreeState: clean\nbuildDate: 2022-10-14T21:32:53Z\n"
      I1019 04:30:18.688721       1 metrics.go:156] metrics "msg"="router health and metrics port listening on HTTP and HTTPS" "address"="0.0.0.0:1936"
      I1019 04:30:18.692149       1 router.go:191] template "msg"="creating a new template router" "writeDir"="/var/lib/haproxy"
      I1019 04:30:18.692193       1 router.go:273] template "msg"="router will coalesce reloads within an interval of each other" "interval"="5s"
      I1019 04:30:18.692415       1 router.go:343] template "msg"="watching for changes" "path"="/etc/pki/tls/private"
      I1019 04:30:18.692455       1 router.go:269] router "msg"="router is including routes in all namespaces" 
      E1019 04:30:19.469011       1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: no such file or directory
      I1019 04:30:19.469163       1 healthz.go:257] backend-proxy-http check failed: healthz
      [-]backend-proxy-http failed: dial tcp [::1]:80: connect: connection refused
      I1019 04:30:19.948598       1 router.go:618] template "msg"="router reloaded" "output"=" - Checking http://localhost:80 using PROXY protocol ...\n - Health check ok : 0 retry attempt(s).\n"
      I1019 04:30:24.958710       1 router.go:618] template "msg"="router reloaded" "output"=" - Checking http://localhost:80 using PROXY protocol ...\n - Health check ok : 0 retry attempt(s).\n"
      2022/10/19 07:05:38 http: TLS handshake error from 10.129.16.10:53964: local error: tls: bad record MAC
      
      
      
      
      =================
      # The problem persisted for more than 3 hours
      
      
      % oc get co
      NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.12.0-0.nightly-2022-10-18-192348   False       False         True       3h2m    OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.qili-awsbig.qe.devcluster.openshift.com/healthz": EOF
      baremetal                                  4.12.0-0.nightly-2022-10-18-192348   True        False         False      5h44m   
      cloud-controller-manager                   4.12.0-0.nightly-2022-10-18-192348   True        False         False      5h46m   
      cloud-credential                           4.12.0-0.nightly-2022-10-18-192348   True        False         False      5h47m   
      cluster-autoscaler                         4.12.0-0.nightly-2022-10-18-192348   True        False         False      5h44m   
      config-operator                            4.12.0-0.nightly-2022-10-18-192348   True        False         False      5h45m   
      console                                    4.12.0-0.nightly-2022-10-18-192348   False       False         False      3h2m    RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.qili-awsbig.qe.devcluster.openshift.com): Get "https://console-openshift-console.apps.qili-awsbig.qe.devcluster.openshift.com": EOF
      control-plane-machine-set                  4.12.0-0.nightly-2022-10-18-192348   True        False         False      5h44m   
      csi-snapshot-controller                    4.12.0-0.nightly-2022-10-18-192348   True        False         False      5h45m   
      dns                                        4.12.0-0.nightly-2022-10-18-192348   True        False         False      5h45m   
      etcd                                       4.12.0-0.nightly-2022-10-18-192348   True        False         False      5h43m   
      image-registry                             4.12.0-0.nightly-2022-10-18-192348   True        False         False      5h36m   
      ingress                                    4.12.0-0.nightly-2022-10-18-192348   True        False         True       5h36m   The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)
      ....
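
      The ingress operator reports the canary checks failing; a direct check of the canary route (not captured here, and assuming the default canary namespace and route name) would look like:

      # Sketch only: probe the ingress canary route directly
      % oc get route -n openshift-ingress-canary
      % curl -skI "https://$(oc get route canary -n openshift-ingress-canary -o jsonpath='{.spec.host}')"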
      
      
      # Delete the ingress pods
      
      
      % oc get po -n openshift-ingress
      NAME                             READY   STATUS    RESTARTS   AGE
      router-default-f6646c495-d2tc7   1/1     Running   0          176m
      router-default-f6646c495-ks64z   1/1     Running   0          176m
      qili@qili-mac Oct19 % oc delete po -n openshift-ingress router-default-f6646c495-d2tc7 
      pod "router-default-f6646c495-d2tc7" deleted
      
      
       % oc get po -n openshift-ingress
      NAME                             READY   STATUS    RESTARTS   AGE
      router-default-f6646c495-5l2kp   1/1     Running   0          112s
      router-default-f6646c495-llvxx   1/1     Running   0          22s
      
      
      # That did not help
      
      
      % curl -I http://http-perf-1-http-scale-http.apps.qili-awsbig.qe.devcluster.openshift.com -v
      *   Trying 18.223.67.40:80...
      * Connected to http-perf-1-http-scale-http.apps.qili-awsbig.qe.devcluster.openshift.com (18.223.67.40) port 80 (#0)
      > HEAD / HTTP/1.1
      > Host: http-perf-1-http-scale-http.apps.qili-awsbig.qe.devcluster.openshift.com
      > User-Agent: curl/7.79.1
      > Accept: */*
      > 
      * Empty reply from server
      * Closing connection 0
      curl: (52) Empty reply from server
      
      
      
      
      # Delete all test resources
      % for termination in http edge passthrough reencrypt; do oc delete ns http-scale-${termination}; done
      namespace "http-scale-http" deleted
      namespace "http-scale-edge" deleted
      namespace "http-scale-passthrough" deleted
      namespace "http-scale-reencrypt" deleted
      
      
      # That did not help either
      
      
      % curl -I http://http-perf-1-http-scale-http.apps.qili-awsbig.qe.devcluster.openshift.com -v
      *   Trying 3.136.63.97:80...
      * Connected to http-perf-1-http-scale-http.apps.qili-awsbig.qe.devcluster.openshift.com (3.136.63.97) port 80 (#0)
      > HEAD / HTTP/1.1
      > Host: http-perf-1-http-scale-http.apps.qili-awsbig.qe.devcluster.openshift.com
      > User-Agent: curl/7.79.1
      > Accept: */*
      > 
      * Empty reply from server
      * Closing connection 0
      curl: (52) Empty reply from server
      
      
      ==================
      # Other general info
      # Nodes are good
      % oc get nodes
      NAME                                         STATUS   ROLES                  AGE     VERSION
      ip-10-0-129-166.us-east-2.compute.internal   Ready    workload               3h11m   v1.25.2+5bf2e1f
      ip-10-0-133-87.us-east-2.compute.internal    Ready    worker                 4h58m   v1.25.2+5bf2e1f
      ip-10-0-136-149.us-east-2.compute.internal   Ready    infra                  3h11m   v1.25.2+5bf2e1f
      ip-10-0-140-227.us-east-2.compute.internal   Ready    worker                 4h58m   v1.25.2+5bf2e1f
      ip-10-0-147-123.us-east-2.compute.internal   Ready    worker                 4h59m   v1.25.2+5bf2e1f
      ip-10-0-147-236.us-east-2.compute.internal   Ready    worker                 4h58m   v1.25.2+5bf2e1f
      ip-10-0-150-216.us-east-2.compute.internal   Ready    worker                 4h58m   v1.25.2+5bf2e1f
      ip-10-0-153-12.us-east-2.compute.internal    Ready    control-plane,master   5h51m   v1.25.2+5bf2e1f
      ip-10-0-155-172.us-east-2.compute.internal   Ready    worker                 4h59m   v1.25.2+5bf2e1f
      ip-10-0-156-117.us-east-2.compute.internal   Ready    worker                 4h59m   v1.25.2+5bf2e1f
      ip-10-0-156-64.us-east-2.compute.internal    Ready    worker                 5h46m   v1.25.2+5bf2e1f
      ip-10-0-159-84.us-east-2.compute.internal    Ready    worker                 4h58m   v1.25.2+5bf2e1f
      ip-10-0-162-10.us-east-2.compute.internal    Ready    control-plane,master   5h52m   v1.25.2+5bf2e1f
      ip-10-0-164-52.us-east-2.compute.internal    Ready    worker                 4h58m   v1.25.2+5bf2e1f
      ip-10-0-169-86.us-east-2.compute.internal    Ready    worker                 4h58m   v1.25.2+5bf2e1f
      ip-10-0-174-104.us-east-2.compute.internal   Ready    infra                  3h10m   v1.25.2+5bf2e1f
      ip-10-0-175-143.us-east-2.compute.internal   Ready    worker                 4h58m   v1.25.2+5bf2e1f
      ip-10-0-176-254.us-east-2.compute.internal   Ready    worker                 4h58m   v1.25.2+5bf2e1f
      ip-10-0-181-242.us-east-2.compute.internal   Ready    worker                 4h58m   v1.25.2+5bf2e1f
      ip-10-0-181-79.us-east-2.compute.internal    Ready    worker                 4h58m   v1.25.2+5bf2e1f
      ip-10-0-183-145.us-east-2.compute.internal   Ready    worker                 4h58m   v1.25.2+5bf2e1f
      ip-10-0-184-49.us-east-2.compute.internal    Ready    worker                 5h41m   v1.25.2+5bf2e1f
      ip-10-0-190-1.us-east-2.compute.internal     Ready    worker                 4h58m   v1.25.2+5bf2e1f
      ip-10-0-198-193.us-east-2.compute.internal   Ready    worker                 4h58m   v1.25.2+5bf2e1f
      ip-10-0-198-255.us-east-2.compute.internal   Ready    worker                 4h58m   v1.25.2+5bf2e1f
      ip-10-0-198-41.us-east-2.compute.internal    Ready    worker                 4h58m   v1.25.2+5bf2e1f
      ip-10-0-199-41.us-east-2.compute.internal    Ready    worker                 4h58m   v1.25.2+5bf2e1f
      ip-10-0-199-69.us-east-2.compute.internal    Ready    worker                 5h47m   v1.25.2+5bf2e1f
      ip-10-0-204-114.us-east-2.compute.internal   Ready    control-plane,master   5h52m   v1.25.2+5bf2e1f
      ip-10-0-204-218.us-east-2.compute.internal   Ready    infra                  3h11m   v1.25.2+5bf2e1f
      ip-10-0-206-24.us-east-2.compute.internal    Ready    worker                 4h58m   v1.25.2+5bf2e1f
      ip-10-0-207-187.us-east-2.compute.internal   Ready    worker                 4h58m   v1.25.2+5bf2e1f
      ip-10-0-219-135.us-east-2.compute.internal   Ready    worker                 4h58m   v1.25.2+5bf2e1f
      ip-10-0-220-118.us-east-2.compute.internal   Ready    worker                 4h58m   v1.25.2+5bf2e1f
      
      
      # Pods are good (no pods in a non-Running/Completed state)
      % oc get pods --no-headers -A| egrep -v 'Running|Completed'
      
      
      # Node resource usage
      % oc adm top nodes
      NAME                                         CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
      ip-10-0-129-166.us-east-2.compute.internal   94m          0%     3136Mi          2%        
      ip-10-0-133-87.us-east-2.compute.internal    112m         1%     3307Mi          10%       
      ip-10-0-136-149.us-east-2.compute.internal   183m         0%     4127Mi          2%        
      ip-10-0-140-227.us-east-2.compute.internal   155m         2%     3216Mi          10%       
      ip-10-0-147-123.us-east-2.compute.internal   115m         1%     2929Mi          9%        
      ip-10-0-147-236.us-east-2.compute.internal   348m         4%     9452Mi          30%       
      ip-10-0-150-216.us-east-2.compute.internal   207m         2%     4210Mi          13%       
      ip-10-0-153-12.us-east-2.compute.internal    612m         3%     10205Mi         16%       
      ip-10-0-155-172.us-east-2.compute.internal   161m         2%     3095Mi          10%       
      ip-10-0-156-117.us-east-2.compute.internal   162m         2%     3737Mi          12%       
      ip-10-0-156-64.us-east-2.compute.internal    191m         2%     5533Mi          18%       
      ip-10-0-159-84.us-east-2.compute.internal    138m         1%     4180Mi          13%       
      ip-10-0-162-10.us-east-2.compute.internal    495m         3%     9895Mi          15%       
      ip-10-0-164-52.us-east-2.compute.internal    196m         2%     3801Mi          12%       
      ip-10-0-169-86.us-east-2.compute.internal    170m         2%     4221Mi          13%       
      ip-10-0-174-104.us-east-2.compute.internal   199m         0%     3908Mi          2%        
      ip-10-0-175-143.us-east-2.compute.internal   141m         1%     2849Mi          9%        
      ip-10-0-176-254.us-east-2.compute.internal   252m         3%     4730Mi          15%       
      ip-10-0-181-242.us-east-2.compute.internal   142m         1%     3085Mi          10%       
      ip-10-0-181-79.us-east-2.compute.internal    174m         2%     4136Mi          13%       
      ip-10-0-183-145.us-east-2.compute.internal   116m         1%     3731Mi          12%       
      ip-10-0-184-49.us-east-2.compute.internal    236m         3%     5455Mi          17%       
      ip-10-0-190-1.us-east-2.compute.internal     193m         2%     3609Mi          11%       
      ip-10-0-198-193.us-east-2.compute.internal   146m         1%     3262Mi          10%       
      ip-10-0-198-255.us-east-2.compute.internal   180m         2%     4799Mi          15%       
      ip-10-0-198-41.us-east-2.compute.internal    163m         2%     5300Mi          17%       
      ip-10-0-199-41.us-east-2.compute.internal    224m         2%     4702Mi          15%       
      ip-10-0-199-69.us-east-2.compute.internal    269m         3%     5568Mi          18%       
      ip-10-0-204-114.us-east-2.compute.internal   853m         5%     12709Mi         20%       
      ip-10-0-204-218.us-east-2.compute.internal   167m         0%     4124Mi          2%        
      ip-10-0-206-24.us-east-2.compute.internal    288m         3%     5223Mi          17%       
      ip-10-0-207-187.us-east-2.compute.internal   181m         2%     4565Mi          14%       
      ip-10-0-219-135.us-east-2.compute.internal   595m         7%     9145Mi          29%       
      ip-10-0-220-118.us-east-2.compute.internal   166m         2%     3214Mi          10%    
      
      
      # Machine info
      % oc get machines -A
      NAMESPACE               NAME                                          PHASE     TYPE          REGION      ZONE         AGE
      openshift-machine-api   qili-awsbig-v9z42-infra-us-east-2a-6j2s7      Running   m5.12xlarge   us-east-2   us-east-2a   3h27m
      openshift-machine-api   qili-awsbig-v9z42-infra-us-east-2b-r4wpn      Running   m5.12xlarge   us-east-2   us-east-2b   3h27m
      openshift-machine-api   qili-awsbig-v9z42-infra-us-east-2c-sm5fj      Running   m5.12xlarge   us-east-2   us-east-2c   3h27m
      openshift-machine-api   qili-awsbig-v9z42-master-0                    Running   m5.4xlarge    us-east-2   us-east-2a   6h6m
      openshift-machine-api   qili-awsbig-v9z42-master-1                    Running   m5.4xlarge    us-east-2   us-east-2b   6h6m
      openshift-machine-api   qili-awsbig-v9z42-master-2                    Running   m5.4xlarge    us-east-2   us-east-2c   6h6m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2a-48b64     Running   m5.2xlarge    us-east-2   us-east-2a   6h3m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2a-794nw     Running   m5.2xlarge    us-east-2   us-east-2a   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2a-dfphs     Running   m5.2xlarge    us-east-2   us-east-2a   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2a-gdx2l     Running   m5.2xlarge    us-east-2   us-east-2a   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2a-m9p9h     Running   m5.2xlarge    us-east-2   us-east-2a   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2a-rm5ds     Running   m5.2xlarge    us-east-2   us-east-2a   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2a-sjwwn     Running   m5.2xlarge    us-east-2   us-east-2a   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2a-swxdh     Running   m5.2xlarge    us-east-2   us-east-2a   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2a-v25f8     Running   m5.2xlarge    us-east-2   us-east-2a   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2b-6nxvc     Running   m5.2xlarge    us-east-2   us-east-2b   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2b-7x92l     Running   m5.2xlarge    us-east-2   us-east-2b   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2b-cl82s     Running   m5.2xlarge    us-east-2   us-east-2b   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2b-g8jtj     Running   m5.2xlarge    us-east-2   us-east-2b   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2b-jpc7z     Running   m5.2xlarge    us-east-2   us-east-2b   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2b-prn8w     Running   m5.2xlarge    us-east-2   us-east-2b   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2b-shnnx     Running   m5.2xlarge    us-east-2   us-east-2b   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2b-tg2z8     Running   m5.2xlarge    us-east-2   us-east-2b   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2b-vdstq     Running   m5.2xlarge    us-east-2   us-east-2b   6h3m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2c-4nr6v     Running   m5.2xlarge    us-east-2   us-east-2c   6h3m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2c-54l9w     Running   m5.2xlarge    us-east-2   us-east-2c   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2c-6vt7l     Running   m5.2xlarge    us-east-2   us-east-2c   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2c-7lsmn     Running   m5.2xlarge    us-east-2   us-east-2c   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2c-8xxf7     Running   m5.2xlarge    us-east-2   us-east-2c   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2c-dc5sc     Running   m5.2xlarge    us-east-2   us-east-2c   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2c-dszxk     Running   m5.2xlarge    us-east-2   us-east-2c   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2c-st728     Running   m5.2xlarge    us-east-2   us-east-2c   5h15m
      openshift-machine-api   qili-awsbig-v9z42-worker-us-east-2c-vdqm9     Running   m5.2xlarge    us-east-2   us-east-2c   5h15m
      openshift-machine-api   qili-awsbig-v9z42-workload-us-east-2a-8f4zj   Running   m5.8xlarge    us-east-2   us-east-2a   3h27m
      
