Knative Serving / SRVKS-1170

[scalability] 3scale-kourier-gateways losing readiness and liveness with ~500 ksvcs while creating new ksvcs


    • Type: Bug
    • Resolution: Done
    • Priority: Undefined
    • Fix Version/s: 1.32.0
    • Affects Version/s: 1.31.0, 1.30.1
      **Fixed issues**

      When more than 500 Knative Service (ksvc) instances are created, the readiness and liveness probes of the 3scale-kourier-gateway pods may fail. This is caused by CPU resource exhaustion. In version 1.32 and later, the default CPU limit for 3scale-kourier-gateway is increased from "500m" (half a CPU core) to "1" (one full core), which resolves the issue.
    • Release Note Type: Bug Fix

      A script creates 10 ksvcs at a time and waits for them to become ready, up to a maximum of 1000 ksvcs.
      Starting at about 500 ksvcs on the cluster, the 3scale-kourier-gateway pods start to lose readiness and then liveness.

      The gateways settle down once no more new ksvcs are being created (with no other significant load directed at the ksvcs).
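      The reproduction loop described above can be sketched as follows. This is a hypothetical POSIX shell sketch, not the attached create_1000_helloworld_ksvc_with_kourier.sh script: the function name create_ksvcs_in_batches, the hello-N service names, the 300s timeout, and the KN/OC override variables are assumptions made for illustration. It submits each batch with kn service create --no-wait and then blocks with oc wait --for=condition=Ready until every ksvc in the batch is ready.

```shell
# Hypothetical sketch of the reproduction loop (not the attached script).
# Creates services hello-1..hello-TOTAL in batches of BATCH and waits for
# each batch to become Ready before starting the next one.
# KN and OC can be overridden (for example with "echo") for a dry run.
create_ksvcs_in_batches() {
  total=$1   # maximum number of ksvcs, e.g. 1000
  batch=$2   # ksvcs created per batch, e.g. 10
  image=$3   # container image used for every ksvc (assumption)
  kn=${KN:-kn}
  oc=${OC:-oc}

  start=1
  while [ "$start" -le "$total" ]; do
    end=$((start + batch - 1))
    if [ "$end" -gt "$total" ]; then end=$total; fi

    # Submit the whole batch without waiting for each service individually.
    i=$start
    while [ "$i" -le "$end" ]; do
      "$kn" service create "hello-$i" --image "$image" --no-wait || return 1
      i=$((i + 1))
    done

    # Block until every ksvc in the batch reports the Ready condition.
    i=$start
    while [ "$i" -le "$end" ]; do
      "$oc" wait "ksvc/hello-$i" --for=condition=Ready --timeout=300s || return 1
      i=$((i + 1))
    done

    start=$((end + 1))
  done
}
```

      Usage, assuming kn and oc are logged in to the target cluster and namespace: create_ksvcs_in_batches 1000 10 <helloworld image>. Prefixing the call with KN=echo OC=echo prints the commands instead of running them. The function stops at the first failed create or wait, so a batch that never becomes ready ends the run instead of piling more ksvcs onto an unhealthy gateway.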

      $ oc get pod -n knative-serving-ingress
      NAME                                      READY   STATUS             RESTARTS        AGE
      3scale-kourier-gateway-57c746d779-4w629   1/1     Running            2 (42m ago)     93m
      3scale-kourier-gateway-57c746d779-8l9cq   1/1     Running            1 (7m23s ago)   26m
      3scale-kourier-gateway-57c746d779-8zngn   0/1     Running            4 (3m27s ago)   25m
      3scale-kourier-gateway-57c746d779-9w7k9   1/1     Running            6 (21s ago)     25m
      3scale-kourier-gateway-57c746d779-fxqxd   0/1     CrashLoopBackOff   17 (50s ago)    76m
      3scale-kourier-gateway-57c746d779-jwrm9   0/1     Running            5 (12s ago)     26m
      3scale-kourier-gateway-57c746d779-mzzlj   0/1     Running            5 (29s ago)     25m
      3scale-kourier-gateway-57c746d779-p2czr   1/1     Running            4 (76s ago)     71m
      3scale-kourier-gateway-57c746d779-tsmrk   0/1     Running            6 (38m ago)     81m
      3scale-kourier-gateway-57c746d779-v8wz4   1/1     Running            2 (48m ago)     76m
      3scale-kourier-gateway-57c746d779-xgpvt   1/1     Running            8 (4m55s ago)   93m
      net-kourier-controller-5f58b57c8c-bbb24   1/1     Running            0               80m
      net-kourier-controller-5f58b57c8c-qnw6n   1/1     Running            0               80m
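      To triage a listing like the one above, a small filter can count the gateway pods that are not ready and add up their restarts. This is a hypothetical helper (the name summarize_gateway_pods is made up); it assumes the default oc get pod column layout (NAME, READY, STATUS, RESTARTS, AGE), where the restart count may be followed by a "(42m ago)" annotation.

```shell
# Hypothetical helper: reads "oc get pod" output on stdin and reports the
# 3scale-kourier-gateway pods whose READY column is not n/n, plus the sum of
# their restart counts.
summarize_gateway_pods() {
  awk '
    NR == 1 { next }                            # skip the header row
    $1 !~ /^3scale-kourier-gateway-/ { next }   # ignore non-gateway pods
    {
      total++
      split($2, ready, "/")                     # READY, e.g. "0/1"
      if (ready[1] != ready[2]) {
        notready++
        print "not ready: " $1 " (" $3 ")"      # pod name and STATUS
      }
      restarts += $4                            # leading number of RESTARTS
    }
    END {
      printf "%d/%d gateway pods not ready, %d restarts in total\n", notready, total, restarts
    }
  '
}
```

      Usage: oc get pod -n knative-serving-ingress | summarize_gateway_pods. For the listing above it reports 5 of the 11 gateway pods as not ready, with 60 restarts in total.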
       

      As an example, here are the events from 3scale-kourier-gateway-57c746d779-fxqxd:

      Events:
        Type     Reason          Age                    From               Message
        ----     ------          ----                   ----               -------
        Normal   Scheduled       79m                    default-scheduler  Successfully assigned knative-serving-ingress/3scale-kourier-gateway-57c746d779-fxqxd to maschmid-581-vkdj5-worker-0-drstm
        Normal   AddedInterface  79m                    multus             Add eth0 [10.128.2.126/23] from openshift-sdn
        Normal   Created         79m                    kubelet            Created container kourier-gateway
        Normal   Started         79m                    kubelet            Started container kourier-gateway
        Normal   Pulled          54m (x8 over 79m)      kubelet            Container image "registry.redhat.io/openshift-service-mesh/proxyv2-rhel8@sha256:0f1b44e867c5ffc1a7d9ab82907ac9eafb44e12cfa9b76967f85ac0c38e5cb7c" already present on machine
        Warning  BackOff         39m (x67 over 60m)     kubelet            Back-off restarting failed container kourier-gateway in pod 3scale-kourier-gateway-57c746d779-fxqxd_knative-serving-ingress(ab82626a-537f-421d-86f5-c264786b011a)
        Warning  Unhealthy       14m (x347 over 79m)    kubelet            Readiness probe failed: Get "http://10.128.2.126:8081/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
        Warning  Unhealthy       4m38s (x318 over 79m)  kubelet            Liveness probe failed: Get "http://10.128.2.126:8081/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
      

      Attachments:
        1. 3scale-kourier-gateway-57c746d779-45v82.previous.logs
          197 kB
          Marek Schmidt
        2. create_1000_helloworld_ksvc_with_kourier.sh
          3 kB
          Marek Schmidt
        3. image-2024-01-09-12-25-56-904.png
          102 kB
          Marek Schmidt
        4. must-gather.tar.bz2
          114.75 MB
          Marek Schmidt

              Assignee: Stavros Kontopoulos (skontopo@redhat.com)
              Reporter: Marek Schmidt (maschmid@redhat.com)
              Votes: 0
              Watchers: 5
