Knative Serving / SRVKS-1093

net-kourier-controller OOMKilled on soak test with default limits

    • Type: Bug
    • Resolution: Won't Do
    • Priority: Undefined
    • Affects Version/s: 1.30.0

      Soak tests were run on a custom 1.30 CI build with the fixes for https://issues.redhat.com/browse/SRVKS-1078 applied.

      The net-kourier-controller pods were OOMKilled 4 times each during the 12h run:

      oc get pod -n knative-serving-ingress
      NAME                                      READY   STATUS    RESTARTS      AGE
      3scale-kourier-gateway-6fc8c7d84b-49sm2   1/1     Running   0             37h
      3scale-kourier-gateway-6fc8c7d84b-57222   1/1     Running   0             37h
      3scale-kourier-gateway-6fc8c7d84b-mj95m   1/1     Running   0             37h
      3scale-kourier-gateway-6fc8c7d84b-npt6q   1/1     Running   0             37h
      3scale-kourier-gateway-6fc8c7d84b-r8rd5   1/1     Running   0             37h
      net-kourier-controller-bbfccb44f-8zckc    1/1     Running   4 (9h ago)    19h
      net-kourier-controller-bbfccb44f-trtwj    1/1     Running   4 (12h ago)   19h
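
      For reference, the describe output below can be reproduced with oc describe (the exact command is not part of the original capture), and the last termination reason can be checked directly with a jsonpath query:

        oc describe pod net-kourier-controller-bbfccb44f-8zckc -n knative-serving-ingress

        # prints the last termination reason, "OOMKilled" here (exit code 137)
        oc get pod net-kourier-controller-bbfccb44f-8zckc -n knative-serving-ingress \
          -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'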
      

      Name:             net-kourier-controller-bbfccb44f-8zckc 
      Namespace:        knative-serving-ingress                           
      Priority:         0                                     
      Service Account:  net-kourier                           
      Node:             soak-1-30-11-5mzfc-worker-0-7zx6w/192.168.1.216
      Start Time:       Tue, 20 Jun 2023 09:59:30 -0400                                  
      Labels:           app=net-kourier-controller
                        pod-template-hash=bbfccb44f
      Annotations:      k8s.v1.cni.cncf.io/network-status:
                          [{  
                              "name": "openshift-sdn",
                              "interface": "eth0",
                              "ips": [
                                  "10.128.3.24"
                              ],                                                                         
                              "default": true,
                              "dns": {}            
                          }]            
                        k8s.v1.cni.cncf.io/networks-status:
                          [{                               
                              "name": "openshift-sdn",
                              "interface": "eth0",
                              "ips": [   
                                  "10.128.3.24"                                           
                              ],                                                            
                              "default": true,                                                
                              "dns": {}  
                          }]             
                        openshift.io/scc: restricted-v2
                        prometheus.io/path: /metrics
                        prometheus.io/port: 9090
                        prometheus.io/scrape: true
                        seccomp.security.alpha.kubernetes.io/pod: runtime/default
      Status:           Running
      IP:               10.128.3.24
      IPs:
        IP:           10.128.3.24
      Controlled By:  ReplicaSet/net-kourier-controller-bbfccb44f
      Containers:
        controller:
          Container ID:   cri-o://48321b6c2146adcbcb5ac1e8d48d1d8f4704685a1fd721bd43d37918142048bf
          Image:          quay.io/maschmid/net-kourier-controller:1.30.1
          Image ID:       quay.io/maschmid/net-kourier-controller@sha256:05f565fec35c9a21660dcf594997a31e4112833a29d949599aa6c03d88ccdabe
          Port:           18000/TCP
          Host Port:      0/TCP
          State:          Running
            Started:      Tue, 20 Jun 2023 20:04:42 -0400
          Last State:     Terminated
            Reason:       OOMKilled
            Exit Code:    137
            Started:      Tue, 20 Jun 2023 13:30:15 -0400
            Finished:     Tue, 20 Jun 2023 20:04:41 -0400
          Ready:          True
          Restart Count:  4
          Limits:
            cpu:     500m
            memory:  500Mi
          Requests:
            cpu:      200m
            memory:   200Mi
          Liveness:   exec [/ko-app/kourier -probe-addr=:18000] delay=0s timeout=1s period=10s #success=1 #failure=6
          Readiness:  exec [/ko-app/kourier -probe-addr=:18000] delay=0s timeout=1s period=10s #success=1 #failure=3
          Environment:
            CERTS_SECRET_NAMESPACE:
            CERTS_SECRET_NAME:
            SYSTEM_NAMESPACE:                              knative-serving-ingress (v1:metadata.namespace)
            METRICS_DOMAIN:                                knative.dev/samples
            KOURIER_GATEWAY_NAMESPACE:                     knative-serving-ingress
            ENABLE_SECRET_INFORMER_FILTERING_BY_CERT_UID:  false
            KUBERNETES_MIN_VERSION:                        v1.0.0
            KOURIER_HTTPOPTION_DISABLED:                   true
            SERVING_NAMESPACE:                             knative-serving
            KUBE_API_BURST:                                200
            KUBE_API_QPS:                                  200
          Mounts:
            /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-msk2g (ro)
      Conditions:
        Type              Status
        Initialized       True
        Ready             True
        ContainersReady   True
        PodScheduled      True
      Volumes:
        kube-api-access-msk2g:
          Type:                    Projected (a volume that contains injected data from multiple sources)
          TokenExpirationSeconds:  3607
          ConfigMapName:           kube-root-ca.crt
          ConfigMapOptional:       <nil>
          DownwardAPI:             true
          ConfigMapName:           openshift-service-ca.crt
          ConfigMapOptional:       <nil>
      QoS Class:                   Burstable
      Node-Selectors:              <none>
      Tolerations:                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                                   node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                                   node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
      Events:                      <none>
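
      The 500Mi memory limit above is the default. If a bump were needed as a workaround, the Operator's per-deployment override in the KnativeServing CR should be able to carry a resources override for the controller container; a minimal sketch (values are illustrative only, not a recommendation from this issue):

        spec:
          deployments:
          - name: net-kourier-controller
            resources:
            - container: controller
              requests:
                memory: 200Mi
              limits:
                memory: 1Gi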
      
      KnativeServing CR used for the test, with profiling enabled:

        spec:
          config:
            logging:
              loglevel.activator: info
              loglevel.autoscaler: info
              loglevel.controller: info
              loglevel.domainmapping: info
              loglevel.domainmapping-webhook: info
              loglevel.hpaautoscaler: info
              loglevel.queueproxy: info
              loglevel.webhook: info
            observability:
              logging.enable-probe-request-log: "true"
              logging.enable-request-log: "true"
              logging.request-log-template: '{"httpRequest": {"requestMethod": "{{.Request.Method}}",
                "requestUrl": "{{js .Request.RequestURI}}", "requestSize": "{{.Request.ContentLength}}",
                "status": {{.Response.Code}}, "responseSize": "{{.Response.Size}}", "userAgent":
                "{{js .Request.UserAgent}}", "remoteIp": "{{js .Request.RemoteAddr}}", "serverIp":
                "{{.Revision.PodIP}}", "referer": "{{js .Request.Referer}}", "latency":
                "{{.Response.Latency}}s", "protocol": "{{.Request.Proto}}"}, "traceId":
                "{{index .Request.Header "X-B3-Traceid"}}"}'
              profiling.enable: "true"
          controller-custom-certs:
            name: ""
            type: ""
          deployments:
          - name: 3scale-kourier-gateway
            replicas: 5
          registry: {}
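
      With profiling.enable set to "true", Knative components serve the standard Go pprof endpoints (the knative.dev/pkg profiling server defaults to port 8008), so a heap profile of the controller can be captured roughly as follows (port and paths assume the stock knative.dev/pkg profiling handler):

        oc -n knative-serving-ingress port-forward deploy/net-kourier-controller 8008:8008 &

        # summarize in-use heap allocations at the time of capture
        go tool pprof -top http://localhost:8008/debug/pprof/heap

        # or save raw profiles periodically to compare growth between restarts
        curl -s http://localhost:8008/debug/pprof/heap -o heap-$(date +%s).pprof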
      

              Assignee: Kenjiro Nakayama (rhn-support-knakayam) (Inactive)
              Reporter: Marek Schmidt (maschmid@redhat.com)