Knative Serving / SRVKS-900

net-istio-controller runs out of memory and requires a very large memory limit to run stably


      When integrating OpenShift Serverless with OpenShift Service Mesh (following https://docs.openshift.com/container-platform/4.9/serverless/admin_guide/serverless-ossm-setup.html#serverless-ossm-setup_serverless-ossm-setup), we see net-istio-controller-7b8b4687cd-mzks7 running out of memory during start-up. Even after its memory is increased, it continues to get OOMKilled:

      $ oc get pod net-istio-controller-7b8b4687cd-mzks7 -o json
      {
          "apiVersion": "v1",
          "kind": "Pod",
          "metadata": {
              "annotations": {
                  "cluster-autoscaler.kubernetes.io/safe-to-evict": "true",
                  "k8s.v1.cni.cncf.io/network-status": "[{\n    \"name\": \"openshift-sdn\",\n    \"interface\": \"eth0\",\n    \"ips\": [\n        \"10.129.2.243\"\n    ],\n    \"default\": true,\n    \"dns\": {}\n}]",
                  "k8s.v1.cni.cncf.io/networks-status": "[{\n    \"name\": \"openshift-sdn\",\n    \"interface\": \"eth0\",\n    \"ips\": [\n        \"10.129.2.243\"\n    ],\n    \"default\": true,\n    \"dns\": {}\n}]",
                  "openshift.io/scc": "restricted",
                  "sidecar.istio.io/inject": "false"
              },
              "creationTimestamp": "2022-03-23T09:04:06Z",
              "generateName": "net-istio-controller-7b8b4687cd-",
              "labels": {
                  "app": "net-istio-controller",
                  "pod-template-hash": "7b8b4687cd",
                  "serving.knative.dev/release": "devel"
              },
              "name": "net-istio-controller-7b8b4687cd-mzks7",
              "namespace": "knative-serving",
              "ownerReferences": [
                  {
                      "apiVersion": "apps/v1",
                      "blockOwnerDeletion": true,
                      "controller": true,
                      "kind": "ReplicaSet",
                      "name": "net-istio-controller-7b8b4687cd",
                      "uid": "a458bb9e-1165-4625-a86c-aa81bdfba2f1"
                  }
              ],
              "resourceVersion": "43000928",
              "uid": "39eaa109-b891-4f10-9d70-86211a620873"
          },
          "spec": {
              "containers": [
                  {
                      "env": [
                          {
                              "name": "SYSTEM_NAMESPACE",
                              "valueFrom": {
                                  "fieldRef": {
                                      "apiVersion": "v1",
                                      "fieldPath": "metadata.namespace"
                                  }
                              }
                          },
                          {
                              "name": "CONFIG_LOGGING_NAME",
                              "value": "config-logging"
                          },
                          {
                              "name": "CONFIG_OBSERVABILITY_NAME",
                              "value": "config-observability"
                          },
                          {
                              "name": "METRICS_DOMAIN",
                              "value": "knative.dev/net-istio"
                          }
                      ],
                      "image": "registry.redhat.io/openshift-serverless-1/net-istio-controller-rhel8@sha256:ff83481b41f502ae6c7e629ed0aae8692fc7efc180034890650d93784dea0a0a",
                      "imagePullPolicy": "IfNotPresent",
                      "name": "controller",
                      "ports": [
                          {
                              "containerPort": 9090,
                              "name": "metrics",
                              "protocol": "TCP"
                          },
                          {
                              "containerPort": 8008,
                              "name": "profiling",
                              "protocol": "TCP"
                          }
                      ],
                      "resources": {
                          "limits": {
                              "cpu": "1",
                              "memory": "900Mi"
                          },
                          "requests": {
                              "cpu": "300m",
                              "memory": "400Mi"
                          }
                      },
                      "securityContext": {
                          "allowPrivilegeEscalation": false,
                          "capabilities": {
                              "drop": [
                                  "KILL",
                                  "MKNOD",
                                  "SETGID",
                                  "SETUID",
                                  "all"
                              ]
                          },
                          "readOnlyRootFilesystem": true,
                          "runAsNonRoot": true,
                          "runAsUser": 1001010000
                      },
                      "terminationMessagePath": "/dev/termination-log",
                      "terminationMessagePolicy": "File",
                      "volumeMounts": [
                          {
                              "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount",
                              "name": "kube-api-access-hqlgh",
                              "readOnly": true
                          }
                      ]
                  }
              ],
              "dnsPolicy": "ClusterFirst",
              "enableServiceLinks": true,
              "imagePullSecrets": [
                  {
                      "name": "controller-dockercfg-pknb2"
                  }
              ],
              "nodeName": "ip-10-0-167-88.eu-west-3.compute.internal",
              "preemptionPolicy": "PreemptLowerPriority",
              "priority": 0,
              "restartPolicy": "Always",
              "schedulerName": "default-scheduler",
              "securityContext": {
                  "fsGroup": 1001010000,
                  "seLinuxOptions": {
                      "level": "s0:c32,c9"
                  }
              },
              "serviceAccount": "controller",
              "serviceAccountName": "controller",
              "terminationGracePeriodSeconds": 30,
              "tolerations": [
                  {
                      "effect": "NoExecute",
                      "key": "node.kubernetes.io/not-ready",
                      "operator": "Exists",
                      "tolerationSeconds": 300
                  },
                  {
                      "effect": "NoExecute",
                      "key": "node.kubernetes.io/unreachable",
                      "operator": "Exists",
                      "tolerationSeconds": 300
                  },
                  {
                      "effect": "NoSchedule",
                      "key": "node.kubernetes.io/memory-pressure",
                      "operator": "Exists"
                  }
              ],
              "volumes": [
                  {
                      "name": "kube-api-access-hqlgh",
                      "projected": {
                          "defaultMode": 420,
                          "sources": [
                              {
                                  "serviceAccountToken": {
                                      "expirationSeconds": 3607,
                                      "path": "token"
                                  }
                              },
                              {
                                  "configMap": {
                                      "items": [
                                          {
                                              "key": "ca.crt",
                                              "path": "ca.crt"
                                          }
                                      ],
                                      "name": "kube-root-ca.crt"
                                  }
                              },
                              {
                                  "downwardAPI": {
                                      "items": [
                                          {
                                              "fieldRef": {
                                                  "apiVersion": "v1",
                                                  "fieldPath": "metadata.namespace"
                                              },
                                              "path": "namespace"
                                          }
                                      ]
                                  }
                              },
                              {
                                  "configMap": {
                                      "items": [
                                          {
                                              "key": "service-ca.crt",
                                              "path": "service-ca.crt"
                                          }
                                      ],
                                      "name": "openshift-service-ca.crt"
                                  }
                              }
                          ]
                      }
                  }
              ]
          },
          "status": {
              "conditions": [
                  {
                      "lastProbeTime": null,
                      "lastTransitionTime": "2022-03-23T09:04:06Z",
                      "status": "True",
                      "type": "Initialized"
                  },
                  {
                      "lastProbeTime": null,
                      "lastTransitionTime": "2022-03-23T09:06:19Z",
                      "message": "containers with unready status: [controller]",
                      "reason": "ContainersNotReady",
                      "status": "False",
                      "type": "Ready"
                  },
                  {
                      "lastProbeTime": null,
                      "lastTransitionTime": "2022-03-23T09:06:19Z",
                      "message": "containers with unready status: [controller]",
                      "reason": "ContainersNotReady",
                      "status": "False",
                      "type": "ContainersReady"
                  },
                  {
                      "lastProbeTime": null,
                      "lastTransitionTime": "2022-03-23T09:04:06Z",
                      "status": "True",
                      "type": "PodScheduled"
                  }
              ],
              "containerStatuses": [
                  {
                      "containerID": "cri-o://7e33a5a2cf828461cc40e12ca322c5d780273e10ab6353bcd6e57bebbf181799",
                      "image": "registry.redhat.io/openshift-serverless-1/net-istio-controller-rhel8@sha256:ff83481b41f502ae6c7e629ed0aae8692fc7efc180034890650d93784dea0a0a",
                      "imageID": "registry.redhat.io/openshift-serverless-1/net-istio-controller-rhel8@sha256:bbfa04c9f6b234dc993b4116da5438005bd69ff32a944276bbafcd4f9175de8a",
                      "lastState": {
                          "terminated": {
                              "containerID": "cri-o://7e33a5a2cf828461cc40e12ca322c5d780273e10ab6353bcd6e57bebbf181799",
                              "exitCode": 137,
                              "finishedAt": "2022-03-23T09:06:18Z",
                              "reason": "OOMKilled",
                              "startedAt": "2022-03-23T09:06:12Z"
                          }
                      },
                      "name": "controller",
                      "ready": false,
                      "restartCount": 4,
                      "started": false,
                      "state": {
                          "waiting": {
                              "message": "back-off 1m20s restarting failed container=controller pod=net-istio-controller-7b8b4687cd-mzks7_knative-serving(39eaa109-b891-4f10-9d70-86211a620873)",
                              "reason": "CrashLoopBackOff"
                          }
                      }
                  }
              ],
              "hostIP": "10.0.167.88",
              "phase": "Running",
              "podIP": "10.129.2.243",
              "podIPs": [
                  {
                      "ip": "10.129.2.243"
                  }
              ],
              "qosClass": "Burstable",
              "startTime": "2022-03-23T09:04:06Z"
          }
      }
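The fields that matter in the pod JSON above are the container's memory limit and its `lastState.terminated` record. As a rough triage aid, the following is a minimal sketch that pulls those fields out of a pod object. The function name `oom_killed_containers` and the trimmed `sample_pod` are illustrative assumptions, not part of any Knative or OpenShift tooling; the values are copied from the output above.

```python
import json
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # timestamp layout used in the pod status


def oom_killed_containers(pod: dict) -> list:
    """Return one record per container whose last termination was an OOM kill."""
    # Map each container name to the memory limit declared in the pod spec.
    limits = {
        c["name"]: c.get("resources", {}).get("limits", {}).get("memory")
        for c in pod.get("spec", {}).get("containers", [])
    }
    findings = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        terminated = status.get("lastState", {}).get("terminated")
        if not terminated or terminated.get("reason") != "OOMKilled":
            continue
        started = datetime.strptime(terminated["startedAt"], TIME_FORMAT)
        finished = datetime.strptime(terminated["finishedAt"], TIME_FORMAT)
        findings.append({
            "container": status["name"],
            "memory_limit": limits.get(status["name"]),
            "restarts": status["restartCount"],
            "exit_code": terminated["exitCode"],
            "seconds_alive": int((finished - started).total_seconds()),
        })
    return findings


# Trimmed copy of the relevant fields from the pod JSON above (hypothetical sample).
sample_pod = json.loads("""
{
  "spec": {"containers": [{"name": "controller",
    "resources": {"limits": {"cpu": "1", "memory": "900Mi"},
                  "requests": {"cpu": "300m", "memory": "400Mi"}}}]},
  "status": {"containerStatuses": [{"name": "controller", "restartCount": 4,
    "lastState": {"terminated": {"exitCode": 137, "reason": "OOMKilled",
      "startedAt": "2022-03-23T09:06:12Z",
      "finishedAt": "2022-03-23T09:06:18Z"}}}]}
}
""")

for finding in oom_killed_containers(sample_pod):
    print(finding)
```

For the pod above, this reports the `controller` container with a 900Mi limit, 4 restarts, exit code 137 (128 + 9, that is, SIGKILL), and a lifetime of 6 seconds between `startedAt` and `finishedAt`.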
      

      The logs of the affected pod look as follows:

      $ oc logs net-istio-controller-7b8b4687cd-mzks7
      2022/03/23 09:10:42 Registering 4 clients
      2022/03/23 09:10:42 Registering 4 informer factories
      2022/03/23 09:10:42 Registering 9 informers
      2022/03/23 09:10:42 Registering 2 controllers
      {"severity":"INFO","timestamp":"2022-03-23T09:10:42.818925424Z","caller":"logging/config.go:116","message":"Successfully created the logger."}
      {"severity":"INFO","timestamp":"2022-03-23T09:10:42.818961351Z","caller":"logging/config.go:117","message":"Logging level set to: info"}
      {"severity":"INFO","timestamp":"2022-03-23T09:10:42.818974148Z","caller":"logging/config.go:79","message":"Fetch GitHub commit ID from kodata failed","error":"\"KO_DATA_PATH\" does not exist or is empty"}
      {"severity":"INFO","timestamp":"2022-03-23T09:10:42.818989333Z","logger":"net-istio-controller","caller":"profiling/server.go:64","message":"Profiling enabled: false"}
      {"severity":"INFO","timestamp":"2022-03-23T09:10:42.823965295Z","logger":"net-istio-controller","caller":"leaderelection/context.go:46","message":"Running with Standard leader election"}
      {"severity":"INFO","timestamp":"2022-03-23T09:10:42.832762343Z","logger":"net-istio-controller","caller":"sharedmain/main.go:202","message":"Starting configuration manager..."}
      {"severity":"INFO","timestamp":"2022-03-23T09:10:42.875967373Z","logger":"net-istio-controller","caller":"metrics/metrics_worker.go:76","message":"Flushing the existing exporter before setting up the new exporter."}
      {"severity":"INFO","timestamp":"2022-03-23T09:10:42.876030004Z","logger":"net-istio-controller","caller":"metrics/metrics_worker.go:91","message":"Successfully updated the metrics exporter; old config: <nil>; new config &{knative.dev/net-istio net_istio_controller none 0 <nil> <nil>  false 0 }"}
      {"severity":"INFO","timestamp":"2022-03-23T09:10:42.87605062Z","logger":"net-istio-controller","caller":"profiling/server.go:102","message":"Profiling enabled: true"}
      {"level":"info","ts":1648026642.932907,"logger":"fallback","caller":"injection/injection.go:61","msg":"Starting informers..."}
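The log output above mixes three line formats: plain lines with a date prefix, JSON lines with a `message` key, and a JSON line with a `msg` key. To see quickly what the controller logged last before it was killed, a small normalizer can help. This is only a sketch: `log_message` and `sample_lines` are hypothetical names, and the fact that the excerpt ends at "Starting informers..." does not by itself confirm that informer start-up is where the memory is consumed.

```python
import json
import re

# Timestamp prefix seen on the plain (non-JSON) lines above.
PLAIN_PREFIX = re.compile(r"^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} ")


def log_message(line: str) -> str:
    """Extract the human-readable message from one log line."""
    line = line.strip()
    if line.startswith("{"):
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            return line  # not valid JSON after all; return it unchanged
        # The excerpt uses both "message" and "msg" as the text key.
        return entry.get("message") or entry.get("msg") or line
    return PLAIN_PREFIX.sub("", line)


# Three lines copied from the log excerpt above, one per format.
sample_lines = [
    '2022/03/23 09:10:42 Registering 2 controllers',
    '{"severity":"INFO","timestamp":"2022-03-23T09:10:42.832762343Z","logger":"net-istio-controller","caller":"sharedmain/main.go:202","message":"Starting configuration manager..."}',
    '{"level":"info","ts":1648026642.932907,"logger":"fallback","caller":"injection/injection.go:61","msg":"Starting informers..."}',
]

messages = [log_message(line) for line in sample_lines]
print(messages[-1])
```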
      

      Only with a memory limit of about 2 Gi does it remain stable and stop crashing.
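To put the numbers in proportion, the sketch below converts the memory quantities from this report to bytes and compares them. The helper `memory_bytes` is a hypothetical name and handles only whole numbers with a binary suffix; real Kubernetes quantities also allow decimal suffixes and fractional values, which are out of scope here. The 2Gi figure is the approximate value from the report, so the ratios are approximate as well.

```python
# Binary suffixes accepted in Kubernetes memory quantities.
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def memory_bytes(quantity: str) -> int:
    """Convert a quantity such as '900Mi' or '2Gi' to bytes.

    Only whole numbers with a binary suffix (or no suffix) are handled.
    """
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: already a byte count


failing_limit = memory_bytes("900Mi")  # limit in the pod spec that was OOMKilled
stable_limit = memory_bytes("2Gi")     # approximate limit reported as stable
request = memory_bytes("400Mi")        # memory request in the same pod spec

print(f"stable / failing limit: {stable_limit / failing_limit:.2f}x")
print(f"stable limit / request: {stable_limit / request:.2f}x")
```

With the values from this report, the stable limit is roughly 2.28 times the 900Mi limit that was OOMKilled and 5.12 times the 400Mi request.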

              skontopo@redhat.com Stavros Kontopoulos
              rhn-support-sreber Simon Reber