OpenShift Service Mesh / OSSM-6177

Istiod keeps crashing and ends up in CrashLoopBackOff after upgrade to v2.5.0

      Previously, when validation messages were enabled in the Service Mesh Control Plane (SMCP), `istiod` crashed continuously unless `GatewayAPI` support was also enabled. Now, `istiod` no longer crashes when validation messages are enabled and `GatewayAPI` support is not.
    • Critical

      After upgrading the SMCP to v2.5.0, the istiod pods are unstable: they keep crashing and end up in the CrashLoopBackOff state. After being deleted, they start again but stay running for only a few minutes:

      NAME                                       READY   STATUS             RESTARTS      AGE
      grafana-67464b7bc5-nb9l9                   2/2     Running            0             8m58s
      istio-egressgateway-c66d6d644-4qfns        1/1     Running            0             9m
      istio-egressgateway-c66d6d644-dwvc9        1/1     Running            0             9m
      istio-egressgateway-c66d6d644-nrbb7        1/1     Running            0             9m
      istio-ingressgateway-5d8f577465-ksb76      1/1     Running            0             9m
      istio-ingressgateway-5d8f577465-lpd4v      1/1     Running            0             9m
      istio-ingressgateway-5d8f577465-rz2xm      1/1     Running            0             9m
      istiod-smcp-full-deploy-7fc4d58bdb-46bd5   0/1     CrashLoopBackOff   4 (57s ago)   4m47s
      istiod-smcp-full-deploy-7fc4d58bdb-tmcrl   0/1     CrashLoopBackOff   4 (9s ago)    4m26s
      jaeger-69dbfbbf4b-bc5q6                    2/2     Running            2             24h
      kiali-76cddddd65-z8xkg                     1/1     Running            0             6m55s
      prometheus-586bf7bc66-8f9kg                3/3     Running            0             9m32s

      The istiod logs repeatedly end with the following errors and panic:

      oc logs istiod-smcp-full-deploy-7fc4d58bdb-7v62p
      2024-03-27T13:46:11.010736Z info klog Config not found: /var/run/secrets/remote/config
      2024-03-27T13:46:11.013704Z info smmr Member namespace list updated: []
      2024-03-27T13:46:11.039716Z info controllers starting controller=configmap istio-smcp-full-deploy
      2024-03-27T13:46:11.049021Z info model reloading network gateways
      2024-03-27T13:46:11.055264Z info pkica Load signing key and cert from existing secret istio-system/istio-ca-secret
      2024-03-27T13:46:11.057622Z info pkica Using existing public key: -----BEGIN CERTIFICATE-----
      MIIC/DCCAeSgAwIBAgIQRScLPXsEc5khnuNFW18xNDANBgkqhkiG9w0BAQsFADAY
      MRYwFAYDVQQKEw1jbHVzdGVyLmxvY2FsMB4XDTI0MDMyNjEzMjYwMloXDTM0MDMy
      NDEzMjYwMlowGDEWMBQGA1UEChMNY2x1c3Rlci5sb2NhbDCCASIwDQYJKoZIhvcN
      AQEBBQADggEPADCCAQoCggEBAMTKJY/acUtLaVN/hBOf5wOh3AOWbeZSrtMACFdL
      ZiHfbKSgjSq4XB93FLkj5DvHg0190F73TF3+yrFOSVmcVEzmKVErtk44DEFgyK7M
      W/xd38LoDHsQoyCq+DXsOP/vKFTtrRtbTkq4bot5Imz9slUl1yYu5JTxk7UL6SRw
      f3U4q1w47OF+9jCGe/1D31U7ye03+2Lz1Jk5DllKH13zJUxnVC5MxLlcS7Isv3Xe
      1ocAk9oiJhh4scB1GAOBqyntQUFQmXjFnb4FwFAHkEjai9lZzuS4kQbSWsQ7xk7n
      d0gjcefu6bS1jX5JMMr8PWGxgjW0+X4qWcjaKTCzklm4vhMCAwEAAaNCMEAwDgYD
      VR0PAQH/BAQDAgIEMA8GA1UdEwEB/wQFMAMBAf8wHQYDVR0OBBYEFBOrVKlX+Li8
      6gcXAPSJ418BUNNuMA0GCSqGSIb3DQEBCwUAA4IBAQAYAMVI2RHkYXGQosFN3xF1
      mCLakYIm0C9huHMeeUpG/cQCgU9T/vg+BDjYWbz5SPQx/fOyzi+7Dc0jHq3Sw+EN
      zN4y80k/RrMOpnQXB+n4yVHG1S85JZYY9Rhbtrs+Uz1gUHqzNXGYts5jdBtR7z5/
      uBz7B1jS6G+VcSK9wZvr65mCa6jX18a/mdppmOmgQn94GcJDb/jqI18mDWiujDN0
      W/fRnbgOgJUXYldzCSoJzoIQjIdmR9JafBjLprC2sacf9DsNW+8L/q212k6qk6wO
      0SZq21AhyZjMj3IScfrKklqewOi0dqe/TdjK+Nm3AcLEggFcEAYbRt9hsJZDS1Od
      -----END CERTIFICATE-----

      2024-03-27T13:46:11.057694Z info rootcertrotator Set up back off time 43m41s to start rotator.
      2024-03-27T13:46:11.057912Z info rootcertrotator Jitter is enabled, wait 43m41s before starting root cert rotator.
      2024-03-27T13:46:11.094205Z warn sidecar injector is not ready
      2024-03-27T13:46:11.446002Z info spiffe Added 1 certs to trust domain cluster.local in peer cert verifier
      2024-03-27T13:46:11.446906Z warn federation ResyncPeriod not specified, defaulting to 1m0s component=federation-discovery-controller
      2024-03-27T13:46:11.447769Z info spiffe Added 1 certs to trust domain cluster.local in peer cert verifier
      2024-03-27T13:46:11.503152Z info kube Initializing Kubernetes service registry "Kubernetes"
      2024-03-27T13:46:11.503274Z info federation Starting controller component=federation-discovery-controller
      2024-03-27T13:46:11.503340Z info klog attempting to acquire leader lease istio-system/servicemesh-federation...
      2024-03-27T13:46:11.503413Z info kube should join leader-election for cluster Kubernetes: false
      2024-03-27T13:46:11.503411Z info federation Starting controller component=federation-imports-controller
      2024-03-27T13:46:11.503437Z info federation Starting controller component=federation-exports-controller
      2024-03-27T13:46:11.504970Z info status Starting status manager
      2024-03-27T13:46:11.505516Z info klog attempting to acquire leader lease istio-system/ior-leader...
      2024-03-27T13:46:11.505702Z info klog attempting to acquire leader lease istio-system/istio-analyze-leader...
      2024-03-27T13:46:11.505995Z info kube Starting Pilot K8S CRD controller controller=crd-controller
      2024-03-27T13:46:11.506065Z info kube controller "networking.istio.io/v1beta1/ProxyConfig" is syncing... controller=crd-controller
      2024-03-27T13:46:11.507123Z info controllers starting controller=healthcheck
      2024-03-27T13:46:11.507338Z info controllers starting controller=unregister_workloadentry
      2024-03-27T13:46:11.507493Z info klog Waiting for caches to sync for smmr
      2024-03-27T13:46:11.508135Z info kube controller "networking.istio.io/v1beta1/ProxyConfig" is syncing... controller=crd-controller
      2024-03-27T13:46:11.511619Z info controllers starting controller=multicluster secret
      2024-03-27T13:46:11.512235Z info kube controller "networking.istio.io/v1beta1/ProxyConfig" is syncing... controller=crd-controller
      2024-03-27T13:46:11.518855Z info klog successfully acquired lease istio-system/istio-analyze-leader
      2024-03-27T13:46:11.519937Z error kube failed to create informer for certificates.k8s.io/v1/CertificateSigningRequest: no informer found for certificates.k8s.io/v1, Resource=certificatesigningrequests controller=analysis-controller
      2024-03-27T13:46:11.520259Z error kube failed to create informer for core/v1/EndpointSlice: no informer found for /v1, Resource=endpointslices controller=analysis-controller
      panic: runtime error: invalid memory address or nil pointer dereference
      [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x48a4e52]

      goroutine 113 [running]:
      istio.io/istio/pilot/pkg/config/kube/crdclient.handleCRDAdd(0xc00108c540, {0xc000d0f9e0, 0x24})
      /remote-source/istio/app/pilot/pkg/config/kube/crdclient/client.go:450 +0x6d2
      istio.io/istio/pilot/pkg/config/kube/crdclient.NewForSchemas({0x5f73300, 0xc000d096c0}, {{0xc000082069, 0x10}, {0x7ffeae569f69, 0xd}, {0x588000a, 0x13}, 0x0, 0x0}, ...)
      /remote-source/istio/app/pilot/pkg/config/kube/crdclient/client.go:189 +0xa33
      istio.io/istio/pkg/config/analysis/incluster.NewController(0x54d4b40?, {0x5f579b8?, 0xc001678b10}, {0x5f73300, 0xc000d096c0}, {0xc000082069, 0x10}, {0xc00008210e, 0xc}, 0xc001380020, ...)
      /remote-source/istio/app/pkg/config/analysis/incluster/controller.go:53 +0x3b1
      istio.io/istio/pilot/pkg/bootstrap.(*Server).initInprocessAnalysisController.func1.1(0xc0007c9fb8?)
      /remote-source/istio/app/pilot/pkg/bootstrap/configcontroller.go:318 +0xe5
      created by istio.io/istio/pilot/pkg/leaderelection.(*LeaderElection).create.func1
      /remote-source/istio/app/pilot/pkg/leaderelection/leaderelection.go:141 +0xb8
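The call chain in the trace above can be extracted mechanically, which makes it easier to see which component triggers the panic. The following is a minimal sketch, not part of any Istio or OpenShift tooling: `parse_go_panic`, `FRAME_LOCATION`, and `TRACE` are hypothetical names chosen for illustration, and the sample is an abbreviated copy of the trace with each argument list shortened to `...`.

```python
import re

# Abbreviated copy of the trace from this report (assumption: argument
# lists are shortened to "..." for readability; nothing else is changed).
TRACE = """\
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x48a4e52]

goroutine 113 [running]:
istio.io/istio/pilot/pkg/config/kube/crdclient.handleCRDAdd(...)
/remote-source/istio/app/pilot/pkg/config/kube/crdclient/client.go:450 +0x6d2
istio.io/istio/pilot/pkg/config/kube/crdclient.NewForSchemas(...)
/remote-source/istio/app/pilot/pkg/config/kube/crdclient/client.go:189 +0xa33
istio.io/istio/pkg/config/analysis/incluster.NewController(...)
/remote-source/istio/app/pkg/config/analysis/incluster/controller.go:53 +0x3b1
istio.io/istio/pilot/pkg/bootstrap.(*Server).initInprocessAnalysisController.func1.1(...)
/remote-source/istio/app/pilot/pkg/bootstrap/configcontroller.go:318 +0xe5
"""

# Matches a source-location line such as "/path/file.go:450 +0x6d2".
FRAME_LOCATION = re.compile(r"^(/\S+\.go):(\d+)(?: \+0x[0-9a-f]+)?$")


def parse_go_panic(text):
    """Return (panic message, frames), frames ordered innermost first.

    Each frame is a (function, file, line) tuple built from a function
    line followed by its source-location line.
    """
    message = None
    frames = []
    pending_func = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("panic: "):
            message = line[len("panic: "):]
            continue
        location = FRAME_LOCATION.match(line)
        if location and pending_func is not None:
            frames.append((pending_func, location.group(1), int(location.group(2))))
            pending_func = None
        elif line.endswith(")") and "(" in line:
            # Function line: keep everything before the argument list.
            pending_func = line[: line.rindex("(")]
    return message, frames


message, frames = parse_go_panic(TRACE)
print(message)
for func, path, line in reversed(frames):  # outermost caller first
    print(f"{func.rsplit('/', 1)[-1]} ({path.rsplit('/', 1)[-1]}:{line})")
```

Read outermost-first, the chain runs from `initInprocessAnalysisController` through `incluster.NewController` and `crdclient.NewForSchemas` to `handleCRDAdd`, where the nil pointer is dereferenced. That the panic starts in the in-cluster analysis controller appears consistent with the note that the crash occurs when validation messages are enabled.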

      NAME               READY   STATUS               PROFILES      VERSION   AGE
      smcp-full-deploy   8/9     ComponentsNotReady   ["default"]   2.5.0     24h

            mluksa@redhat.com Marko Luksa
            rhn-support-andcosta Andre Costa