OpenShift Service Mesh / OSSM-4542

Intermittent Istio sidecar injection failure on pods

    • Type: Ticket
    • Resolution: Done
    • Priority: Critical
    • Affects Version/s: OSSM 2.4.1
    • Component/s: Maistra

      Issue: Istio sidecar injection fails intermittently; restarting the affected pod brings it up with the sidecar injected.

      ENV: ROSA

      customer statement:
      ~~~
      We are configuring Knative serverless deployments to use Istio. We have been using this for the past 12 months. We noticed today that sometimes, when pods are starting, the Istio sidecar is not injected; but if we delete them and allow them to restart, the sidecar is injected.

      We have everything configured correctly and would expect the injection to happen every time.

      We have just moved to OCP 4.12.15; we were previously on 4.10.
      ~~~
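
      Since injection happens at pod-creation time, one early check (a sketch; it assumes the control plane runs in istio-system, so adjust to the actual SMCP namespace) is whether the injector pods were restarting or unready around the time the uninjected pods were created:

      ~~~
      # Sketch: look for recent restarts/unreadiness in the control plane,
      # which would explain pods intermittently starting without a sidecar.
      $ omc get pods -n istio-system
      ~~~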

      ~~~
      NAME                         DISPLAY                          VERSION   REPLACES                     PHASE
      servicemeshoperator.v2.4.1   Red Hat OpenShift Service Mesh   2.4.1-0   servicemeshoperator.v2.4.0   Succeeded
      ~~~

      ~~~
      $ omc get smcp -o wide
      NAME                         READY   STATUS            PROFILES      VERSION   AGE    IMAGE REGISTRY
      service-mesh-control-plane   10/10   ComponentsReady   ["default"]   2.1.6     394d   registry.redhat.io/openshift-service-mesh
      ~~~
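
      Injection is performed by the control plane's mutating admission webhook, so it is worth confirming on the live cluster that the webhook configuration exists and seeing how it behaves when istiod is unreachable (a sketch; the exact webhook configuration name varies with the SMCP name and namespace). Notably, if failurePolicy is Ignore, pods created while the injector is briefly unavailable start silently without a sidecar, which would match the intermittent symptom:

      ~~~
      # Sketch: find the sidecar injector webhook for this control plane.
      $ oc get mutatingwebhookconfigurations | grep -i inject

      # Inspect its failurePolicy and namespace/object selectors.
      $ oc get mutatingwebhookconfiguration <name-from-above> -o yaml
      ~~~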

      ----------------------

      Observations:

      The correct sidecar-injection annotations are present on the pod template of the example problem deployment:

      ~~~
      $ omc get deployment default-integration-runtime-ir-00630-deployment -n acfcpldmhtm -o json | jq .spec.template.metadata.annotations
      {
        "autoscaling.knative.dev/min-scale": "1",
        "productChargedContainers": "",
        "productID": "<obfuscated>",
        "productMetric": "FREE",
        "productName": "IBM App Connect Enterprise",
        "serving.knative.dev/creator": "system:serviceaccount:openshift-operators:ibm-appconnect-operator",
        "sidecar.istio.io/inject": "true",
        "sidecar.istio.io/rewriteAppHTTPProbers": "true"
      }
      ~~~
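
      Because the failure is intermittent, it can help to enumerate which pods in the namespace actually carry the sidecar at a given moment (a sketch against the live cluster; istio-proxy is the default container name added by the injector):

      ~~~
      # Sketch: print each pod and whether it has an istio-proxy container.
      $ oc get pods -n acfcpldmhtm -o json \
        | jq -r '.items[] | "\(.metadata.name)\t\([.spec.containers[].name] | index("istio-proxy") != null)"'
      ~~~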

      The SMMR lists the target namespace as a mesh member, and the member is reconciled. (Many other namespaces experience this issue; the sample data focuses on the one below.)

      ~~~
      $ omc get smmr default -o json | jq .spec.members | grep acfcpldmhtm
        "acfcpldmhtm",

      # corresponding member status entry (the namespace is reconciled):
      - namespace: acfcpldmhtm
        conditions:
        - lastTransitionTime: "2022-07-19T19:08:21Z"
          status: "True"
          type: Reconciled
      ~~~
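
      Membership should also be reflected on the namespace itself; a quick cross-check (a sketch, assuming the OSSM convention of a maistra.io/member-of label on member namespaces) is:

      ~~~
      # Sketch: confirm the namespace carries the member label that the
      # injection webhook's namespaceSelector matches.
      $ oc get namespace acfcpldmhtm --show-labels | grep maistra.io/member-of
      ~~~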

      I have reviewed https://docs.google.com/document/d/1lhTxBzmJroEjlf0eGwEVEOAXUI0V722EmTh3nzVCgrs prior to submission.

      case number: https://access.redhat.com/support/cases/#/case/03567298

      I will include additional data in the first comment after submission, with specific highlights/pulls and log bundles.

      Logging as high impact/priority due to the high severity on the case and the urgency requested by the customer, whose upcoming project relies on this service working without interference.

      There is presently a workaround: restart the affected pod.
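
      For reference, the workaround amounts to deleting the pod so its controller recreates it, which gives the injection webhook another chance to run (a sketch; substitute the affected pod name):

      ~~~
      # Sketch of the workaround: force the pod to be recreated.
      $ oc delete pod <pod-name> -n acfcpldmhtm
      ~~~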

              Assignee: aslak@redhat.com Aslak Knutsen
              Reporter: rhn-support-wrussell Will Russell