
OSSM-9340: Pod healthcheck is not working in ambient mode on OCP 4.16


    • Type: Bug
    • Resolution: Done
    • Priority: Undefined
    • Affects Version/s: None
    • Fix Version/s: OSSM 3.0.0
    • Component/s: Sail Operator
    • Labels: None

      This bug is similar to the following one, but on a different OCP version:
      https://issues.redhat.com/browse/OSSM-9053

      The OpenShift cluster is configured with ambient mode.
      The ambient mode configuration is applied at the namespace level via the label:
      "istio.io/dataplane-mode: ambient"
      The pods are stuck in the CrashLoopBackOff state because they are unable to pass the health check.
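
      For reference, a minimal sketch of such a namespace (the name is the test namespace from this run):

      apiVersion: v1
      kind: Namespace
      metadata:
        name: echo-2-4894
        labels:
          # enrolls every pod in this namespace into the ambient mesh
          istio.io/dataplane-mode: ambient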

      Note:
      On OCP 4.16 this issue happens when ambient mode is installed a second time.
      When executing, for example, e2e testing for the first time after a fresh cluster deployment, everything works and the ambient mode pods are able to start.
      But when the same exact e2e flow is executed a second, third, etc. time, the issue reappears. A hedged outline of that flow is shown below.
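
      The commands here are illustrative only; "captured-deployment.yaml" is a stand-in for the actual test workloads:

      oc new-project echo-test
      oc label namespace echo-test istio.io/dataplane-mode=ambient
      oc apply -n echo-test -f captured-deployment.yaml   # 1st run on a fresh cluster: pods start fine
      oc delete project echo-test
      # repeating the same steps on the same cluster: the captured pods fail their startup probes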

      Overall pod status.
      The pods that are in Running state carry the "istio.io/dataplane-mode: none" label (see the label check after the listing).

      NAME                                              READY   STATUS             RESTARTS         AGE
      captured-v1-7987bd7db4-rrrs6                      0/1     CrashLoopBackOff   21 (14s ago)     45m
      captured-v2-674d878bb-rlwz7                       0/1     CrashLoopBackOff   21 (9s ago)      45m
      service-addressed-waypoint-v1-6cbf7b65b7-xgqg8    0/1     CrashLoopBackOff   21 (50s ago)     45m
      service-addressed-waypoint-v2-85d78fd549-m26ls    0/1     CrashLoopBackOff   21 (6s ago)      45m
      sidecar-v1-8644c8b7fc-pdfp9                       2/2     Running            0                45m
      sidecar-v2-9dbbd4d7-sjfp9                         2/2     Running            0                45m
      uncaptured-v1-5b8b4dcb7d-x4ln6                    1/1     Running            0                45m
      uncaptured-v2-6d96cb477b-jcdjf                    1/1     Running            0                45m
      workload-addressed-waypoint-v1-7968cfd7d4-h5bxn   0/1     CrashLoopBackOff   19 (4m41s ago)   45m
      workload-addressed-waypoint-v2-64cb85878-n6dmw    0/1     CrashLoopBackOff   21 (2s ago)      45m 
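
      The dataplane-mode labels can be confirmed directly on the pods, for example:

      oc get pods -n echo-2-4894 --show-labels | grep dataplane-mode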

      Events from one of the failed pods:

      Events:
        Type    Reason                 Age   From                                       Message
        ----    ------                 ----  ----                                       -------
        Normal  Scheduled              66m   default-scheduler                          Successfully assigned echo-2-4894/captured-v2-674d878bb-rlwz7 to ip-10-0-63-201.ec2.internal
        Normal  IPTablesUsageObserved  59m   openshift.io/iptables-deprecation-alerter  This pod appears to have created one or more iptables rules. IPTables is
      deprecated and will no longer be available in RHEL 10 and later. You should
      consider migrating to another API such as nftables or eBPF. See also
      https://access.redhat.com/solutions/6739041
      Example iptables rule seen in this pod:
      -A PREROUTING -j ISTIO_PRERT
        Normal   AddedInterface  66m                  multus   Add eth0 [10.128.2.84/23] from ovn-kubernetes
        Normal   Killing         65m                  kubelet  Container app failed startup probe, will be restarted
        Normal   Created         65m (x2 over 66m)    kubelet  Created container: app
        Normal   Started         65m (x2 over 66m)    kubelet  Started container app
        Warning  Unhealthy       65m (x18 over 66m)   kubelet  Startup probe failed: dial tcp 10.128.2.84:3333: i/o timeout
        Normal   Pulled          21m (x21 over 66m)   kubelet  Container image "image-registry.openshift-image-registry.svc:5000/istio-system/app:istio-testing" already present on machine
        Warning  BackOff         69s (x299 over 65m)  kubelet  Back-off restarting failed container app in pod captured-v2-674d878bb-rlwz7_echo-2-4894(c52eff71-9aba-4c33-a1ab-e8d3aa9c519a) 
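
      The timeout can be reproduced by hand from the worker node hosting the pod; a quick check, assuming nc is available on the node:

      # on ip-10-0-63-201.ec2.internal; times out just like the kubelet probe
      nc -z -w 1 10.128.2.84 3333; echo exit=$?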

      The "startupProbe" points to a "tcp-health-port", which is defined with port 3333.
      The container itself defines the port 3333, during the startup arguments.
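
      The relevant fragment, extracted from the full manifest below:

      startupProbe:
        failureThreshold: 10
        periodSeconds: 1
        tcpSocket:
          port: tcp-health-port   # named port, resolves to containerPort 3333
      ports:
      - containerPort: 3333
        name: tcp-health-port
        protocol: TCP

      The full pod manifest: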

      apiVersion: v1
      kind: Pod
      metadata:
        annotations:
          ambient.istio.io/redirection: enabled
          k8s.ovn.org/pod-networks: '{"default":{"ip_addresses":["10.128.2.84/23"],"mac_address":"0a:58:0a:80:02:54","gateway_ips":["10.128.2.1"],"routes":[{"dest":"10.128.0.0/14","nextHop":"10.128.2.1"},{"dest":"172.30.0.0/16","nextHop":"10.128.2.1"},{"dest":"169.254.169.5/32","nextHop":"10.128.2.1"},{"dest":"100.64.0.0/16","nextHop":"10.128.2.1"}],"ip_address":"10.128.2.84/23","gateway_ip":"10.128.2.1"}}'
          k8s.v1.cni.cncf.io/network-status: |-
            [{
                "name": "ovn-kubernetes",
                "interface": "eth0",
                "ips": [
                    "10.128.2.84"
                ],
                "mac": "0a:58:0a:80:02:54",
                "default": true,
                "dns": {}
            }]
          openshift.io/scc: restricted-v2
          prometheus.io/port: "15014"
          prometheus.io/scrape: "true"
          seccomp.security.alpha.kubernetes.io/pod: runtime/default
        creationTimestamp: "2025-04-09T11:59:20Z"
        generateName: captured-v2-674d878bb-
        labels:
          app: captured
          pod-template-hash: 674d878bb
          test.istio.io/class: captured
          version: v2
        name: captured-v2-674d878bb-rlwz7
        namespace: echo-2-4894
        ownerReferences:
        - apiVersion: apps/v1
          blockOwnerDeletion: true
          controller: true
          kind: ReplicaSet
          name: captured-v2-674d878bb
          uid: 5385f545-65af-4f12-8a7e-783004849582
        resourceVersion: "197479"
        uid: c52eff71-9aba-4c33-a1ab-e8d3aa9c519a
      spec:
        containers:
        - args:
          - --metrics=15014
          - --cluster=cluster-0
          - --port=18080
          - --grpc=17070
          - --port=18085
          - --tcp=19090
          - --port=18443
          - --tls=18443
          - --tcp=16060
          - --server-first=16060
          - --tcp=19091
          - --tcp=16061
          - --server-first=16061
          - --port=18081
          - --grpc=17071
          - --port=19443
          - --tls=19443
          - --port=18082
          - --bind-ip=18082
          - --port=18084
          - --bind-localhost=18084
          - --tcp=19092
          - --port=18083
          - --port=18086
          - --port=18087
          - --proxy-protocol=18087
          - --port=8080
          - --port=3333
          - --version=v2
          - --istio-version=
          - --crt=/cert.crt
          - --key=/cert.key
          env:
          - name: INSTANCE_IPS
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: status.podIPs
          - name: NAMESPACE
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.namespace
          - name: BIND_FAMILY
          image: image-registry.openshift-image-registry.svc:5000/istio-system/app:istio-testing
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 10
            initialDelaySeconds: 10
            periodSeconds: 10
            successThreshold: 1
            tcpSocket:
              port: tcp-health-port
            timeoutSeconds: 1
          name: app
          ports:
          - containerPort: 18080
            protocol: TCP
          - containerPort: 17070
            protocol: TCP
          - containerPort: 18085
            protocol: TCP
          - containerPort: 19090
            protocol: TCP
          - containerPort: 18443
            protocol: TCP
          - containerPort: 16060
            protocol: TCP
          - containerPort: 19091
            protocol: TCP
          - containerPort: 16061
            protocol: TCP
          - containerPort: 18081
            protocol: TCP
          - containerPort: 17071
            protocol: TCP
          - containerPort: 19443
            protocol: TCP
          - containerPort: 18082
            protocol: TCP
          - containerPort: 18084
            protocol: TCP
          - containerPort: 19092
            protocol: TCP
          - containerPort: 18083
            protocol: TCP
          - containerPort: 18086
            protocol: TCP
          - containerPort: 18087
            protocol: TCP
          - containerPort: 8080
            protocol: TCP
          - containerPort: 3333
            name: tcp-health-port
            protocol: TCP
          readinessProbe:
            failureThreshold: 10
            httpGet:
              path: /
              port: 8080
              scheme: HTTP
            initialDelaySeconds: 1
            periodSeconds: 2
            successThreshold: 1
            timeoutSeconds: 1
          resources: {}
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
              - ALL
            runAsNonRoot: true
            runAsUser: 1000900000
          startupProbe:
            failureThreshold: 10
            periodSeconds: 1
            successThreshold: 1
            tcpSocket:
              port: tcp-health-port
            timeoutSeconds: 1
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
            name: kube-api-access-xqfzr
            readOnly: true
        dnsPolicy: ClusterFirst
        enableServiceLinks: true
        imagePullSecrets:
        - name: captured-dockercfg-tdvxj
        nodeName: ip-10-0-63-201.ec2.internal
        preemptionPolicy: PreemptLowerPriority
        priority: 0
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext:
          fsGroup: 1000900000
          seLinuxOptions:
            level: s0:c30,c15
          seccompProfile:
            type: RuntimeDefault
        serviceAccount: captured
        serviceAccountName: captured
        terminationGracePeriodSeconds: 30
        tolerations:
        - effect: NoExecute
          key: node.kubernetes.io/not-ready
          operator: Exists
          tolerationSeconds: 300
        - effect: NoExecute
          key: node.kubernetes.io/unreachable
          operator: Exists
          tolerationSeconds: 300
        topologySpreadConstraints:
        - labelSelector:
            matchLabels:
              app: captured
          maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
        volumes:
        - name: kube-api-access-xqfzr
          projected:
            defaultMode: 420
            sources:
            - serviceAccountToken:
                expirationSeconds: 3607
                path: token
            - configMap:
                items:
                - key: ca.crt
                  path: ca.crt
                name: kube-root-ca.crt
            - downwardAPI:
                items:
                - fieldRef:
                    apiVersion: v1
                    fieldPath: metadata.namespace
                  path: namespace
            - configMap:
                items:
                - key: service-ca.crt
                  path: service-ca.crt
                name: openshift-service-ca.crt
      status:
        conditions:
        - lastProbeTime: null
          lastTransitionTime: "2025-04-09T11:59:21Z"
          status: "True"
          type: PodReadyToStartContainers
        - lastProbeTime: null
          lastTransitionTime: "2025-04-09T11:59:20Z"
          status: "True"
          type: Initialized
        - lastProbeTime: null
          lastTransitionTime: "2025-04-09T11:59:20Z"
          message: 'containers with unready status: [app]'
          reason: ContainersNotReady
          status: "False"
          type: Ready
        - lastProbeTime: null
          lastTransitionTime: "2025-04-09T11:59:20Z"
          message: 'containers with unready status: [app]'
          reason: ContainersNotReady
          status: "False"
          type: ContainersReady
        - lastProbeTime: null
          lastTransitionTime: "2025-04-09T11:59:20Z"
          status: "True"
          type: PodScheduled
        containerStatuses:
        - containerID: cri-o://e2f4e34985755f2465e8b27463e08c4ad0d7cf7b20662d43842c305a263c4ea5
          image: image-registry.openshift-image-registry.svc:5000/istio-system/app:istio-testing
          imageID: image-registry.openshift-image-registry.svc:5000/istio-system/app@sha256:51796092733faeba30645417ef0d45ab1d4ec5457beafa598b03bcbaa4d567e0
          lastState:
            terminated:
              containerID: cri-o://e2f4e34985755f2465e8b27463e08c4ad0d7cf7b20662d43842c305a263c4ea5
              exitCode: 0
              finishedAt: "2025-04-09T13:06:43Z"
              reason: Completed
              startedAt: "2025-04-09T13:06:31Z"
          name: app
          ready: false
          restartCount: 29
          started: false
          state:
            waiting:
              message: back-off 5m0s restarting failed container=app pod=captured-v2-674d878bb-rlwz7_echo-2-4894(c52eff71-9aba-4c33-a1ab-e8d3aa9c519a)
              reason: CrashLoopBackOff
        hostIP: 10.0.63.201
        hostIPs:
        - ip: 10.0.63.201
        phase: Running
        podIP: 10.128.2.84
        podIPs:
        - ip: 10.128.2.84
        qosClass: BestEffort
        startTime: "2025-04-09T11:59:20Z"

      Assignee: Gaddam Sridhar (sgaddam@redhat.com)
      Reporter: Maxim Babushkin (mbabushk@redhat.com)