Bug
Resolution: Done
Undefined
None
OSSM 3.0.0
None
False
False
This bug is similar to the following bug, but occurs on a different OCP version:
https://issues.redhat.com/browse/OSSM-9053
The OpenShift cluster is configured with ambient mode.
The ambient mode configuration is applied at the namespace level via the label:
"istio.io/dataplane-mode: ambient"
The pods are stuck in the CrashLoopBackOff state because they are unable to pass the health check.
Note:
On OCP 4.16 this issue occurs when ambient mode is installed a second time.
When executing, for example, e2e testing for the first time after a fresh cluster deployment, everything works and the ambient-mode pods are able to start.
But when I execute the exact same e2e flow a second, third, etc. time, I face this issue.
Overall pod status:
The pods that are in the Running state have the "istio.io/dataplane-mode: none" label set on the pod.
NAME                                              READY   STATUS             RESTARTS         AGE
captured-v1-7987bd7db4-rrrs6                      0/1     CrashLoopBackOff   21 (14s ago)     45m
captured-v2-674d878bb-rlwz7                       0/1     CrashLoopBackOff   21 (9s ago)      45m
service-addressed-waypoint-v1-6cbf7b65b7-xgqg8    0/1     CrashLoopBackOff   21 (50s ago)     45m
service-addressed-waypoint-v2-85d78fd549-m26ls    0/1     CrashLoopBackOff   21 (6s ago)      45m
sidecar-v1-8644c8b7fc-pdfp9                       2/2     Running            0                45m
sidecar-v2-9dbbd4d7-sjfp9                         2/2     Running            0                45m
uncaptured-v1-5b8b4dcb7d-x4ln6                    1/1     Running            0                45m
uncaptured-v2-6d96cb477b-jcdjf                    1/1     Running            0                45m
workload-addressed-waypoint-v1-7968cfd7d4-h5bxn   0/1     CrashLoopBackOff   19 (4m41s ago)   45m
workload-addressed-waypoint-v2-64cb85878-n6dmw    0/1     CrashLoopBackOff   21 (2s ago)      45m
Events from one of the failed pods:
Events:
  Type     Reason                 Age                   From                                        Message
  ----     ------                 ----                  ----                                        -------
  Normal   Scheduled              66m                   default-scheduler                           Successfully assigned echo-2-4894/captured-v2-674d878bb-rlwz7 to ip-10-0-63-201.ec2.internal
  Normal   IPTablesUsageObserved  59m                   openshift.io/iptables-deprecation-alerter   This pod appears to have created one or more iptables rules. IPTables is deprecated and will no longer be available in RHEL 10 and later. You should consider migrating to another API such as nftables or eBPF. See also https://access.redhat.com/solutions/6739041 Example iptables rule seen in this pod: -A PREROUTING -j ISTIO_PRERT
  Normal   AddedInterface         66m                   multus                                      Add eth0 [10.128.2.84/23] from ovn-kubernetes
  Normal   Killing                65m                   kubelet                                     Container app failed startup probe, will be restarted
  Normal   Created                65m (x2 over 66m)     kubelet                                     Created container: app
  Normal   Started                65m (x2 over 66m)     kubelet                                     Started container app
  Warning  Unhealthy              65m (x18 over 66m)    kubelet                                     Startup probe failed: dial tcp 10.128.2.84:3333: i/o timeout
  Normal   Pulled                 21m (x21 over 66m)    kubelet                                     Container image "image-registry.openshift-image-registry.svc:5000/istio-system/app:istio-testing" already present on machine
  Warning  BackOff                69s (x299 over 65m)   kubelet                                     Back-off restarting failed container app in pod captured-v2-674d878bb-rlwz7_echo-2-4894(c52eff71-9aba-4c33-a1ab-e8d3aa9c519a)
The "startupProbe" points to a "tcp-health-port", which is defined with port 3333.
The container itself defines the port 3333, during the startup arguments.
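For clarity, this is the relevant excerpt from the full pod manifest that follows, trimmed to the startup argument, the named container port, and the startup probe:

    args:
    - --port=3333
    ports:
    - containerPort: 3333
      name: tcp-health-port
      protocol: TCP
    startupProbe:
      failureThreshold: 10
      periodSeconds: 1
      successThreshold: 1
      tcpSocket:
        port: tcp-health-port
      timeoutSeconds: 1

The full pod manifest: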
apiVersion: v1
kind: Pod
metadata:
  annotations:
    ambient.istio.io/redirection: enabled
    k8s.ovn.org/pod-networks: '{"default":{"ip_addresses":["10.128.2.84/23"],"mac_address":"0a:58:0a:80:02:54","gateway_ips":["10.128.2.1"],"routes":[{"dest":"10.128.0.0/14","nextHop":"10.128.2.1"},{"dest":"172.30.0.0/16","nextHop":"10.128.2.1"},{"dest":"169.254.169.5/32","nextHop":"10.128.2.1"},{"dest":"100.64.0.0/16","nextHop":"10.128.2.1"}],"ip_address":"10.128.2.84/23","gateway_ip":"10.128.2.1"}}'
    k8s.v1.cni.cncf.io/network-status: |-
      [{
          "name": "ovn-kubernetes",
          "interface": "eth0",
          "ips": [
              "10.128.2.84"
          ],
          "mac": "0a:58:0a:80:02:54",
          "default": true,
          "dns": {}
      }]
    openshift.io/scc: restricted-v2
    prometheus.io/port: "15014"
    prometheus.io/scrape: "true"
    seccomp.security.alpha.kubernetes.io/pod: runtime/default
  creationTimestamp: "2025-04-09T11:59:20Z"
  generateName: captured-v2-674d878bb-
  labels:
    app: captured
    pod-template-hash: 674d878bb
    test.istio.io/class: captured
    version: v2
  name: captured-v2-674d878bb-rlwz7
  namespace: echo-2-4894
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: captured-v2-674d878bb
    uid: 5385f545-65af-4f12-8a7e-783004849582
  resourceVersion: "197479"
  uid: c52eff71-9aba-4c33-a1ab-e8d3aa9c519a
spec:
  containers:
  - args:
    - --metrics=15014
    - --cluster=cluster-0
    - --port=18080
    - --grpc=17070
    - --port=18085
    - --tcp=19090
    - --port=18443
    - --tls=18443
    - --tcp=16060
    - --server-first=16060
    - --tcp=19091
    - --tcp=16061
    - --server-first=16061
    - --port=18081
    - --grpc=17071
    - --port=19443
    - --tls=19443
    - --port=18082
    - --bind-ip=18082
    - --port=18084
    - --bind-localhost=18084
    - --tcp=19092
    - --port=18083
    - --port=18086
    - --port=18087
    - --proxy-protocol=18087
    - --port=8080
    - --port=3333
    - --version=v2
    - --istio-version=
    - --crt=/cert.crt
    - --key=/cert.key
    env:
    - name: INSTANCE_IPS
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.podIPs
    - name: NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    - name: BIND_FAMILY
    image: image-registry.openshift-image-registry.svc:5000/istio-system/app:istio-testing
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 10
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      tcpSocket:
        port: tcp-health-port
      timeoutSeconds: 1
    name: app
    ports:
    - containerPort: 18080
      protocol: TCP
    - containerPort: 17070
      protocol: TCP
    - containerPort: 18085
      protocol: TCP
    - containerPort: 19090
      protocol: TCP
    - containerPort: 18443
      protocol: TCP
    - containerPort: 16060
      protocol: TCP
    - containerPort: 19091
      protocol: TCP
    - containerPort: 16061
      protocol: TCP
    - containerPort: 18081
      protocol: TCP
    - containerPort: 17071
      protocol: TCP
    - containerPort: 19443
      protocol: TCP
    - containerPort: 18082
      protocol: TCP
    - containerPort: 18084
      protocol: TCP
    - containerPort: 19092
      protocol: TCP
    - containerPort: 18083
      protocol: TCP
    - containerPort: 18086
      protocol: TCP
    - containerPort: 18087
      protocol: TCP
    - containerPort: 8080
      protocol: TCP
    - containerPort: 3333
      name: tcp-health-port
      protocol: TCP
    readinessProbe:
      failureThreshold: 10
      httpGet:
        path: /
        port: 8080
        scheme: HTTP
      initialDelaySeconds: 1
      periodSeconds: 2
      successThreshold: 1
      timeoutSeconds: 1
    resources: {}
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      runAsNonRoot: true
      runAsUser: 1000900000
    startupProbe:
      failureThreshold: 10
      periodSeconds: 1
      successThreshold: 1
      tcpSocket:
        port: tcp-health-port
      timeoutSeconds: 1
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-xqfzr
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  imagePullSecrets:
  - name: captured-dockercfg-tdvxj
  nodeName: ip-10-0-63-201.ec2.internal
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1000900000
    seLinuxOptions:
      level: s0:c30,c15
    seccompProfile:
      type: RuntimeDefault
  serviceAccount: captured
  serviceAccountName: captured
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  topologySpreadConstraints:
  - labelSelector:
      matchLabels:
        app: captured
    maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
  volumes:
  - name: kube-api-access-xqfzr
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
      - configMap:
          items:
          - key: service-ca.crt
            path: service-ca.crt
          name: openshift-service-ca.crt
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2025-04-09T11:59:21Z"
    status: "True"
    type: PodReadyToStartContainers
  - lastProbeTime: null
    lastTransitionTime: "2025-04-09T11:59:20Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2025-04-09T11:59:20Z"
    message: 'containers with unready status: [app]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2025-04-09T11:59:20Z"
    message: 'containers with unready status: [app]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2025-04-09T11:59:20Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: cri-o://e2f4e34985755f2465e8b27463e08c4ad0d7cf7b20662d43842c305a263c4ea5
    image: image-registry.openshift-image-registry.svc:5000/istio-system/app:istio-testing
    imageID: image-registry.openshift-image-registry.svc:5000/istio-system/app@sha256:51796092733faeba30645417ef0d45ab1d4ec5457beafa598b03bcbaa4d567e0
    lastState:
      terminated:
        containerID: cri-o://e2f4e34985755f2465e8b27463e08c4ad0d7cf7b20662d43842c305a263c4ea5
        exitCode: 0
        finishedAt: "2025-04-09T13:06:43Z"
        reason: Completed
        startedAt: "2025-04-09T13:06:31Z"
    name: app
    ready: false
    restartCount: 29
    started: false
    state:
      waiting:
        message: back-off 5m0s restarting failed container=app pod=captured-v2-674d878bb-rlwz7_echo-2-4894(c52eff71-9aba-4c33-a1ab-e8d3aa9c519a)
        reason: CrashLoopBackOff
  hostIP: 10.0.63.201
  hostIPs:
  - ip: 10.0.63.201
  phase: Running
  podIP: 10.128.2.84
  podIPs:
  - ip: 10.128.2.84
  qosClass: BestEffort
  startTime: "2025-04-09T11:59:20Z"