  1. OpenShift Service Mesh
  2. OSSM-9053

Pod healthchecks not working in ambient mode with OVN-K CNI


    • Type: Bug
    • Resolution: Done
    • Priority: Undefined
    • None
    • OSSM 3.0.0
    • Sail Operator
    • None

      In an ambient-mode deployment on OpenShift with the OVN-Kubernetes (OVN-K) CNI, pods configured with health checks (e.g., livenessProbe, readinessProbe) remained in a CrashLoopBackOff state even though the underlying applications were running correctly. The kubelet health checks from the host to the Pod consistently timed out. The issue is easily reproducible on an OCP 4.18 cluster and appears to affect earlier OCP releases as well (e.g., OCP 4.16): https://issues.redhat.com/browse/OSSM-9340

       

      ```

      Events:
        Type     Reason          Age               From               Message
        ----     ------          ----              ----               -------
        Normal   Scheduled       25s               default-scheduler  Successfully assigned sleep/http-probe-pod to nftables-qh47z-worker-0-tdw7s
        Normal   AddedInterface  24s               multus             Add eth0 [10.131.0.64/23] from ovn-kubernetes
        Normal   Pulling         24s               kubelet            Pulling image "nginx:latest"
        Normal   Pulled          20s               kubelet            Successfully pulled image "nginx:latest" in 4.407s (4.407s including waiting). Image size: 196159380 bytes.
        Normal   Created         20s               kubelet            Created container: web-container
        Normal   Started         20s               kubelet            Started container web-container
        Warning  Unhealthy       5s                kubelet            Readiness probe failed: Get "http://10.131.0.64:80/index.html": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
        Warning  Unhealthy       3s (x2 over 13s)  kubelet            Liveness probe failed: Get "http://10.131.0.64:80/index.html": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

      ```
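
To check by hand whether the Pod IP answers within the probe deadline, the kubelet's HTTP check can be roughly imitated from the node. The following is a minimal sketch, not kubelet code: `http_probe` is a hypothetical helper written for illustration, and it assumes Python 3 is available where it runs. It relies on two documented Kubernetes behaviours: an HTTP probe succeeds on any status from 200 to 399, and `timeoutSeconds` defaults to 1.

```python
import urllib.error
import urllib.request


def http_probe(url: str, timeout_s: float = 1.0) -> tuple[bool, str]:
    """Rough imitation of a kubelet HTTP probe (hypothetical helper).

    Returns (success, detail). Success means the final HTTP status is in
    the 200-399 range and the response arrived within timeout_s seconds.
    """
    try:
        # urlopen follows redirects, so the status seen here is the final one.
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; keep the status code.
        status = err.code
    except OSError as err:
        # Covers timeouts, refused connections, and unreachable hosts
        # (URLError and TimeoutError are both OSError subclasses).
        return False, f"probe failed: {err}"
    return 200 <= status < 400, f"HTTP {status}"
```

Run on the affected node, a call such as `http_probe("http://10.131.0.64:80/index.html")` (the URL from the events above) would be expected to return a failure tuple while the problem is present, since the kubelet's own request to that address times out.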

              sgaddam@redhat.com Gaddam Sridhar