  1. Managed Service - Streams
  2. MGDSTRM-10152

Ingress replicas seen to stick in Terminating state during scale down


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major

      With the autoscaling feature enabled, we noticed that OpenShift ingress replicas sometimes stick in the Terminating state during scale down. The pods remain in this state until terminationGracePeriodSeconds (1h) expires. It is not clear whether this is a new problem or one that has previously gone unnoticed.

      The pattern we see looks like this:

      oc get pods
      NAME                                     READY   STATUS        RESTARTS   AGE
      router-default-695f856d8-dmx7g           1/1     Running       0          158d
      router-default-695f856d8-xlkzg           1/1     Running       0          158d
      router-kas-694dd4695c-44zzh              1/1     Running       0          17m
      router-kas-694dd4695c-4gg7j              1/1     Terminating   0          36m
      router-kas-694dd4695c-bx8wh              1/1     Running       0          36m
      router-kas-6c56f46db4-72klh              1/1     Terminating   0          36m
      router-kas-6c56f46db4-q7kdm              1/1     Terminating   0          36m
      router-kas-us-east-1a-6db4684c88-7zs9q   1/1     Running       0          36m
      router-kas-us-east-1a-8487dc944c-l6jnj   1/1     Terminating   0          37m
      router-kas-us-east-1b-5597f8669c-t8f4s   1/1     Running       0          36m
      router-kas-us-east-1b-f7d89fc4b-47wjs    1/1     Terminating   0          37m
      router-kas-us-east-1c-77bc49c6fd-7prz8   1/1     Running       0          17m
      router-kas-us-east-1c-848d5dfcc7-cn4p8   1/1     Terminating   0          37m
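
      For triage, a listing like the one above can be summarised per router deployment. The following is only a rough sketch: the function names are hypothetical, it assumes the plain five-column output shown here (a RESTARTS value containing spaces would break the split), and it assumes pod names follow the usual <deployment>-<replicaset hash>-<suffix> pattern.

```python
from collections import defaultdict


def parse_pods(text):
    """Parse the whitespace-separated table printed by `oc get pods`.

    Assumes the first line is the header and no column value contains spaces.
    """
    lines = [ln.split() for ln in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


def terminating_by_owner(pods):
    """Group the names of Terminating pods by their (assumed) deployment name."""
    groups = defaultdict(list)
    for pod in pods:
        if pod["STATUS"] == "Terminating":
            # Drop the replicaset hash and the pod suffix from the name.
            owner = pod["NAME"].rsplit("-", 2)[0]
            groups[owner].append(pod["NAME"])
    return dict(groups)


# A few rows copied from the listing in this report.
SAMPLE = """\
NAME READY STATUS RESTARTS AGE
router-kas-694dd4695c-44zzh 1/1 Running 0 17m
router-kas-694dd4695c-4gg7j 1/1 Terminating 0 36m
router-kas-6c56f46db4-72klh 1/1 Terminating 0 36m
router-kas-us-east-1a-8487dc944c-l6jnj 1/1 Terminating 0 37m
"""

if __name__ == "__main__":
    for owner, names in terminating_by_owner(parse_pods(SAMPLE)).items():
        print(owner, names)
```

      Grouping by deployment makes it easier to see whether the stuck replicas are spread across all of the router-kas deployments or concentrated in one of them.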

      The log of one of the pods stuck in Terminating doesn't show anything interesting.

      oc logs -f router-kas-694dd4695c-4gg7j
      I1121 09:33:34.179282 1 template.go:437] router "msg"="starting router" "version"="majorFromGit: \nminorFromGit: \ncommitFromGit: 11109e4028b69749d6f842a4da682916e0d91d2f\nversionFromGit: 4.0.0-370-g11109e40\ngitTreeState: clean\nbuildDate: 2022-05-12T09:54:12Z\n"
      I1121 09:33:34.180763 1 metrics.go:156] metrics "msg"="router health and metrics port listening on HTTP and HTTPS" "address"="0.0.0.0:1936"
      I1121 09:33:34.185939 1 router.go:191] template "msg"="creating a new template router" "writeDir"="/var/lib/haproxy"
      I1121 09:33:34.185993 1 router.go:273] template "msg"="router will coalesce reloads within an interval of each other" "interval"="1m0s"
      I1121 09:33:34.186280 1 router.go:343] template "msg"="watching for changes" "path"="/etc/pki/tls/private"
      I1121 09:33:34.186355 1 router.go:262] router "msg"="router is including routes in all namespaces"

      RTS and SREP were unable to rsh into the container, although it is unclear whether this was a permissions issue or something else. We have not yet checked the state of the host process.

      This issue doesn't appear to be service impacting.
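
      To judge how far into the 1h grace period a stuck pod is, the pod's deletion metadata can be inspected. The sketch below is an illustration rather than a verified diagnostic: it assumes a pod object as returned by `oc get pod <name> -o json`, relies on the Kubernetes convention that metadata.deletionTimestamp marks the end of the grace period and metadata.deletionGracePeriodSeconds its length, and uses a hypothetical helper name. The timestamps in the example are invented for illustration and are not taken from the affected cluster.

```python
from datetime import datetime, timedelta, timezone


def parse_rfc3339(ts):
    """Parse a UTC timestamp in the form Kubernetes emits, e.g. 2022-11-21T11:09:00Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def termination_window(pod, now):
    """Return deletion timing for a pod dict, or None if it is not being deleted."""
    meta = pod["metadata"]
    if "deletionTimestamp" not in meta:
        return None
    # deletionTimestamp is when the grace period ends, not when deletion was requested.
    deadline = parse_rfc3339(meta["deletionTimestamp"])
    grace = timedelta(seconds=meta.get("deletionGracePeriodSeconds", 0))
    return {
        "requested": deadline - grace,
        "deadline": deadline,
        "remaining": max(deadline - now, timedelta(0)),
        "overdue": now > deadline,
    }


if __name__ == "__main__":
    # Hypothetical metadata; only the pod name comes from this report.
    pod = {
        "metadata": {
            "name": "router-kas-694dd4695c-4gg7j",
            "deletionTimestamp": "2022-11-21T11:09:00Z",
            "deletionGracePeriodSeconds": 3600,
        }
    }
    now = datetime(2022, 11, 21, 10, 30, tzinfo=timezone.utc)
    print(termination_window(pod, now))
```

      A pod that is still present well after its deadline would point to a problem beyond the router simply failing to exit within the grace period.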

            Assignee: Unassigned
            Reporter: Keith Wall (keithbwall)
            Kafka Fleet Services