Managed Service - Streams / MGDSTRM-9261

StrimziKafkaStuck alert triggered by running out of space in /tmp

    • MK - Sprint 222

      Jose Cueto, 46 min, Edited
      Good day. My apologies for using this channel, but I couldn't find a better place to ask whether a specific operation is safe. I'm hoping it will stop the StrimziKafkaStuck alert, which has been firing since Saturday: 4 times already, at the worst times, including once at 3am. I expect it to fire at least 3 more times today, including around midnight.

      If you follow the lead in this ticket (https://issues.redhat.com/browse/OHSS-14012), it seems that every time the Strimzi cluster operator writes something (e.g. a secret) to /tmp, it fails with "no space left in device". This happens every time a Kafka pod is created, and unfortunately there are many of them.

      Can I safely remove some files in the /tmp directory?

      Currently I have to restart the operator manually because it cannot recover from this error, which means waking up at odd hours, for example at 3am today. I don't want to ignore the alert because I don't know exactly what the customer impact would be if I left the operator in that state.

      Here is an example of the contents of the /tmp directory from the operator that was failing:

      ```
      ~ ☕ jcueto on production mk-0419-204008 default
      ❯ oc exec -it strimzi-cluster-operator.v0.26.0-16-85bf88c4d8-p7pr6 -n redhat-managed-kafka-operator -- ls /tmp/
      io.strimzi.operator.cluster.operator.resource.cruisecontrol.CruiseControlApiImpl10501012884845257096ts
      io.strimzi.operator.cluster.operator.resource.cruisecontrol.CruiseControlApiImpl14129878248363830381ts
      io.strimzi.operator.cluster.operator.resource.cruisecontrol.CruiseControlApiImpl14654013933600035812ts
      io.strimzi.operator.cluster.operator.resource.cruisecontrol.CruiseControlApiImpl14923957382486326546ts
      io.strimzi.operator.cluster.operator.resource.cruisecontrol.CruiseControlApiImpl3214568433018195314ts
      io.strimzi.operator.cluster.operator.resource.cruisecontrol.CruiseControlApiImpl6497239315654200846ts
      vertx-cache-a229873e-8b2d-4066-b691-db63217e43a2

      ❯ oc exec -it strimzi-cluster-operator.v0.26.0-16-85bf88c4d8-p7pr6 -n redhat-managed-kafka-operator -- df -h /tmp/
      Filesystem      Size  Used Avail Use% Mounted on
      tmpfs           1.0M  120K  904K  12% /tmp

      ```
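
To catch /tmp filling up before the alert fires, the Use% figure in the `df` output above could be extracted and checked periodically. The following is only a minimal sketch: the helper name `tmp_use_percent` is hypothetical, and it assumes the `df` in the operator image prints the same column layout as shown above (Use% second to last, mount point last).

```shell
# Hypothetical helper: read df-style output on stdin and print the Use%
# value (without the % sign) for the filesystem mounted on /tmp.
tmp_use_percent() {
  # Skip the header row, then match the row whose last column is /tmp.
  awk 'NR > 1 && $NF == "/tmp" { sub(/%/, "", $(NF-1)); print $(NF-1) }'
}

# Example use against the pod from this ticket (assumes `oc` is logged in):
#   oc exec strimzi-cluster-operator.v0.26.0-16-85bf88c4d8-p7pr6 \
#     -n redhat-managed-kafka-operator -- df -P /tmp | tmp_use_percent
```

The function only parses text, so it makes no changes in the pod. It prints nothing if no row for /tmp is present.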
      I can only reply to this message asynchronously. Whenever you have a chance, could you please reply with a yes or no to my question above?

              Unassigned Unassigned
              sbarker@redhat.com Sam Barker
              Kafka Integrations