• Sub-task
    • Resolution: Done
    • Major
    • 2.9.1 ER1
    • 3scale Operator

      Hi,

      Following the work done on THREESCALE-5965, a new alert should be added to the operator so that we get paged when this issue happens:

      apiVersion: monitoring.coreos.com/v1
      kind: PrometheusRule
      metadata:
        creationTimestamp: 2020-09-21T16:30:42Z
        generation: 1
        labels:
          prometheus: application-monitoring
          role: alert-rules
        name: apicast-custom
        namespace: eloy-prom
      spec:
        groups:
        - name: eloy-prom/apicast-custom.rules
          rules:
          - alert: ApicastWorkerRestart
            expr: changes(worker_process{}[5m]) > 0
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: "A worker process in Nginx has been started"
              description: "A new worker process has been started, which may mean that a worker process died after hitting a memory limit. Please investigate the memory pressure on the pod (instance {{ $labels.instance }})"
      

      To trigger the alert and test it:

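      # Run the whole pipeline inside the pod; an unquoted $(...) would be expanded by the local shell instead.
      oc exec -ti APICAST-POD -- sh -c 'kill -9 $(ps -axu | grep worker | grep -v grep | awk "{print \$2}")'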
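
      To confirm that the kill really caused a restart, the worker PIDs can be compared before and after running it. This is only a minimal sketch: it assumes `ps` is available in the APIcast image, `APICAST-POD` is a placeholder for a real pod name, and the `worker_pids` helper is hypothetical (it is not part of any tool).

```shell
# Hypothetical helper: read `ps -axu` output on stdin and print the PID
# (second column) of every line that mentions "worker".
worker_pids() {
  grep worker | grep -v grep | awk '{print $2}'
}

# Usage (needs a logged-in `oc` session; APICAST-POD is a placeholder):
#   oc exec APICAST-POD -- ps -axu | worker_pids   # before the kill
#   ...run the kill command above...
#   oc exec APICAST-POD -- ps -axu | worker_pids   # after: the PIDs should differ
```

      If the PIDs differ between the two runs, Nginx has respawned its workers and the `worker_process` metric used by the alert expression should have changed as well.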
      

      I do not know whether this should also be added to APICast-operator.

      The alert severity is warning because some traffic is lost, but the system recovers once a new worker has been started.
      Regards

              Unassigned
              Eloy Coto (eloycoto)
              Matej Dujava
              Miguel Soriano