    • Type: Sub-task
    • Resolution: Done
    • Priority: Major
    • 2.9.1 ER1
    • 3scale Operator

      Hi,

      Following the work done in THREESCALE-5965, a new alert should be added to the operator so that we get paged when this issue happens:

      apiVersion: monitoring.coreos.com/v1
      kind: PrometheusRule
      metadata:
        creationTimestamp: 2020-09-21T16:30:42Z
        generation: 1
        labels:
          prometheus: application-monitoring
          role: alert-rules
        name: apicast-custom
        namespace: eloy-prom
      spec:
        groups:
        - name: eloy-prom/apicast-custom.rules
          rules:
          - alert: ApicastWorkerRestart
            expr: changes(worker_process{}[5m]) > 0
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: "A worker process in Nginx has been started"
              description: "A new worker process has been started, which may mean that a worker process died after hitting a memory limit. Please investigate the memory pressure on the pod (instance {{ $labels.instance }})"
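
      To make the alert expression easier to reason about, here is a minimal Python sketch of what `changes(worker_process{}[5m]) > 0` evaluates for a single series. It assumes that the value of `worker_process` changes each time a worker is started. The helper names and the sample values are hypothetical, and this is only an illustration of the semantics, not how Prometheus is implemented.

```python
from typing import Sequence


def changes(samples: Sequence[float]) -> int:
    """Count how often a sample differs from the one before it.

    This mirrors what PromQL's changes() reports for one series
    over a range of scraped samples.
    """
    return sum(1 for prev, cur in zip(samples, samples[1:]) if cur != prev)


def alert_condition(samples: Sequence[float]) -> bool:
    """Equivalent of `changes(worker_process{}[5m]) > 0` for one series."""
    return changes(samples) > 0


# Hypothetical scrapes of worker_process inside one 5-minute window.
steady = [2, 2, 2, 2, 2]      # no worker was started
restarted = [2, 2, 3, 3, 3]   # one worker was started mid-window

print(alert_condition(steady))
print(alert_condition(restarted))
```

      The condition only holds while the change is still inside the 5-minute range, so the `for: 5m` clause in the rule is worth checking against a real worker restart.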
      

      To trigger the alert for testing, kill the worker processes inside an APIcast pod:

      oc exec -ti APICAST-POD -- sh -c "kill -9 \$(ps -axu | grep worker | grep -v grep | awk '{print \$2}')"
      

      I am not sure whether this should also be added to the APIcast operator.

      The alert severity is warning because some traffic is lost, but the system recovers once a new worker has been started.
      Regards

            Unassigned
            eloycoto Eloy Coto (Inactive)
            Matej Dujava
            Miguel Soriano