-
Sub-task
-
Resolution: Done
-
Major
-
None
-
2.9.1 ER1
-
Hi,
Following the work done on THREESCALE-5965, a new alert should be added to the operator so that we get paged when this issue happens:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: 2020-09-21T16:30:42Z
  generation: 1
  labels:
    prometheus: application-monitoring
    role: alert-rules
  name: apicast-custom
  namespace: eloy-prom
spec:
  groups:
    - name: eloy-prom/apicast-custom.rules
      rules:
        - alert: ApicastWorkerRestart
          expr: changes(worker_process{}[5m]) > 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "A worker process in Nginx has been started"
            description: "A new thread has been started that means that maybe any worker process died because of any memory limit. Please investigate the memory pressure on pod (instance {{ $labels.instance }})"
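As a rough illustration of what the rule's `expr` checks: PromQL's `changes()` counts how many times a series' value changed inside the range window, and the alert condition is true when that count is above zero. The Python sketch below mimics this on a plain list of samples. The helper names and the sample values are hypothetical, and it assumes `worker_process` increases each time Nginx starts a worker; it does not model the `for: 5m` clause or Prometheus scrape timing.

```python
def changes(samples):
    """Count how many times consecutive sample values differ,
    similar in spirit to PromQL's changes() over one range window."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)


def alert_condition(samples):
    """Mirror of the rule expression: changes(worker_process[5m]) > 0."""
    return changes(samples) > 0


# Hypothetical scrapes of worker_process inside one 5m window.
steady = [1, 1, 1, 1, 1]      # no worker was started
restarted = [1, 1, 2, 2, 2]   # one worker was started mid-window

print(alert_condition(steady))
print(alert_condition(restarted))
```

With these made-up samples, only the second window satisfies the condition, which matches the intent of alerting when a worker has been restarted.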
To test it:
oc exec -ti APICAST-POD -- sh -c "kill -9 \$(ps -axu | grep worker | grep -v grep | awk '{print \$2}')"
I do not know whether this should be added to APICast-operator.
The alert severity is warning because some traffic is lost when a worker dies, but the system recovers once a new worker has been started.
Regards
- relates to
-
THREESCALE-5630 apicast-production container didn't terminate when one of its processes was OOM killed
- Closed