-
Story
-
Resolution: Obsolete
-
The purpose of the alert is to inform administrators that errors are
occurring while writing audit logs. This can be a problem for incident
response teams, because the audit logs become unreliable at that point.
The alert's query provides labels that identify the type and instance of
the API server that is failing.
The alert is deliberately generic so that it catches any API server
causing this kind of issue, while the aforementioned labels keep it
actionable.
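
As a rough illustration of how labels keep a generic alert actionable, the sketch below groups audit write error counts by API server type and instance and reports every group with a positive count. It is not the query from the PR: the label names `apiserver` and `instance`, the sample values, and the function `failing_servers` are all assumptions made for this example.

```python
from collections import defaultdict


def failing_servers(samples):
    """Return {(apiserver, instance): error_count} for groups with errors.

    `samples` is a list of (labels, errors) pairs. The label names
    "apiserver" and "instance" are assumed for illustration only.
    """
    totals = defaultdict(int)
    for labels, errors in samples:
        totals[(labels["apiserver"], labels["instance"])] += errors
    # Keep only the groups that recorded at least one error.
    return {key: count for key, count in totals.items() if count > 0}


# Hypothetical samples covering two API server types.
samples = [
    ({"apiserver": "kube-apiserver", "instance": "10.0.0.1:6443"}, 0),
    ({"apiserver": "kube-apiserver", "instance": "10.0.0.2:6443"}, 3),
    ({"apiserver": "openshift-apiserver", "instance": "10.0.0.3:8443"}, 0),
]

print(failing_servers(samples))
```

Only the second sample is reported, and its key names both the server type and the instance, which is the information an administrator needs to start investigating.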
The alert triggers if any errors are detected (threshold above 0).
Such errors are uncommon, however, and would most likely occur for one
of the following reasons:
- The node's disk is full
- Someone has gained access to the node and is tampering with the audit
logs to the point that they are unusable (changing modes or attributes).
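
When the alert fires, both causes can be checked directly on the affected node. The sketch below is one possible first-pass check, not part of the PR: the function `check_audit_log` and its `min_free_bytes` parameter are hypothetical, the audit log path must be supplied by the caller, and extended file attributes (such as an immutable flag) are not inspected.

```python
import os
import shutil
import stat
import tempfile


def check_audit_log(path, min_free_bytes=1):
    """Return a list of findings for the audit log file at `path`."""
    findings = []
    # Cause 1: the filesystem holding the log has run out of space.
    usage = shutil.disk_usage(os.path.dirname(os.path.abspath(path)))
    if usage.free < min_free_bytes:
        findings.append("filesystem is full")
    # Cause 2: the file mode was changed so the owner can no longer write.
    mode = os.stat(path).st_mode
    if not mode & stat.S_IWUSR:
        findings.append("owner write permission is missing")
    return findings


# Demonstration on a temporary file standing in for an audit log.
with tempfile.NamedTemporaryFile() as f:
    clean = check_audit_log(f.name, min_free_bytes=0)
    os.chmod(f.name, 0o400)  # simulate a mode change to read-only
    read_only = check_audit_log(f.name, min_free_bytes=0)

print(clean, read_only)
```

A missing write bit does not prove tampering on its own, so any finding here is a starting point for investigation rather than a conclusion.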
PR: https://github.com/openshift/cluster-kube-apiserver-operator/pull/1166