Fluentd can get stuck when all pods are rolled. We should alert when a fluentd pod is stuck and have an SOP for resolving the problem.

<Suggestions for how this may be solved.> [Optional]

Then use that metric to alert on a fluentd pod that is not ready: kube_pod_container_status_ready{namespace="openshift-logging", container="fluentd"} < 1
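As a rough sketch, the expression above could be wrapped in a PrometheusRule along the following lines. The rule name, alert name, group name, the 15m "for" duration, the severity label and the annotation text are all assumptions made for illustration and are not taken from this ticket; the SOP link is left out because no SOP exists yet.

```yaml
# Hypothetical PrometheusRule; every name and threshold below is an assumption.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: fluentd-pod-not-ready        # assumed name
  namespace: openshift-logging
spec:
  groups:
    - name: fluentd-pod.rules        # assumed group name
      rules:
        - alert: FluentdPodNotReady  # assumed alert name
          # Expression taken from the ticket: fires per fluentd container that is not ready.
          expr: kube_pod_container_status_ready{namespace="openshift-logging", container="fluentd"} < 1
          # Assumed delay so that a normal rollout does not page immediately.
          for: 15m
          labels:
            severity: warning        # assumed severity
          annotations:
            message: "Fluentd pod {{ $labels.pod }} in {{ $labels.namespace }} has not been ready for 15 minutes."
```

The "for" clause is the main design choice here: without it the alert would fire for every pod that is briefly unready during a routine roll, so the duration should be tuned to how long a healthy rollout normally takes.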

Include the following where applicable:

<bulleted list of functional acceptance criteria that need to be completed>
<call out anything on the documentation side that's needed as a result of this task being completed>
<any metrics, monitoring dashboards and alerts that need to be created or be updated>
<SOP creation or updates>

The following steps should be adhered to:

Required tests should be put in place - unit, integration, manual test cases (if necessary)
CI and all relevant tests passing
Changes have been verified by one additional reviewer against:
each required environment
each supported upgrade path
If the changes could have an impact on the clients (either UI or CLI), a JIRA should be created for making the required changes on the client side and acknowledged by one of the client side team members.
PR has been merged

links to

Gchat thread with CSSRE