Type: Epic
Resolution: Unresolved
Priority: Minor
Fix Version/s: None
Affects Version/s: Gatekeeper 3.15.0, MCE 2.6.3, ACM 2.11.3
Component/s: ACM Architecture
Labels:
None

Epic Name:
Check that all containers are using terminationMessagePolicy: FallbackToLogsOnError
Blocked:
False
Blocked Reason:
None
Ready:
False
Color Status:
Not Selected
Epic Status:
To Do

Severity:
Low

SFDC Cases Links:
SFDC Cases Counter:
SFDC Cases Open:

Intelligence Requested:
Market:

Epic Goal

Check that all containers are using terminationMessagePolicy: FallbackToLogsOnError.

A pod on an OpenShift cluster can stop working in two ways: it can remain alive but become non-functional, or it can crash. In the first case, if the administrator has implemented liveness and readiness checks, OpenShift can stop the pod and restart it on either the same node or a different node in the cluster. In the second case, when the application in the pod stops, it should exit with a code and write suitable log entries to help the administrator diagnose the cause. With terminationMessagePolicy: FallbackToLogsOnError, Kubernetes fills the termination message in the pod status from the tail of the container log when a container exits with an error and has written nothing to its termination message file.
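The check described in this epic could be automated along the following lines. This is a minimal sketch, not an existing tool: it assumes the pod spec has already been loaded into a Python dict (for example from a Deployment's pod template), and the function name, container names, and image references are hypothetical. It relies on the Kubernetes default of File when terminationMessagePolicy is omitted.

```python
from typing import Any, Dict, List

WANTED_POLICY = "FallbackToLogsOnError"
DEFAULT_POLICY = "File"  # Kubernetes default when the field is omitted


def containers_missing_policy(pod_spec: Dict[str, Any]) -> List[str]:
    """Return the names of containers that do not set the wanted policy."""
    missing = []
    # Init containers carry the same field, so check them as well.
    for field in ("initContainers", "containers"):
        for container in pod_spec.get(field) or []:
            policy = container.get("terminationMessagePolicy", DEFAULT_POLICY)
            if policy != WANTED_POLICY:
                missing.append(container["name"])
    return missing


# Hypothetical pod spec, for illustration only.
example_spec = {
    "containers": [
        {
            "name": "manager",
            "image": "example.invalid/manager:latest",
            "terminationMessagePolicy": "FallbackToLogsOnError",
        },
        # No policy set, so this container falls back to the default "File".
        {"name": "audit", "image": "example.invalid/audit:latest"},
    ],
}

print(containers_missing_policy(example_spec))
```

An empty list would mean every container in the spec already uses the recommended policy.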

Why is this important?

This is an optional recommendation from Operator Best Practices analysis. For more info on best practices analysis see the related epic. I'd like a second opinion on the value of this recommendation for consideration in a future release.

See https://kubernetes.io/docs/tasks/debug/debug-application/determine-reason-pod-failure/ for some more details. Have we had trouble capturing why a pod failed?

Scenarios

Example of the termination message content in the pod status:

    lastState:
      terminated:
        containerID: cri-o://3a44277dfea349874559ed3553bc4e4f8ee269ec95b6c5c978e3ed622503a4d6
        exitCode: 0
        finishedAt: "2024-12-19T20:10:41Z"
        message: |
          THis is a test
        reason: Completed
        startedAt: "2024-12-19T20:00:41Z"
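To answer the question of whether a failure reason was captured, the message field can be read back from a container status like the one above. The following is a rough sketch that assumes the status has been parsed into a Python dict; the function name is hypothetical, and the sample data is a trimmed copy of the status shown in this scenario.

```python
from typing import Any, Dict, Optional


def last_termination_message(container_status: Dict[str, Any]) -> Optional[str]:
    """Return lastState.terminated.message, or None if nothing was recorded."""
    terminated = (container_status.get("lastState") or {}).get("terminated") or {}
    return terminated.get("message")


# Trimmed copy of the container status from the scenario above.
sample_status = {
    "lastState": {
        "terminated": {
            "exitCode": 0,
            "message": "THis is a test\n",
            "reason": "Completed",
        }
    }
}

print(repr(last_termination_message(sample_status)))
```

A return value of None indicates that the container has no recorded previous termination or that no message was captured for it.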

Acceptance Criteria

...

Dependencies (internal and external)

Previous Work (Optional):

Open questions:

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Doc issue opened with a completed template. Separate doc issue opened for any deprecation, removal, or any current known issue/troubleshooting removal from the doc, if applicable.

is related to

ACM-15493 Apply required best practices to the Gatekeeper operator

Review

Assignee: Unassigned

Reporter: Gus Parvin

Votes: 0

Watchers: 1

Created: 2024/12/17 2:56 PM

Updated: 2024/12/30 12:13 AM
