  1. Red Hat Advanced Cluster Management
  2. ACM-16490

Check that all containers are using terminationMessagePolicy: FallbackToLogsOnError


    • Epic
    • Resolution: Unresolved
    • Minor
    • None
    • Gatekeeper 3.15.0, MCE 2.6.3, ACM 2.11.3
    • ACM Architecture
    • None
    • Check that all containers are using terminationMessagePolicy: FallbackToLogsOnError
    • False
    • None
    • False
    • Not Selected
    • To Do
    • Low

      Epic Goal

      Check that all containers set terminationMessagePolicy: FallbackToLogsOnError. A pod on an OpenShift cluster can stop working in two ways: it can remain alive but become non-functional, or it can crash. In the first case, if the administrator has implemented liveness and readiness checks, OpenShift can stop the pod and restart it on either the same node or a different node in the cluster. In the second case, the application should exit with a code and write suitable log entries so the administrator can diagnose what caused the failure. With FallbackToLogsOnError, when a container exits with an error and has written nothing to its termination message file, Kubernetes uses the tail of the container log as the termination message, so the failure reason appears in the pod status.
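
      The check named in this epic could be scripted. The following is a minimal sketch, not an established tool: it assumes the input is a pod list already parsed from JSON (for example, the output of oc get pods --all-namespaces -o json), and the function name find_noncompliant_containers and the sample data are hypothetical. A container that omits the field is treated as using the Kubernetes default, File.

```python
REQUIRED_POLICY = "FallbackToLogsOnError"


def find_noncompliant_containers(pod_list):
    """Return (namespace, pod, container, policy) for every container whose
    terminationMessagePolicy is not FallbackToLogsOnError."""
    findings = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        spec = pod.get("spec", {})
        # Regular and init containers both accept the field.
        for key in ("containers", "initContainers"):
            for container in spec.get(key) or []:
                # A manifest that omits the field gets the default, "File".
                policy = container.get("terminationMessagePolicy", "File")
                if policy != REQUIRED_POLICY:
                    findings.append(
                        (meta.get("namespace"), meta.get("name"),
                         container.get("name"), policy)
                    )
    return findings


# Hypothetical sample data shaped like a Kubernetes PodList.
sample = {
    "items": [
        {
            "metadata": {"namespace": "demo", "name": "good-pod"},
            "spec": {
                "containers": [
                    {"name": "app",
                     "terminationMessagePolicy": "FallbackToLogsOnError"}
                ]
            },
        },
        {
            "metadata": {"namespace": "demo", "name": "bad-pod"},
            "spec": {
                "initContainers": [{"name": "init"}],
                "containers": [
                    {"name": "app", "terminationMessagePolicy": "File"}
                ],
            },
        },
    ]
}

for finding in find_noncompliant_containers(sample):
    print(finding)
```

      To run this against a live cluster, the same function could be given json.load(sys.stdin) with the oc output piped in. Ephemeral containers are not covered by this sketch.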

      Why is this important?

      This is an optional recommendation from the Operator Best Practices analysis. For more information on the best practices analysis, see the related epic. I'd like a second opinion on the value of this recommendation for consideration in a future release.

      See https://kubernetes.io/docs/tasks/debug/debug-application/determine-reason-pod-failure/ for more details. Have we had trouble capturing why a pod failed?

      Scenarios

      Example of the termination message content in pod status:

          lastState:
            terminated:
              containerID: cri-o://3a44277dfea349874559ed3553bc4e4f8ee269ec95b6c5c978e3ed622503a4d6
              exitCode: 0
              finishedAt: "2024-12-19T20:10:41Z"
              message: |
                THis is a test
              reason: Completed
              startedAt: "2024-12-19T20:00:41Z"
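
      The status fields above can also be read programmatically. The sketch below assumes a single pod object already parsed from JSON (for example, from oc get pod <name> -o json); the function name last_termination_messages and the container name app are hypothetical, and the sample values are copied from the status excerpt above.

```python
def last_termination_messages(pod):
    """Return (container, exit code, reason, message) for each container
    whose status records a previous termination."""
    results = []
    for status in pod.get("status", {}).get("containerStatuses") or []:
        terminated = (status.get("lastState") or {}).get("terminated")
        if terminated:
            results.append((
                status.get("name"),
                terminated.get("exitCode"),
                terminated.get("reason"),
                # message may be missing if no termination message was recorded
                (terminated.get("message") or "").strip(),
            ))
    return results


# Hypothetical pod object; the terminated values mirror the excerpt above.
sample_pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "app",
                "lastState": {
                    "terminated": {
                        "exitCode": 0,
                        "finishedAt": "2024-12-19T20:10:41Z",
                        "message": "THis is a test\n",
                        "reason": "Completed",
                        "startedAt": "2024-12-19T20:00:41Z",
                    }
                },
            }
        ]
    }
}

for entry in last_termination_messages(sample_pod):
    print(entry)
```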
      

      Acceptance Criteria

      ...

      Dependencies (internal and external)

      1. ...

      Previous Work (Optional):

      1. ...

      Open questions:

      Done Checklist

      • CI - CI is running, tests are automated and merged.
      • Release Enablement <link to Feature Enablement Presentation>
      • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub
        Issue>
      • DEV - Upstream documentation merged: <link to meaningful PR or GitHub
        Issue>
      • DEV - Downstream build attached to advisory: <link to errata>
      • QE - Test plans in Polarion: <link or reference to Polarion>
      • QE - Automated tests merged: <link or reference to automated tests>
      • DOC - Doc issue opened with a completed template. Separate doc issue
        opened for any deprecation, removal, or any current known
        issue/troubleshooting removal from the doc, if applicable.

              Assignee: Unassigned
              Reporter: Gus Parvin (gparvin-redhat)