OpenShift Bugs / OCPBUGS-20037

Greenboot health check logs do not belong to the unit


    • Bug
    • Resolution: Done-Errata
    • Normal
    • None
    • 4.15.0
    • MicroShift
    • None
    • No
    • False
    • Previously, the Greenboot health check script printed output for some checks that `journald` did not pick up, resulting in missing log entries when running the `journalctl -u greenboot-healthcheck` command. With this release, logging in the Greenboot health check has been fixed so that all output is linked to the `systemd` unit and is easy to find.
    • Bug Fix

      Description of problem:

      The greenboot health check script writes its logs to the journal under the greenboot-healthcheck unit. Some checks run in background processes, which print their output to stdout/stderr, but journald does not attribute that output to the unit. As a result, entries are missing when executing journalctl -u greenboot-healthcheck.
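
The report does not show how the script was changed. Purely as an illustration, one general way to keep the output of a background job attributed to a single process is to capture it in a temporary file and re-emit it from the calling shell once the job has finished. The sketch below assumes bash; `run_and_relay` is a hypothetical helper name, not a function from the MicroShift or greenboot scripts.

```shell
#!/bin/bash
# Hypothetical helper (not part of MicroShift or greenboot):
# run a command in the background, wait for it, then print its captured
# stdout/stderr from the calling shell so that every line is written by
# the same process.
run_and_relay() {
    local out rc line
    out=$(mktemp) || return 1

    # Start the command in the background with its output captured.
    "$@" >"${out}" 2>&1 &

    # Wait for the background job and keep its exit status.
    rc=0
    wait "$!" || rc=$?

    # Re-emit the captured lines with the shell's builtin printf,
    # so no extra short-lived process writes to stdout.
    while IFS= read -r line || [ -n "${line}" ]; do
        printf '%s\n' "${line}"
    done <"${out}"

    rm -f "${out}"
    return "${rc}"
}
```

A call such as `run_and_relay some_check_command` (where `some_check_command` stands in for any check) would then print the check's output from the parent shell and return the check's exit status.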

      Version-Release number of selected component (if applicable):

      4.14

      How reproducible:

      100%

      Steps to Reproduce:

      1. Boot a MicroShift host.
      2. Trigger a failed greenboot health check for MicroShift.
      3. Check the journal output for the greenboot-healthcheck service and observe a failure where the reasons and the contents of the failure log files are empty.
      

      Actual results:

      Oct 01 17:06:51 edgeniusos01 40_microshift_running_check.sh[1424]: Waiting 300s for 2 pod(s) from the 'kube-system' namespace to be in 'Ready' state
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[1424]: ======
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[1424]: Info: Log file '/var/lib/microshift-backups/prerun_failed.log' does not exist
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[1424]: ======
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[1424]: Failure log in: '/tmp/pod-list.8OaJQ8I3Z8'
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[1424]: ------
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[1424]: ------
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[1424]: ======
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[1424]: Failure log in: '/tmp/pod-events.L9b8QYjl9e'
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[1424]: ------
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[1424]: ------
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[1424]: FAILURE 
      Oct 01 17:06:57 edgeniusos01 greenboot[1315]: Script '40_microshift_running_check.sh' FAILURE (exit code '1'). Continuing...
      

      Expected results:

      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[5635]: The number of ready pods in the 'kube-system' namespace is greater than the expected '2' count. Terminating...
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[1424]: ======
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[1424]: Info: Log file '/var/lib/microshift-backups/prerun_failed.log' does not exist
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[1424]: ======
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[1424]: Failure log in: '/tmp/pod-list.8OaJQ8I3Z8'
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[1424]: ------
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[5651]: NAMESPACE                  NAME                                                        READY   STATUS    RESTARTS        AGE     IP              NODE           NOMINATED NODE   READINESS GATES
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[5651]: cert-manager               cert-manager-75d57c8d4b-6vdwh                               1/1     Running   3               8h      10.42.0.8       edgeniusos01   <none>           <none>
      ...
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[1424]: Failure log in: '/tmp/pod-events.L9b8QYjl9e'
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[1424]: ------
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[5652]: NAMESPACE                           LAST SEEN   TYPE      REASON                    OBJECT                                                           MESSAGE
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[5652]: cert-manager                        5s          Warning   NodeNotReady              pod/cert-manager-75d57c8d4b-6vdwh                                Node is not ready
      ...
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[1424]: ------
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[1424]: FAILURE
      Oct 01 17:06:57 edgeniusos01 greenboot[1315]: Script '40_microshift_running_check.sh' FAILURE (exit code '1'). Continuing...
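
In the two outputs above, the actual results contain entries from a single script PID (1424), while the expected results also contain entries from other PIDs (5635, 5651, 5652). A minimal sketch for checking this on a captured journal, assuming the default short output format where the fifth field is `identifier[pid]:`; `list_logging_pids` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: read journal lines (default short format) on stdin
# and print each distinct PID that logged under the given identifier,
# in order of first appearance.
list_logging_pids() {
    local ident="$1"
    awk -v ident="${ident}" '
        {
            prefix = ident "["
            # Field 5 looks like "40_microshift_running_check.sh[1424]:"
            if (index($5, prefix) == 1 && $5 ~ /\]:$/) {
                pid = substr($5, length(prefix) + 1)
                sub(/\]:$/, "", pid)
                if (pid ~ /^[0-9]+$/ && !(pid in seen)) {
                    seen[pid] = 1
                    print pid
                }
            }
        }'
}
```

For example, `journalctl -u greenboot-healthcheck --no-pager | list_logging_pids 40_microshift_running_check.sh` lists the PIDs whose entries are attributed to the unit; a single PID would be consistent with the actual results shown above.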

      Additional info:

       

            Pablo Acevedo Montserrat (pacevedo@redhat.com)
            Douglas Hensel
            Shauna Diaz