  1. OpenShift Bugs
  2. OCPBUGS-20174

Greenboot health check logs do not belong to the unit

    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Normal
    • None
    • 4.15.0
    • MicroShift
    • None
    • No
    • False
    • *Cause*: The greenboot health check produced its logs incorrectly.
      *Consequence*: MicroShift health checks produced logs that were not linked to the systemd unit, making them harder to read.
      *Fix*: The way logs are produced was changed so that they are linked to the unit.
      *Result*: The bug no longer occurs.
    • Bug Fix

      This is a clone of issue OCPBUGS-20037. The following is the description of the original issue:

      Description of problem:

      The greenboot health check script writes its logs to the journal under the greenboot-healthcheck unit. Some checks run in background processes, which print their output to stdout/stderr, but journald does not attribute that output to the unit. As a result, entries are missing when running journalctl -u greenboot-healthcheck.
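To make the mechanism concrete, the following is a minimal sketch of one conceivable way to keep a background job's output with the calling script: capture it to a temporary file and have the parent print it after `wait`. The report does not describe the fix that was actually applied, so this is an illustration only, and the function name `run_bg_and_relay` is hypothetical.

```shell
# Hypothetical helper (not taken from the MicroShift scripts): run a command
# in the background, capture its stdout/stderr, and relay the captured lines
# from the parent shell once the job has finished.
run_bg_and_relay() {
    rr_rc=0
    rr_out="$(mktemp)"

    # Start the job in the background with both streams going to the file.
    "$@" >"${rr_out}" 2>&1 &

    # Wait for the job and remember its exit status.
    wait "$!" || rr_rc=$?

    # Print with shell builtins so each line is written by the parent process.
    while IFS= read -r rr_line || [ -n "${rr_line}" ]; do
        printf '%s\n' "${rr_line}"
    done <"${rr_out}"

    rm -f "${rr_out}"
    return "${rr_rc}"
}
```

For example, `run_bg_and_relay sh -c 'echo checking; exit 1'` prints `checking` from the calling shell and returns 1. The trade-off of this approach is that output appears only after the job ends rather than as it is produced.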

      Version-Release number of selected component (if applicable):

      4.14

      How reproducible:

      100%

      Steps to Reproduce:

      1. Boot a MicroShift host.
      2. Trigger a failed greenboot health check for MicroShift.
      3. Check the journal output for the greenboot-healthcheck service: the failure is reported, but the contents of the referenced failure log files are missing.
      

      Actual results:

      Oct 01 17:06:51 edgeniusos01 40_microshift_running_check.sh[1424]: Waiting 300s for 2 pod(s) from the 'kube-system' namespace to be in 'Ready' state
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[1424]: ======
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[1424]: Info: Log file '/var/lib/microshift-backups/prerun_failed.log' does not exist
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[1424]: ======
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[1424]: Failure log in: '/tmp/pod-list.8OaJQ8I3Z8'
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[1424]: ------
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[1424]: ------ 
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[1424]: ======
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[1424]: Failure log in: '/tmp/pod-events.L9b8QYjl9e'
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[1424]: ------
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[1424]: ------
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[1424]: FAILURE 
      Oct 01 17:06:57 edgeniusos01 greenboot[1315]: Script '40_microshift_running_check.sh' FAILURE (exit code '1'). Continuing...
      

      Expected results:

      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[5635]: The number of ready pods in the 'kube-system' namespace is greater than the expected '2' count. Terminating...
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[1424]: ======
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[1424]: Info: Log file '/var/lib/microshift-backups/prerun_failed.log' does not exist
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[1424]: ======
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[1424]: Failure log in: '/tmp/pod-list.8OaJQ8I3Z8'
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[1424]: ------
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[5651]: NAMESPACE                  NAME                                                        READY   STATUS    RESTARTS        AGE     IP              NODE           NOMINATED NODE   READINESS GATES
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[5651]: cert-manager               cert-manager-75d57c8d4b-6vdwh                               1/1     Running   3               8h      10.42.0.8       edgeniusos01   <none>           <none>
      ...
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[1424]: Failure log in: '/tmp/pod-events.L9b8QYjl9e'
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[1424]: ------
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[5652]: NAMESPACE                           LAST SEEN   TYPE      REASON                    OBJECT                                                           MESSAGE
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[5652]: cert-manager                        5s          Warning   NodeNotReady              pod/cert-manager-75d57c8d4b-6vdwh                                Node is not ready
      ...
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[1424]: ------
      Oct 01 17:06:57 edgeniusos01 40_microshift_running_check.sh[1424]: FAILURE
      Oct 01 17:06:57 edgeniusos01 greenboot[1315]: Script '40_microshift_running_check.sh' FAILURE (exit code '1'). Continuing...
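In the expected output, some entries carry PIDs other than the main script's (5635, 5651, 5652 versus 1424). As a rough aid for spotting such entries, the sketch below tallies journal lines per PID for a given syslog identifier. It assumes journalctl's default short output format, where the fifth whitespace-separated field is `IDENT[PID]:`; the function name `count_by_pid` is hypothetical.

```shell
# Hypothetical helper: read short-format journal text on stdin and print
# "PID COUNT" for every PID that logged under the given identifier.
count_by_pid() {
    awk -v ident="$1" '
        {
            # Short format: "Mon DD HH:MM:SS host IDENT[PID]: message"
            prefix = ident "["
            if (index($5, prefix) == 1) {
                pid = substr($5, length(prefix) + 1)
                sub(/\]:$/, "", pid)
                counts[pid]++
            }
        }
        END { for (pid in counts) print pid, counts[pid] }
    ' | sort -n
}
```

A possible invocation is `journalctl -b -t 40_microshift_running_check.sh --no-pager | count_by_pid 40_microshift_running_check.sh`. Comparing that tally with the same pipeline fed from `journalctl -b -u greenboot-healthcheck --no-pager` would show which PIDs are missing from the unit's view.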

      Additional info:

       

              pacevedo@redhat.com Pablo Acevedo Montserrat
              openshift-crt-jira-prow OpenShift Prow Bot
              John George John George