MicroShift / USHIFT-1890

On an EC2 machine, greenboot reports Boot Status is RED - Health Check FAILURE!


    • Type: Bug
    • Resolution: Not a Bug
    • Quality / Stability / Reliability

      Description of problem:

      On an EC2 machine running RHEL 9.2, I installed MicroShift by mostly following
      
        https://access.redhat.com/documentation/en-us/red_hat_build_of_microshift/4.14/html-single/installing/index
      
      (skipping only the LVM part), then enabled and started it and rebooted the machine. Once MicroShift is running again after the reboot, I eventually start to get wall messages from greenboot, and the same messages appear in the journal.
      
      However, those messages do not say what the problem is or how to fix it.

      Version-Release number of selected component (if applicable):

      microshift-4.14.2-202311091609.p0.gd80d6de.assembly.4.14.2.el9.x86_64
      greenboot-0.15.4-1.el9.x86_64
      

      How reproducible:

      Seems deterministic.

      Steps to Reproduce:

      1. Have a t2.medium EC2 instance with 20 GB of disk and RHEL 9.2 installed, log in to it as root.
      2. subscription-manager register --org ... --activationkey ...
      3. subscription-manager config --rhsm.manage_repos=1
      4. subscription-manager repos --enable rhocp-4.14-for-rhel-9-$(uname -m)-rpms --enable fast-datapath-for-rhel-9-$(uname -m)-rpms
      5. dnf install -y microshift openshift-clients
      6. Get pull secret from https://console.redhat.com/openshift/install/pull-secret and paste it to
         cat > /etc/crio/openshift-pull-secret 
      7. chmod 600 /etc/crio/openshift-pull-secret
      8. systemctl enable microshift
      9. systemctl start microshift
      10. export KUBECONFIG=/var/lib/microshift/resources/kubeadmin/kubeconfig
      11. Wait for oc get all -A to report all pods and deployments as running, ready, and available.
      12. Check that journalctl -l | grep greenboot does not report anything.
      13. Reboot the machine and log back in to it.
      14. Run journalctl -l | grep greenboot | sed 's/ip-.*\.internal //'
      15. Wait ten seconds.
      16. See what is on the terminal.
      17. Run journalctl -l | grep greenboot | sed 's/ip-.*\.internal //' again.
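
The journal filter used in steps 14 and 17 can be wrapped in small shell functions so that both runs use exactly the same command. This is only a sketch: the function names are made up for illustration, and the sample line assumes the journal shows the same hostname as the broadcast messages.

```shell
#!/bin/sh
# Drop the EC2 internal hostname field, exactly as the sed expression in the steps does.
strip_ec2_hostname() {
    sed 's/ip-.*\.internal //'
}

# Print journal lines that mention greenboot, without the hostname field.
# (Hypothetical helper; it runs the same pipeline as steps 14 and 17.)
greenboot_journal() {
    journalctl -l | grep greenboot | strip_ec2_hostname
}

# Demo on one sample line; the hostname here is an assumption taken from the wall messages.
printf '%s\n' 'Nov 17 12:33:38 ip-172-31-84-217.ec2.internal systemd[1]: Starting greenboot Health Checks Runner...' \
    | strip_ec2_hostname
# prints: Nov 17 12:33:38 systemd[1]: Starting greenboot Health Checks Runner...
```

On the affected machine, calling greenboot_journal once right after logging in and once after the wall messages appear should reproduce the two listings under Actual results.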

      Actual results:

      The first journalctl -l | grep greenboot | sed 's/ip-.*\.internal //' after reboot:
      
      Nov 17 12:33:38 systemd[1]: Starting greenboot Health Checks Runner...
      Nov 17 12:33:38 greenboot[638]: Running Required Health Check Scripts...
      Nov 17 12:33:38 00_required_scripts_start.sh[649]: Running greenboot Required Health Check Scripts
      Nov 17 12:33:38 greenboot[638]: Script '00_required_scripts_start.sh' SUCCESS
      Nov 17 12:33:38 greenboot[638]: Running Wanted Health Check Scripts...
      Nov 17 12:33:38 00_wanted_scripts_start.sh[657]: Running greenboot Wanted Health Check Scripts
      Nov 17 12:33:38 greenboot[638]: Script '00_wanted_scripts_start.sh' SUCCESS
      Nov 17 12:33:38 greenboot[638]: Running Required Health Check Scripts...
      
      The messages on the terminal:
      Broadcast message from systemd-journald@ip-172-31-84-217.ec2.internal (Fri 2023-11-17 12:39:45 UTC):
      greenboot[638]: Script '40_microshift_running_check.sh' FAILURE (exit code '1'). Continuing...
      Broadcast message from systemd-journald@ip-172-31-84-217.ec2.internal (Fri 2023-11-17 12:39:45 UTC):
      greenboot[9294]: Boot Status is RED - Health Check FAILURE!
      Broadcast message from systemd-journald@ip-172-31-84-217.ec2.internal (Fri 2023-11-17 12:39:45 UTC):
      redboot-auto-reboot[9315]: SYSTEM is UNHEALTHY, but boot_counter is unset in grubenv. Manual intervention necessary.
      Message from syslogd@ip-172-31-84-217 at Nov 17 12:39:45 ...
      greenboot[638]:Script '40_microshift_running_check.sh' FAILURE (exit code '1'). Continuing...
      Message from syslogd@ip-172-31-84-217 at Nov 17 12:39:45 ...
      greenboot[9294]:Boot Status is RED - Health Check FAILURE!
      Message from syslogd@ip-172-31-84-217 at Nov 17 12:39:45 ...
      redboot-auto-reboot[9315]:SYSTEM is UNHEALTHY, but boot_counter is unset in grubenv. Manual intervention necessary.
      
      The second journalctl -l | grep greenboot | sed 's/ip-.*\.internal //' adds:
      Nov 17 12:39:45 greenboot[638]: Script '40_microshift_running_check.sh' FAILURE (exit code '1'). Continuing...
      Nov 17 12:39:45 systemd[1]: greenboot-healthcheck.service: Main process exited, code=exited, status=1/FAILURE
      Nov 17 12:39:45 systemd[1]: greenboot-healthcheck.service: Failed with result 'exit-code'.
      Nov 17 12:39:45 systemd[1]: Failed to start greenboot Health Checks Runner.
      Nov 17 12:39:45 systemd[1]: Dependency failed for greenboot Success Scripts Runner.
      Nov 17 12:39:45 systemd[1]: greenboot-task-runner.service: Job greenboot-task-runner.service/start failed with result 'dependency'.
      Nov 17 12:39:45 systemd[1]: greenboot-grub2-set-success.service: Job greenboot-grub2-set-success.service/start failed with result 'dependency'.
      Nov 17 12:39:45 systemd[1]: greenboot-healthcheck.service: Triggering OnFailure= dependencies.
      Nov 17 12:39:45 systemd[1]: greenboot-healthcheck.service: Consumed 48.700s CPU time.
      Nov 17 12:39:45 systemd[1]: Starting greenboot Failure Scripts Runner...
      Nov 17 12:39:45 greenboot[9294]: Boot Status is RED - Health Check FAILURE!
      Nov 17 12:39:45 greenboot[9294]: Running Red Scripts...
      Nov 17 12:39:45 greenboot[9294]: Script '40_microshift_pre_rollback.sh' SUCCESS
      Nov 17 12:39:45 systemd[1]: Finished greenboot Failure Scripts Runner.
      Nov 17 12:39:45 systemd[1]: Starting greenboot MotD Generator...
      Nov 17 12:39:45 greenboot-status[9325]: Script '40_microshift_running_check.sh' FAILURE (exit code '1'). Continuing...
      Nov 17 12:39:45 greenboot-status[9325]: Boot Status is RED - Health Check FAILURE!
      Nov 17 12:39:45 greenboot-status[9325]: SYSTEM is UNHEALTHY, but boot_counter is unset in grubenv. Manual intervention necessary.
      Nov 17 12:39:45 systemd[1]: Finished greenboot MotD Generator.
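
To see at a glance which health check script failed, the journal text can be reduced to script names and exit codes. The following is a minimal sketch that relies only on the message format visible in the output above; the function name failed_checks is hypothetical.

```shell
#!/bin/sh
# Read journal text on stdin and print "<script> <exit code>" once per failed check.
failed_checks() {
    sed -n "s/.*Script '\([^']*\)' FAILURE (exit code '\([0-9]*\)').*/\1 \2/p" | sort -u
}

# Demo on three lines copied from the journal output above.
failed_checks <<'EOF'
Nov 17 12:39:45 greenboot[638]: Script '40_microshift_running_check.sh' FAILURE (exit code '1'). Continuing...
Nov 17 12:39:45 greenboot-status[9325]: Script '40_microshift_running_check.sh' FAILURE (exit code '1'). Continuing...
Nov 17 12:39:45 greenboot[9294]: Script '40_microshift_pre_rollback.sh' SUCCESS
EOF
# prints: 40_microshift_running_check.sh 1
```

Note that grep greenboot keeps only lines containing that word. Lines logged under a script's own tag, as with 00_required_scripts_start.sh[649] above, are dropped unless the message itself mentions greenboot, so any output from 40_microshift_running_check.sh may simply be filtered out. Running journalctl -b -u greenboot-healthcheck.service (the unit name appears in the log above) should show everything that unit logged during the current boot. Re-running the failed script by hand may also help; its location under /etc/greenboot/check/required.d/ is an assumption and is not confirmed by this report.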

      Expected results:

      No error, or clear information about what is wrong with
      
      boot_counter is unset in grubenv.
      
      and what kind of manual intervention is needed.
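
The message about boot_counter can at least be cross-checked against the GRUB environment block. Below is a small sketch that reads the output of grub2-editenv list on stdin and reports whether boot_counter is present. The function name and the sample values are hypothetical, and the sketch does not say why the variable is unset or whether it is expected to be set on this kind of installation.

```shell
#!/bin/sh
# Read `grub2-editenv list` output on stdin; report whether boot_counter is set.
boot_counter_state() {
    if grep -q '^boot_counter='; then
        echo "boot_counter is set"
    else
        echo "boot_counter is unset"
    fi
}

# Demo with made-up grubenv listings (the values are illustrative only).
printf 'saved_entry=abc\nboot_success=0\n' | boot_counter_state
# prints: boot_counter is unset
printf 'boot_counter=2\n' | boot_counter_state
# prints: boot_counter is set
```

On the affected machine the check would be run as grub2-editenv list | boot_counter_state.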

      Additional info:

       

       

              eslutsky Evgeny Slutsky
              rhn-engineering-jpazdziora Jan Pazdziora (Inactive)