
Type: Bug
Resolution: Not a Bug
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels: None

Activity Type: Quality / Stability / Reliability
Blocked: False
Blocked Reason: None
Story Points: None
Severity: None

Target Version: None
Release Blocker: None
Sprint: None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Release Note Status: None
Release Note Type: None
Release Note Text: None

Description of problem:

On an EC2 machine running RHEL 9.2, I installed MicroShift by mostly following

  https://access.redhat.com/documentation/en-us/red_hat_build_of_microshift/4.14/html-single/installing/index

(skipping only the LVM part), then enabled and started it and rebooted the machine. With MicroShift running, I eventually start to get wall messages from greenboot, and the same messages appear in the journal.

However, those messages do not say what the problem is or how to fix it.

Version-Release number of selected component (if applicable):

microshift-4.14.2-202311091609.p0.gd80d6de.assembly.4.14.2.el9.x86_64
greenboot-0.15.4-1.el9.x86_64

How reproducible:

Seems deterministic.

Steps to Reproduce:

1. Have a t2.medium EC2 instance with 20 GB of disk and RHEL 9.2 installed; log in to it as root.
2. subscription-manager register --org ... --activationkey ...
3. subscription-manager config --rhsm.manage_repos=1
4. subscription-manager repos --enable rhocp-4.14-for-rhel-9-$(uname -m)-rpms --enable fast-datapath-for-rhel-9-$(uname -m)-rpms
5. dnf install -y microshift openshift-clients
6. Get the pull secret from https://console.redhat.com/openshift/install/pull-secret and paste it into the standard input of
   cat > /etc/crio/openshift-pull-secret
7. chmod 600 /etc/crio/openshift-pull-secret
8. systemctl enable microshift
9. systemctl start microshift
10. export KUBECONFIG=/var/lib/microshift/resources/kubeadmin/kubeconfig
11. Wait for oc get all -A to report all pods and deployments as running, ready, and available.
12. Check that journalctl -l | grep greenboot does not report anything.
13. Reboot the machine and log back in to it.
14. Run journalctl -l | grep greenboot | sed 's/ip-.*\.internal //'
15. Wait ten seconds.
16. See what is on the terminal.
17. Run journalctl -l | grep greenboot | sed 's/ip-.*\.internal //' again.
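
Step 11 relies on reading the oc get all -A output by eye. As a rough sketch of how that check could be scripted, the helper below reads pod lines from standard input and prints the pods that are not yet ready. The function name not_ready_pods is hypothetical, and the sketch assumes the usual column order of oc get pods -A --no-headers (NAMESPACE, NAME, READY, STATUS, ...); it covers pods only, not deployments.

```shell
# Hypothetical helper: list pods that are not fully ready.
# Expects lines in the column order NAMESPACE NAME READY STATUS ... on stdin.
not_ready_pods() {
    awk '{
        # Pods of finished jobs legitimately report 0/1 Completed.
        if ($4 == "Completed") next
        # READY looks like "1/2": ready containers / total containers.
        split($3, r, "/")
        if (r[1] != r[2] || $4 != "Running")
            print $1 "/" $2
    }'
}
```

Assuming KUBECONFIG is exported as in step 10, it could be used as oc get pods -A --no-headers | not_ready_pods; empty output would mean every pod is either Running with all containers ready or Completed.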

Actual results:

The first journalctl -l | grep greenboot | sed 's/ip-.*\.internal //' after reboot:

Nov 17 12:33:38 systemd[1]: Starting greenboot Health Checks Runner...
Nov 17 12:33:38 greenboot[638]: Running Required Health Check Scripts...
Nov 17 12:33:38 00_required_scripts_start.sh[649]: Running greenboot Required Health Check Scripts
Nov 17 12:33:38 greenboot[638]: Script '00_required_scripts_start.sh' SUCCESS
Nov 17 12:33:38 greenboot[638]: Running Wanted Health Check Scripts...
Nov 17 12:33:38 00_wanted_scripts_start.sh[657]: Running greenboot Wanted Health Check Scripts
Nov 17 12:33:38 greenboot[638]: Script '00_wanted_scripts_start.sh' SUCCESS
Nov 17 12:33:38 greenboot[638]: Running Required Health Check Scripts...

The messages on the terminal:
Broadcast message from systemd-journald@ip-172-31-84-217.ec2.internal (Fri 2023-11-17 12:39:45 UTC):
greenboot[638]: Script '40_microshift_running_check.sh' FAILURE (exit code '1'). Continuing...
Broadcast message from systemd-journald@ip-172-31-84-217.ec2.internal (Fri 2023-11-17 12:39:45 UTC):
greenboot[9294]: Boot Status is RED - Health Check FAILURE!
Broadcast message from systemd-journald@ip-172-31-84-217.ec2.internal (Fri 2023-11-17 12:39:45 UTC):
redboot-auto-reboot[9315]: SYSTEM is UNHEALTHY, but boot_counter is unset in grubenv. Manual intervention necessary.
Message from syslogd@ip-172-31-84-217 at Nov 17 12:39:45 ...
greenboot[638]:Script '40_microshift_running_check.sh' FAILURE (exit code '1'). Continuing...
Message from syslogd@ip-172-31-84-217 at Nov 17 12:39:45 ...
greenboot[9294]:Boot Status is RED - Health Check FAILURE!
Message from syslogd@ip-172-31-84-217 at Nov 17 12:39:45 ...
redboot-auto-reboot[9315]:SYSTEM is UNHEALTHY, but boot_counter is unset in grubenv. Manual intervention necessary.

The second journalctl -l | grep greenboot | sed 's/ip-.*\.internal //' adds:
Nov 17 12:39:45 greenboot[638]: Script '40_microshift_running_check.sh' FAILURE (exit code '1'). Continuing...
Nov 17 12:39:45 systemd[1]: greenboot-healthcheck.service: Main process exited, code=exited, status=1/FAILURE
Nov 17 12:39:45 systemd[1]: greenboot-healthcheck.service: Failed with result 'exit-code'.
Nov 17 12:39:45 systemd[1]: Failed to start greenboot Health Checks Runner.
Nov 17 12:39:45 systemd[1]: Dependency failed for greenboot Success Scripts Runner.
Nov 17 12:39:45 systemd[1]: greenboot-task-runner.service: Job greenboot-task-runner.service/start failed with result 'dependency'.
Nov 17 12:39:45 systemd[1]: greenboot-grub2-set-success.service: Job greenboot-grub2-set-success.service/start failed with result 'dependency'.
Nov 17 12:39:45 systemd[1]: greenboot-healthcheck.service: Triggering OnFailure= dependencies.
Nov 17 12:39:45 systemd[1]: greenboot-healthcheck.service: Consumed 48.700s CPU time.
Nov 17 12:39:45 systemd[1]: Starting greenboot Failure Scripts Runner...
Nov 17 12:39:45 greenboot[9294]: Boot Status is RED - Health Check FAILURE!
Nov 17 12:39:45 greenboot[9294]: Running Red Scripts...
Nov 17 12:39:45 greenboot[9294]: Script '40_microshift_pre_rollback.sh' SUCCESS
Nov 17 12:39:45 systemd[1]: Finished greenboot Failure Scripts Runner.
Nov 17 12:39:45 systemd[1]: Starting greenboot MotD Generator...
Nov 17 12:39:45 greenboot-status[9325]: Script '40_microshift_running_check.sh' FAILURE (exit code '1'). Continuing...
Nov 17 12:39:45 greenboot-status[9325]: Boot Status is RED - Health Check FAILURE!
Nov 17 12:39:45 greenboot-status[9325]: SYSTEM is UNHEALTHY, but boot_counter is unset in grubenv. Manual intervention necessary.
Nov 17 12:39:45 systemd[1]: Finished greenboot MotD Generator.
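
The only line in this output that names a culprit is the Script '...' FAILURE one. A small sketch for pulling those lines out of a longer journal follows; the function name failed_greenboot_scripts is hypothetical, and the pattern is based solely on the message format shown above.

```shell
# Hypothetical helper: print "<script name> <exit code>" for every greenboot
# health check script reported as FAILURE in the journal text on stdin.
failed_greenboot_scripts() {
    sed -n "s/.*Script '\([^']*\)' FAILURE (exit code '\([0-9]*\)').*/\1 \2/p" |
        sort -u    # the same failure is logged by several units; report it once
}
```

It could be fed with journalctl -b | failed_greenboot_scripts. For the log above this narrows things down to 40_microshift_running_check.sh with exit code 1, but it still does not explain why that script failed.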

Expected results:

No error, or clear information about what is wrong when the message says

boot_counter is unset in grubenv.

and what kind of manual intervention is needed.
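
For reference, the GRUB environment block that the message refers to can be listed with grub2-editenv list. The sketch below reads that listing from standard input and reports whether boot_counter is present. The function name boot_counter_state is hypothetical, and this only confirms what the message already says; it does not show what intervention greenboot expects.

```shell
# Hypothetical helper: report the boot_counter variable from a
# "grub2-editenv list" style listing (name=value lines) on stdin.
boot_counter_state() {
    value=$(sed -n 's/^boot_counter=//p' | head -n 1)
    if [ -n "$value" ]; then
        printf 'boot_counter=%s\n' "$value"
    else
        printf 'boot_counter is unset\n'
    fi
}
```

Usage, as root: grub2-editenv list | boot_counter_state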

Additional info:


Attachments:
sosreport-ip-172-31-95-12-2023-11-20-lkpgfdr.tar.xz (8.64 MB, 2023/11/20 10:45 AM)
image-2023-11-19-15-16-23-321.png (100 kB, 2023/11/19 1:16 PM)

is related to: USHIFT-1667 Port MicroShift health check to Go (Closed)
