-
Bug
-
Resolution: Done
-
Normal
-
None
-
4.13.0
-
+
-
Moderate
-
None
-
MCO Sprint 231
-
1
-
False
-
-
The MCDRebootError alert now stays latched past 15 minutes and no longer clears automatically.
-
Bug Fix
Description of problem:
When a node fails to reboot, an MCDRebootError alarm is raised. This alarm disappears after 15 minutes, even if the machine was never rebooted.
Version-Release number of selected component (if applicable):
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.nightly-2022-12-22-120609   True        False         26m     Cluster version is 4.13.0-0.nightly-2022-12-22-120609
How reproducible:
Always
Steps to Reproduce:
1. Execute these commands on a worker node to break the reboot process.
    $ mount -o remount,rw /usr
    $ mv /usr/bin/systemd-run /usr/bin/systemd-run2
2. Create any MC. For example, this one:
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: worker
      name: test-file
    spec:
      config:
        ignition:
          version: 3.1.0
        storage:
          files:
          - contents:
              source: data:text/plain;charset=utf;base64,c2VydmVyIGZvby5leGFtcGxlLm5ldCBtYXhkZWxheSAwLjQgb2ZmbGluZQpzZXJ2ZXIgYmFyLmV4YW1wbGUubmV0IG1heGRlbGF5IDAuNCBvZmZsaW5lCnNlcnZlciBiYXouZXhhbXBsZS5uZXQgbWF4ZGVsYXkgMC40IG9mZmxpbmUK
            filesystem: root
            mode: 0644
            path: /etc/test
Actual results:
An MCDRebootError alarm is triggered, but it disappears after 15 minutes.
Expected results:
The alarm should not disappear after 15 minutes. It should remain there until the node is rebooted.
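The difference between the actual and the expected behavior can be illustrated with a small Python model. This is only a sketch and not the MCO or Prometheus implementation: the `NodeState` record and the functions `windowed_alert` and `latched_alert` are hypothetical names, and it assumes the alarm clears because the failure is only considered within a 15-minute window.

```python
from dataclasses import dataclass
from typing import Optional

WINDOW_SECONDS = 15 * 60  # the 15-minute period mentioned in this report


@dataclass
class NodeState:
    """Hypothetical record of reboot events for one node (times in seconds)."""
    last_failed_reboot: Optional[float] = None
    last_successful_reboot: Optional[float] = None


def windowed_alert(state: NodeState, now: float) -> bool:
    """Actual behavior (assumed model): fires only while the failure
    is at most 15 minutes old, then clears on its own."""
    if state.last_failed_reboot is None:
        return False
    return now - state.last_failed_reboot <= WINDOW_SECONDS


def latched_alert(state: NodeState) -> bool:
    """Expected behavior: fires from the failed reboot until a later
    successful reboot is recorded, regardless of elapsed time."""
    if state.last_failed_reboot is None:
        return False
    ok = state.last_successful_reboot
    return ok is None or ok < state.last_failed_reboot


if __name__ == "__main__":
    state = NodeState(last_failed_reboot=0.0)
    for minute in (1, 14, 16, 60):
        now = minute * 60.0
        print(minute, windowed_alert(state, now), latched_alert(state))
```

In this model the windowed check stops firing once the failure is more than 15 minutes old, while the latched check keeps firing until `last_successful_reboot` is set to a time after the failure.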
Additional info:
This PR seems to have introduced this behavior: https://github.com/openshift/machine-config-operator/pull/3406#discussion_r1030481908
- relates to
-
MCO-1 Observability Infrastructure and Enhanced metrics in MCO
- Closed