-
Bug
-
Resolution: Unresolved
-
Minor
-
None
-
DO480 - ACM2.4-en-5-20221130
-
None
-
False
-
-
False
-
-
-
en-US (English)
Please fill in the following information:
URL: | |
Reporter RHNID: | ricardodacosta |
Section Title: |
Issue description
This issue and solution was previously filed using the old Jira instance but here it is again.
Lab start scripts sometimes do not complete, we all understand the significant amount of resources used in the heat template for this course, along with other factors.
the lab start scripts depend on wait.sh for the local cluster, and ocp4-mng/wait.sh for the managed cluster.
In my experience, if the etcd operator is in a degraded state due to disk thresholds not satisfied, OpenShift may still be able to function. It's performance is degraded, not the operator pod count.
There are ways to mitigate this in future designs for DO480.
My approach is simple, the wait.sh and ocp4-mng/wait.sh scripts are too conservative. When (not if) a student reports a lab start script taking >10m to complete, I simply advise them to run ssh lab@utility 'echo true > wait.sh && echo true > ocp4-mng/wait.sh
This approach has unblocked my students, and I have been successful in my last 2 deliveries. No one has taught DO480 more than me.
Of course, if there are other issues which arise, I manually look at the operator states, and the MachineConfigPool states, and events in certain operator projects. However, I'm happy to report that my last 2 x DO480 deliveries went well.
Steps to reproduce:
Workaround:
Expected result: