  1. Product Technical Learning
  2. PTL-13237

lab start scripts do not finish


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • None
    • DO480 - ACM2.4-en-5-20221130
    • DO480
    • None
    • False
    • False
    • en-US (English)

      Please fill in the following information:


      URL:  
      Reporter RHNID: ricardodacosta
      Section Title:                                                                       

      Issue description

      This issue and its solution were previously filed in the old Jira instance; I am filing them again here.

      Lab start scripts sometimes do not complete. We all understand that the Heat template for this course uses a significant amount of resources, and that other factors contribute as well.

      The lab start scripts depend on wait.sh for the local cluster and on ocp4-mng/wait.sh for the managed cluster.

      In my experience, if the etcd operator is in a degraded state because disk thresholds are not met, OpenShift may still be able to function. Its performance is degraded, but the operator pod count is not.

      There are ways to mitigate this in future designs for DO480.

      In my view, the wait.sh and ocp4-mng/wait.sh scripts are too conservative, so my approach is simple. When (not if) a student reports a lab start script taking more than 10 minutes to complete, I advise them to run:

      ssh lab@utility 'echo true > wait.sh && echo true > ocp4-mng/wait.sh'
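
The one-line workaround overwrites both scripts in place, so the original checks are lost. A minimal sketch of a reversible variant follows, assuming the two scripts live at the relative paths used in the command (wait.sh and ocp4-mng/wait.sh under a common base directory). The function names bypass_wait_scripts and restore_wait_scripts and the .orig backup suffix are hypothetical and are not part of the course tooling.

```shell
#!/bin/sh
# Hypothetical helper: replace each wait script under a base directory with a
# no-op ("true"), keeping a .orig copy so the change can be reverted.
bypass_wait_scripts() {
    base_dir="$1"
    for script in wait.sh ocp4-mng/wait.sh; do
        target="$base_dir/$script"
        # Keep only the first backup, so a second run does not replace it
        # with the already-neutralized script.
        if [ -f "$target" ] && [ ! -f "$target.orig" ]; then
            cp -p "$target" "$target.orig" || return 1
        fi
        echo true > "$target" || return 1
    done
}

# Hypothetical helper: put the saved originals back, if any exist.
restore_wait_scripts() {
    base_dir="$1"
    for script in wait.sh ocp4-mng/wait.sh; do
        target="$base_dir/$script"
        if [ -f "$target.orig" ]; then
            mv "$target.orig" "$target" || return 1
        fi
    done
}
```

With these functions loaded in a shell on the utility machine as the lab user, bypass_wait_scripts "$HOME" would have the same effect as the command above, and restore_wait_scripts "$HOME" would bring the original checks back once the lab has started.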

      This approach has unblocked my students, and my last two deliveries were successful. No one has taught DO480 more than I have.

      Of course, if other issues arise, I manually inspect the operator states, the MachineConfigPool states, and the events in certain operator projects. However, I'm happy to report that my last two DO480 deliveries went well.
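
The manual inspection described above could be sketched as follows. This is an illustration, not the procedure used in class: the function name unhealthy_operators is hypothetical, the filter assumes the default column layout of `oc get clusteroperators` (NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED, SINCE, MESSAGE) with a non-empty VERSION column, and openshift-etcd is used only as an example of an operator project.

```shell
#!/bin/sh
# Hypothetical helper: read `oc get clusteroperators` output on stdin and
# print the name of every operator that is unavailable or degraded.
# Assumes the default columns:
#   NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE [MESSAGE]
unhealthy_operators() {
    # Skip the header row; column 3 is AVAILABLE, column 5 is DEGRADED.
    awk 'NR > 1 && ($3 != "True" || $5 == "True") { print $1 }'
}

# Example usage against a live cluster (requires a logged-in `oc` session):
#   oc get clusteroperators | unhealthy_operators
#   oc get machineconfigpools
#   oc get events -n openshift-etcd --sort-by=.lastTimestamp
```

An operator such as etcd that reports Available=True together with Degraded=True is listed by this filter, which matches the case described earlier where the cluster still functions with reduced performance.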

      Steps to reproduce:

       

      Workaround:

       

      Expected result:

        1. Screenshot 2023-06-21 at 1.51.56 PM.png
          463 kB
          Ricardo da Costa
        2. Screenshot 2023-06-19 at 1.21.39 PM.png
          941 kB
          Ricardo da Costa

            gls-curriculum-ocp-core@redhat.com PTL - OCP Platform Team
            rdacosta1@redhat.com Ricardo da Costa
            Votes: 0
            Watchers: 2
