OCPBUGS-17117

[4.14] Real BM - HPE setup - redfish-virtualmedia deployment is failing with "Bootstrap failed to complete: timed out waiting for the condition"

      Description of problem:

      An OCP 4.14 disconnected deployment on a real bare-metal (HPE) setup using redfish-virtualmedia consistently fails to complete bootstrapping. From the installation log:

      time="2023-07-31T17:14:00+03:00" level=debug msg="ironic_deployment.openshift-master-deployment[0]: Creation complete after 3m30s [id=67c63134-d7ef-4175-abd1-62e5759ca0e4]"
      time="2023-07-31T17:14:09+03:00" level=debug msg="ironic_deployment.openshift-master-deployment[2]: Still creating... [3m40s elapsed]"
      time="2023-07-31T17:14:09+03:00" level=debug msg="ironic_deployment.openshift-master-deployment[1]: Still creating... [3m40s elapsed]"
      time="2023-07-31T17:14:19+03:00" level=debug msg="ironic_deployment.openshift-master-deployment[2]: Still creating... [3m50s elapsed]"
      time="2023-07-31T17:14:19+03:00" level=debug msg="ironic_deployment.openshift-master-deployment[1]: Still creating... [3m50s elapsed]"
      time="2023-07-31T17:14:29+03:00" level=debug msg="ironic_deployment.openshift-master-deployment[1]: Still creating... [4m0s elapsed]"
      time="2023-07-31T17:14:29+03:00" level=debug msg="ironic_deployment.openshift-master-deployment[2]: Still creating... [4m0s elapsed]"
      time="2023-07-31T17:14:30+03:00" level=debug msg="ironic_deployment.openshift-master-deployment[1]: Creation complete after 4m0s [id=de684789-785e-4e22-be7b-c904d8b90d6e]"
      time="2023-07-31T17:14:30+03:00" level=debug msg="ironic_deployment.openshift-master-deployment[2]: Creation complete after 4m0s [id=e5d2be79-f6b5-4ed7-9de5-f52cf79800df]"
      time="2023-07-31T17:14:30+03:00" level=debug
      time="2023-07-31T17:14:30+03:00" level=debug msg="Apply complete! Resources: 6 added, 0 changed, 0 destroyed."
      time="2023-07-31T17:14:30+03:00" level=debug msg="OpenShift Installer 4.14.0-0.nightly-2023-07-30-234232"
      time="2023-07-31T17:14:30+03:00" level=debug msg="Built from commit e044bf5e410b27ec745cd72aba5b5f67e47cd14e"
      time="2023-07-31T17:14:30+03:00" level=info msg="Waiting up to 20m0s (until 5:34PM IDT) for the Kubernetes API at https://api.ocp-edge1.lab.eng.tlv2.redhat.com:6443..."
      time="2023-07-31T17:14:30+03:00" level=debug msg="Loading Agent Config..."
      time="2023-07-31T17:14:31+03:00" level=info msg="API v1.27.3+4aaeaec up"
      time="2023-07-31T17:14:31+03:00" level=debug msg="Loading Install Config..."
      time="2023-07-31T17:14:31+03:00" level=debug msg="  Loading SSH Key..."
      time="2023-07-31T17:14:31+03:00" level=debug msg="  Loading Base Domain..."
      time="2023-07-31T17:14:31+03:00" level=debug msg="    Loading Platform..."
      time="2023-07-31T17:14:31+03:00" level=debug msg="  Loading Cluster Name..."
      time="2023-07-31T17:14:31+03:00" level=debug msg="    Loading Base Domain..."
      time="2023-07-31T17:14:31+03:00" level=debug msg="    Loading Platform..."
      time="2023-07-31T17:14:31+03:00" level=debug msg="  Loading Networking..."
      time="2023-07-31T17:14:31+03:00" level=debug msg="    Loading Platform..."
      time="2023-07-31T17:14:31+03:00" level=debug msg="  Loading Pull Secret..."
      time="2023-07-31T17:14:31+03:00" level=debug msg="  Loading Platform..."
      time="2023-07-31T17:14:31+03:00" level=debug msg="Using Install Config loaded from state file"
      time="2023-07-31T17:14:31+03:00" level=info msg="Waiting up to 1h0m0s (until 6:14PM IDT) for bootstrapping to complete..."
      time="2023-07-31T18:14:31+03:00" level=debug msg="Fetching Bootstrap SSH Key Pair..."
      time="2023-07-31T18:14:31+03:00" level=debug msg="Loading Bootstrap SSH Key Pair..."
      time="2023-07-31T18:14:31+03:00" level=debug msg="Using Bootstrap SSH Key Pair loaded from state file"
      time="2023-07-31T18:14:31+03:00" level=debug msg="Reusing previously-fetched Bootstrap SSH Key Pair"
      time="2023-07-31T18:14:31+03:00" level=debug msg="Fetching Install Config..."
      time="2023-07-31T18:14:31+03:00" level=debug msg="Loading Install Config..."
      time="2023-07-31T18:14:31+03:00" level=debug msg="  Loading SSH Key..."
      time="2023-07-31T18:14:31+03:00" level=debug msg="  Loading Base Domain..."
      time="2023-07-31T18:14:31+03:00" level=debug msg="    Loading Platform..."
      time="2023-07-31T18:14:31+03:00" level=debug msg="  Loading Cluster Name..."
      time="2023-07-31T18:14:31+03:00" level=debug msg="    Loading Base Domain..."
      time="2023-07-31T18:14:31+03:00" level=debug msg="    Loading Platform..."
      time="2023-07-31T18:14:31+03:00" level=debug msg="  Loading Networking..."
      time="2023-07-31T18:14:31+03:00" level=debug msg="    Loading Platform..."
      time="2023-07-31T18:14:31+03:00" level=debug msg="  Loading Pull Secret..."
      time="2023-07-31T18:14:31+03:00" level=debug msg="  Loading Platform..."
      time="2023-07-31T18:14:31+03:00" level=debug msg="Using Install Config loaded from state file"
      time="2023-07-31T18:14:31+03:00" level=debug msg="Reusing previously-fetched Install Config"
      time="2023-07-31T18:14:31+03:00" level=error msg="Attempted to gather debug logs after installation failure: must provide bootstrap host address"
      time="2023-07-31T18:14:31+03:00" level=error msg="Bootstrap failed to complete: timed out waiting for the condition"
      time="2023-07-31T18:14:31+03:00" level=error msg="Failed to wait for bootstrapping to complete. This error usually happens when there is a problem with control plane hosts that prevents the control plane operators from creating the control plane."
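
For triage, the log excerpt above can be condensed to its error entries and its longest silent interval (here the full hour between 17:14:31 and 18:14:31). The following is only a minimal sketch, assuming each entry has the `time="..." level=... msg="..."` shape seen in this excerpt; the names `ENTRY`, `parse_entries` and `summarize` are hypothetical helpers, not part of the installer.

```python
import re
from datetime import datetime

# Assumed entry shape, taken from the excerpt: time="..." level=... [msg="..."]
# The msg part is optional because one entry in the excerpt has no message.
ENTRY = re.compile(
    r'time="(?P<time>[^"]+)"\s+level=(?P<level>\w+)'
    r'(?:\s+msg="(?P<msg>(?:[^"\\]|\\.)*)")?'
)

def parse_entries(text):
    """Yield (timestamp, level, message) for every entry found in text."""
    for m in ENTRY.finditer(text):
        yield (
            datetime.fromisoformat(m.group("time")),
            m.group("level"),
            m.group("msg") or "",
        )

def summarize(text):
    """Return (error entries, longest gap between consecutive entries)."""
    entries = list(parse_entries(text))
    errors = [(ts, msg) for ts, level, msg in entries if level == "error"]
    gaps = [later[0] - earlier[0] for earlier, later in zip(entries, entries[1:])]
    longest_gap = max(gaps, default=None)  # None when fewer than two entries
    return errors, longest_gap
```

Because the pattern is matched with `finditer`, it works whether the entries are one per line or run together on a single line, as they were when this log was first pasted.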

       

      On the bootstrap host during the installation, I see that all the ironic containers come up, run, and stop once they have finished their work. The only container that keeps running and restarting is:

      $ sudo podman ps
      CONTAINER ID  IMAGE                                                                                                                   COMMAND               CREATED         STATUS         PORTS       NAMES
      4277ea0e7228  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:89f62fb5941a5807622a796519e4bb3a94808b14c15dbbfe094b1aee3e1f2d6d  start --tear-down...  16 minutes ago  Up 16 minutes              cluster-bootstrap
      

      The container log: http://pastebin.test.redhat.com/1106364

      The state of the cluster:

      $ oc get co
      NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                                                                                    
      baremetal                                                                                         
      cloud-controller-manager                                                                          
      cloud-credential                                     True        False         False      84m     
      cluster-autoscaler                                                                                
      config-operator                                                                                   
      console                                                                                           
      control-plane-machine-set                                                                         
      csi-snapshot-controller                                                                           
      dns                                                                                               
      etcd                                                                                              
      image-registry                                                                                    
      ingress                                                                                           
      insights                                                                                          
      kube-apiserver                                                                                    
      kube-controller-manager                                                                           
      kube-scheduler                                                                                    
      kube-storage-version-migrator                                                                     
      machine-api                                                                                       
      machine-approver                                                                                  
      machine-config                                                                                    
      marketplace                                                                                       
      monitoring                                                                                        
      network                                                                                           
      node-tuning                                                                                       
      openshift-apiserver                                                                               
      openshift-controller-manager                                                                      
      openshift-samples                                                                                 
      operator-lifecycle-manager                                                                        
      operator-lifecycle-manager-catalog                                                                
      operator-lifecycle-manager-packageserver                                                          
      service-ca                                                                                        
      storage

      $ oc get bmh -n openshift-machine-api
      NAME                 STATE   CONSUMER                   ONLINE   ERROR   AGE
      openshift-master-0           ocp-edge1-zwzcn-master-0   true             16h
      openshift-master-1           ocp-edge1-zwzcn-master-1   true             16h
      openshift-master-2           ocp-edge1-zwzcn-master-2   true             16h
      openshift-worker-0                                      true             16h
      openshift-worker-1                                      true             16h
       
      $ oc get machine -n openshift-machine-api
      NAME                       PHASE   TYPE   REGION   ZONE   AGE
      ocp-edge1-zwzcn-master-0                                  16h
      ocp-edge1-zwzcn-master-1                                  16h
      ocp-edge1-zwzcn-master-2                                  16h 
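
In the `oc get co` listing above, only cloud-credential reports AVAILABLE as True; every other operator has empty status columns. A rough sketch of how such column-aligned output could be checked automatically follows. It assumes each value starts at the same offset as its header, as in the listings above, and `parse_fixed_width` and `not_available` are hypothetical helper names.

```python
import re

def parse_fixed_width(text):
    """Parse column-aligned CLI output into a list of dicts keyed by header name.

    Assumption: every value starts at the same character offset as its column
    header, so empty cells (as in the listing above) are kept as "".
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    # Record each header name together with the offset where it starts.
    cols = [(m.group(), m.start()) for m in re.finditer(r"\S+", header)]
    # A column ends where the next one starts; the last runs to end of line.
    ends = [start for _, start in cols[1:]] + [None]
    return [
        {name: row[start:end].strip() for (name, start), end in zip(cols, ends)}
        for row in rows
    ]

def not_available(records):
    """Return the names of operators whose AVAILABLE column is not 'True'."""
    return [r["NAME"] for r in records if r.get("AVAILABLE") != "True"]
```

Slicing by header offsets rather than splitting on whitespace is what keeps the blank status cells from shifting later values into the wrong column.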

       

      Version-Release number of selected component (if applicable):

      4.14.0-0.nightly-2023-07-30-234232, and earlier images as well. We have been seeing this failure since last week.

      How reproducible:

      100%

      Steps to Reproduce:

      1. Deploy OCP 4.14 on a real BM setup with redfish-virtualmedia.
      

      Actual results:

      See above.

      Expected results:

      The deployment should pass and the cluster should be ready.

      Additional info:

      Will provide the bootstrap log bundle.

      Deployment of 4.13 on the same setup passed. The deployment also passes on a simulated VM setup.

      The deployment I ran and analyzed used baremetal IPv4. I will try with IPv6 and add the info here.

              rpittau@redhat.com Riccardo Pittau
              lshilin Lubov Shilin