OCPBUGS-36250 (project: OpenShift Bugs)

PowerVS: 4.16 IPI disconnected installation is failing


      Description of problem:

       

      The 4.16 IPI disconnected installation fails on PowerVS with the following error messages:

      level=error msg="Attempted to gather ClusterOperator status after installation failure: listing ClusterOperator objects: Get \"https://api.ocp-disc4-dal10.ocp-disc2-dns.com:6443/apis/config.openshift.io/v1/clusteroperators\": EOF"
      time="2024-06-26T13:46:17Z" level=error msg="Bootstrap failed to complete: Get \"https://api.ocp-disc4-dal10.ocp-disc2-dns.com:6443/version\": EOF"
      time="2024-06-26T13:46:17Z" level=error msg="Failed waiting for Kubernetes API. This error usually happens when there is a problem on the bootstrap host that prevents creating a temporary control plane."

      The console of the bootstrap node showed the following error:

      Unable to pull the release image

      Pulling the mirrored image from the local registry on the mirror VM itself succeeds:

      # podman pull --authfile ~/pull_secret.json registry.ipi-test.ocp-disc2-dns.com:5000/ocp4/openshift4:4.16.0-ppc64le
      Trying to pull registry.ipi-test.ocp-disc2-dns.com:5000/ocp4/openshift4:4.16.0-ppc64le...
      Getting image source signatures
      Copying blob d92958f33655 done
      Copying blob 0c2f5411db88 done
      Copying blob d25dd898f47a done
      Copying blob a34ed9377771 done
      Copying blob 5124657e0c35 done
      Copying config 6244ac50be done
      Writing manifest to image destination
      Storing signatures
      6244ac50beb63c257743a83287982bec8e0777f2fc91bbda36c308fe0ca65d4d

      The bootstrap node can also ping the private IP of the mirror VM. However, pulling the mirrored image from the bootstrap node fails.

      Error logs from the bootstrap node:

      DEBU[0000] GET https://registry.ipi-test.ocp-disc2-dns.com:5000/v2/
      DEBU[0030] Ping https://registry.ipi-test.ocp-disc2-dns.com:5000/v2/ err Get "https://registry.ipi-test.ocp-disc2-dns.com:5000/v2/": dial tcp 10.240.65.33:5000: i/o timeout (&url.Error{Op:"Get", URL:"https://registry.ipi-test.ocp-disc2-dns.com:5000/v2/", Err:(*net.OpError)(0xc0004ed630)})
      DEBU[0030] GET https://registry.ipi-test.ocp-disc2-dns.com:5000/v1/_ping
      DEBU[0060] Ping https://registry.ipi-test.ocp-disc2-dns.com:5000/v1/_ping err Get "https://registry.ipi-test.ocp-disc2-dns.com:5000/v1/_ping": dial tcp 10.240.65.33:5000: i/o timeout (&url.Error{Op:"Get", URL:"https://registry.ipi-test.ocp-disc2-dns.com:5000/v1/_ping", Err:(*net.OpError)(0xc000420690)})
      DEBU[0060] Accessing "registry.ipi-test.ocp-disc2-dns.com:5000/ocp4/openshift4:4.16.0-ppc64le" failed: pinging container registry registry.ipi-test.ocp-disc2-dns.com:5000: Get "https://registry.ipi-test.ocp-disc2-dns.com:5000/v2/": dial tcp 10.240.65.33:5000: i/o timeout
      WARN[0060] Failed, retrying in 1s ... (1/3). Error: initializing source docker://registry.ipi-test.ocp-disc2-dns.com:5000/ocp4/openshift4:4.16.0-ppc64le: pinging container registry registry.ipi-test.ocp-disc2-dns.com:5000: Get "https://registry.ipi-test.ocp-disc2-dns.com:5000/v2/": dial tcp 10.240.65.33:5000: i/o timeout
      


      Version-Release number of selected component (if applicable):

          

      How reproducible:

          Always

      Steps to Reproduce:

          1. Deploy a 4.16 IPI disconnected cluster on PowerVS with Terraform.
          2. The deployment fails with the error message above.
          

      Actual results:

          The deployment fails with the following errors:
      level=error msg="Bootstrap failed to complete: Get \"https://api.ocp-disc4-dal10.ocp-disc2-dns.com:6443/version\": EOF"
      time="2024-06-26T13:46:17Z" level=error msg="Failed waiting for Kubernetes API. This error usually happens when there is a problem on the bootstrap host that prevents creating a temporary control plane."

      Expected results:

          The 4.16 IPI disconnected installation completes without issues.

      Additional info:

          


            Prajwal Gawande added a comment (edited):

            After adding an inbound rule for TCP port 5000 to the security group, I retried the 4.16 IPI disconnected installation. This time the deployment failed because the image-registry cluster operator (CO) was not available:

            level=error msg=Cluster initialization failed because one or more operators are not functioning properly.
            level=error msg=The cluster should be accessible for troubleshooting as detailed in the documentation linked below,
            level=error msg=https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html
            level=error msg=The 'wait-for install-complete' subcommand can then be used to continue the installation
            level=error msg=failed to initialize the cluster: Cluster operator image-registry is not available

            All nodes and the other COs were in a good state; only the image-registry CO was degraded:

            image-registry                                       False       True          True       82m     Available: The deployment does not have available replicas.
            ..

            Then checked the image-registry pods and their logs:

            # oc get pods -A | grep -v Running | grep -v Completed
            NAMESPACE                  NAME                              READY   STATUS             RESTARTS         AGE
            openshift-image-registry   image-registry-685c4579ff-2bh8m   0/1     CrashLoopBackOff   23 (6m28s ago)   80m
            openshift-image-registry   image-registry-7497794875-wjhpc   0/1     CrashLoopBackOff   23 (6m16s ago)   79m

            Logs:

            # oc logs image-registry-685c4579ff-2bh8m -n openshift-image-registry
            time="2024-06-27T12:07:55.360826021Z" level=error msg="s3aws: RequestTimeTooSkewed: The difference between the request time and the server's time is too large.\n\tstatus code: 403, request id: 44ddefce-de61-48e2-9d4a-5a18e39b349c, host id: " go.version="go1.21.9 (Red Hat 1.21.9-1.el9_4) X:strictfipsruntime"
            time="2024-06-27T12:07:58.293105908Z" level=info msg=response go.version="go1.21.9 (Red Hat 1.21.9-1.el9_4) X:strictfipsruntime" http.request.host="10.131.0.8:5000" http.request.id=fd01d52d-e888-4ada-b81f-af409fdca90d http.request.method=GET http.request.remoteaddr="10.131.0.2:43840" http.request.uri=/healthz http.request.useragent=kube-probe/1.29 http.response.contenttype=application/json http.response.duration="94.52µs" http.response.status=503 http.response.written=125


              Michael Turek (mturek.coreos)
              Prajwal Gawande (prgawand)
              Sajauddin Mohammad