[OCPBUGS-36250] PowerVS: 4.16 IPI disconnected installation is failing

Type: Bug
Resolution: Unresolved
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.16.0
Component/s: Installer / openshift-installer
Labels:
- triaged

Severity:
Moderate
Regression:
No
Blocked:
False
Blocked Reason:

None
Target Version:

4.16.z

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

The 4.16 IPI disconnected installation fails on PowerVS with the following error message:

level=error msg="Attempted to gather ClusterOperator status after installation failure: listing ClusterOperator objects: Get \"https://api.ocp-disc4-dal10.ocp-disc2-dns.com:6443/apis/config.openshift.io/v1/clusteroperators\": EOF"
time="2024-06-26T13:46:17Z" level=error msg="Bootstrap failed to complete: Get \"https://api.ocp-disc4-dal10.ocp-disc2-dns.com:6443/version\": EOF"
time="2024-06-26T13:46:17Z" level=error msg="Failed waiting for Kubernetes API. This error usually happens when there is a problem on the bootstrap host that prevents creating a temporary control plane."

The console of the bootstrap node showed the following error message:

Unable to pull the release image

Pulling the mirrored release image from the local registry succeeds on the mirror VM:

# podman pull --authfile ~/pull_secret.json registry.ipi-test.ocp-disc2-dns.com:5000/ocp4/openshift4:4.16.0-ppc64le
Trying to pull registry.ipi-test.ocp-disc2-dns.com:5000/ocp4/openshift4:4.16.0-ppc64le...
Getting image source signatures
Copying blob d92958f33655 done
Copying blob 0c2f5411db88 done
Copying blob d25dd898f47a done
Copying blob a34ed9377771 done
Copying blob 5124657e0c35 done
Copying config 6244ac50be done
Writing manifest to image destination
Storing signatures
6244ac50beb63c257743a83287982bec8e0777f2fc91bbda36c308fe0ca65d4d

The bootstrap node can also ping the private IP of the mirror VM. However, pulling the mirrored image from the bootstrap node fails.

Error logs from the bootstrap:

DEBU[0000] GET https://registry.ipi-test.ocp-disc2-dns.com:5000/v2/
DEBU[0030] Ping https://registry.ipi-test.ocp-disc2-dns.com:5000/v2/ err Get "https://registry.ipi-test.ocp-disc2-dns.com:5000/v2/": dial tcp 10.240.65.33:5000: i/o timeout (&url.Error{Op:"Get", URL:"https://registry.ipi-test.ocp-disc2-dns.com:5000/v2/", Err:(*net.OpError)(0xc0004ed630)})
DEBU[0030] GET https://registry.ipi-test.ocp-disc2-dns.com:5000/v1/_ping
DEBU[0060] Ping https://registry.ipi-test.ocp-disc2-dns.com:5000/v1/_ping err Get "https://registry.ipi-test.ocp-disc2-dns.com:5000/v1/_ping": dial tcp 10.240.65.33:5000: i/o timeout (&url.Error{Op:"Get", URL:"https://registry.ipi-test.ocp-disc2-dns.com:5000/v1/_ping", Err:(*net.OpError)(0xc000420690)})
DEBU[0060] Accessing "registry.ipi-test.ocp-disc2-dns.com:5000/ocp4/openshift4:4.16.0-ppc64le" failed: pinging container registry registry.ipi-test.ocp-disc2-dns.com:5000: Get "https://registry.ipi-test.ocp-disc2-dns.com:5000/v2/": dial tcp 10.240.65.33:5000: i/o timeout
WARN[0060] Failed, retrying in 1s ... (1/3). Error: initializing source docker://registry.ipi-test.ocp-disc2-dns.com:5000/ocp4/openshift4:4.16.0-ppc64le: pinging container registry registry.ipi-test.ocp-disc2-dns.com:5000: Get "https://registry.ipi-test.ocp-disc2-dns.com:5000/v2/": dial tcp 10.240.65.33:5000: i/o timeout
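
The bootstrap log shows ICMP ping succeeding while the TCP dial to port 5000 times out, so it can help to test the TCP port directly instead of relying on ping. The following is a minimal sketch of such a check, not part of the installer. It assumes Python 3 is available on the host where it runs (which may not be the case on the bootstrap node itself), and the helper name `tcp_port_open` is hypothetical. The host and port are the ones from the log above.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and name resolution failures.
        return False
```

Calling `tcp_port_open("registry.ipi-test.ocp-disc2-dns.com", 5000)` from a host in the same network as the bootstrap node would be expected to return `False` while the port is blocked, which matches the `i/o timeout` in the log.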


Version-Release number of selected component (if applicable):

How reproducible:

    Always

Steps to Reproduce:

    1. Deploy a 4.16 IPI disconnected cluster with Terraform on PowerVS.
    2. The deployment fails with the above error message.

Actual results:

    Deployment failed with the error:

level=error msg="Bootstrap failed to complete: Get \"https://api.ocp-disc4-dal10.ocp-disc2-dns.com:6443/version\": EOF"
time="2024-06-26T13:46:17Z" level=error msg="Failed waiting for Kubernetes API. This error usually happens when there is a problem on the bootstrap host that prevents creating a temporary control plane."

Expected results:

    4.16 IPI disconnected installation completes without an issue.

Additional info:

Prajwal Gawande added a comment - 2024/06/27 12:24 PM (edited 2024/06/27 12:25 PM)

After allowing inbound TCP traffic on port 5000 in the security group, I retried the 4.16 IPI disconnected installation. The deployment then failed because the image-registry cluster operator (CO) is not available.

level=error msg=Cluster initialization failed because one or more operators are not functioning properly.
level=error msg=The cluster should be accessible for troubleshooting as detailed in the documentation linked below,
level=error msg=https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html
level=error msg=The 'wait-for install-complete' subcommand can then be used to continue the installation
level=error msg=failed to initialize the cluster: Cluster operator image-registry is not available

All nodes and cluster operators were in a good state except image-registry, which was degraded.

image-registry                                       False       True          True       82m     Available: The deployment does not have available replicas.
..

Checked the image-registry pods and their logs.

# oc get pods -A | grep -v Running | grep -v Completed
NAMESPACE                  NAME                              READY   STATUS             RESTARTS         AGE
openshift-image-registry   image-registry-685c4579ff-2bh8m   0/1     CrashLoopBackOff   23 (6m28s ago)   80m
openshift-image-registry   image-registry-7497794875-wjhpc   0/1     CrashLoopBackOff   23 (6m16s ago)   79m

logs:

# oc logs image-registry-685c4579ff-2bh8m -n openshift-image-registry
time="2024-06-27T12:07:55.360826021Z" level=error msg="s3aws: RequestTimeTooSkewed: The difference between the request time and the server's time is too large.\n\tstatus code: 403, request id: 44ddefce-de61-48e2-9d4a-5a18e39b349c, host id: " go.version="go1.21.9 (Red Hat 1.21.9-1.el9_4) X:strictfipsruntime"
time="2024-06-27T12:07:58.293105908Z" level=info msg=response go.version="go1.21.9 (Red Hat 1.21.9-1.el9_4) X:strictfipsruntime" http.request.host="10.131.0.8:5000" http.request.id=fd01d52d-e888-4ada-b81f-af409fdca90d http.request.method=GET http.request.remoteaddr="10.131.0.2:43840" http.request.uri=/healthz http.request.useragent=kube-probe/1.29 http.response.contenttype=application/json http.response.duration="94.52µs" http.response.status=503 http.response.written=125
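
The `RequestTimeTooSkewed` error indicates that the object storage endpoint rejected a signed request because the client clock differs too much from the server clock. One rough way to estimate the skew is to compare the `Date` header of any HTTP response from the endpoint with the local clock. The sketch below assumes that approach. The names `skew_seconds`, `is_too_skewed`, and `MAX_SKEW` are hypothetical, and the 15-minute tolerance is the value AWS S3 documents, which is an assumption for the storage backend used by this cluster.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Assumption: 15 minutes is the tolerance AWS S3 documents; other
# S3-compatible backends may use a different value.
MAX_SKEW = timedelta(minutes=15)


def skew_seconds(date_header: str, local_now: datetime) -> float:
    """Return local time minus server time (from an HTTP Date header), in seconds."""
    server_time = parsedate_to_datetime(date_header)
    return (local_now - server_time).total_seconds()


def is_too_skewed(date_header: str, local_now: datetime) -> bool:
    """Return True if the absolute skew exceeds MAX_SKEW."""
    return abs(skew_seconds(date_header, local_now)) > MAX_SKEW.total_seconds()


def is_too_skewed_now(date_header: str) -> bool:
    """Convenience wrapper that compares against the current UTC time."""
    return is_too_skewed(date_header, datetime.now(timezone.utc))
```

In a disconnected environment, one possible cause of such skew is that the nodes cannot reach a time source. That is not confirmed in this report, and checking time synchronization on the affected nodes would be needed to verify it.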


Assignee: Michael Turek

Reporter: Prajwal Gawande

QA Contact: Sajauddin Mohammad

Votes: 0

Watchers: 3

Created: 2024/06/27 7:43 AM

Updated: 2024/10/03 3:57 PM
