OCPBUGS-1911: [CORS-2260] "Bootstrap failed to complete" and compute machines failed on first-boot


    Description

      Description of problem:

      "Bootstrap failed to complete" and compute machines failed on first-boot

      Version-Release number of selected component (if applicable):

      $ ./openshift-install version
      ./openshift-install 4.12.0-0.nightly-2022-09-28-204419
      built from commit 9eb0224926982cdd6cae53b872326292133e532d
      release image registry.ci.openshift.org/ocp/release@sha256:2c8e617830f84ac1ee1bfcc3581010dec4ae5d9cad7a54271574e8d91ef5ecbc
      release architecture amd64
      

      How reproducible:

      Always so far; I have tried twice and hit the same issue both times.

      Steps to Reproduce:

      1. create a VPC network, subnets, and a firewall rule allowing SSH access to the bastion host
      2. create the bastion host, setting a valid service account with the "https://www.googleapis.com/auth/cloud-platform" scope (see the gcloud sketch after this list)
      3. scp the pull secret to the bastion host
      4. ssh to the bastion host (all subsequent steps run on the bastion host unless stated otherwise)
      5. get "oc", e.g. curl https://mirror2.openshift.com/pub/openshift-v4/clients/ocp/4.9.9/openshift-client-linux-4.9.9.tar.gz -o openshift-client-linux-4.9.9.tar.gz; tar zxvf openshift-client-linux-4.9.9.tar.gz
      6. obtain the installation program
      7. prepare a valid "install-config.yaml" (as a workaround for OCPBUGS-1896)
      8. then, please see the attached "create-cluster" for the installation steps/errors
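
      Below is a minimal sketch of steps 1 and 2, assuming gcloud is already authenticated against the host project; every resource name, the region/zone, the CIDR ranges, and the service-account address are placeholders rather than the values used in this reproduction:

      # step 1: VPC network, subnets, and a firewall rule allowing SSH to the bastion
      gcloud compute networks create test-shared-vpc --subnet-mode=custom
      gcloud compute networks subnets create test-master-subnet \
          --network=test-shared-vpc --region=us-central1 --range=10.0.0.0/19
      gcloud compute networks subnets create test-worker-subnet \
          --network=test-shared-vpc --region=us-central1 --range=10.0.32.0/19
      gcloud compute firewall-rules create test-allow-ssh \
          --network=test-shared-vpc --allow=tcp:22 --source-ranges=0.0.0.0/0
      # step 2: bastion host with a valid service account and the cloud-platform scope
      gcloud compute instances create test-bastion \
          --zone=us-central1-a --subnet=test-master-subnet \
          --image-family=rhel-8 --image-project=rhel-cloud \
          --service-account=<sa-name>@<project-id>.iam.gserviceaccount.com \
          --scopes=https://www.googleapis.com/auth/cloud-platform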
      

      Actual results:

      Bootstrap failed to complete, and all compute machines failed on first boot because 'GET https://api-int.jiwei-0930-03.qe-shared-vpc.qe.gcp.devcluster.openshift.com:22623/config/worker' kept failing.
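
      A quick way to confirm this from the bastion host (a hypothetical check, not part of the original steps; the Accept header mirrors what Ignition sends):

      curl -ks -o /dev/null -w '%{http_code}\n' \
          -H 'Accept: application/vnd.coreos.ignition+json;version=3.2.0' \
          https://api-int.jiwei-0930-03.qe-shared-vpc.qe.gcp.devcluster.openshift.com:22623/config/worker
      # a 500 here corresponds to the "GET result: Internal Server Error" seen in the compute machines' serial logs (see Additional info)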

      Expected results:

      Installation should succeed.

      Additional info:

      1. One compute machine serial log: 
      [***   ] A start job is running for Ignition (fetch) (20min 7s / no limit)[ 1211.424359] ignition[909]: GET https://api-int.jiwei-0930-03.qe-shared-vpc.qe.gcp.devcluster.openshift.com:22623/config/worker: attempt #245
      [ 1211.437213] ignition[909]: GET result: Internal Server Error
      
      2. After explicitly removing the bootstrap instance from the load balancers, the compute nodes turned Ready, but some cluster operators could not become Available (see below; a gcloud sketch of the removal follows item 3).
      [cloud-user@jiwei-0930-02-rhel8-mirror ~]$ ./oc get nodes
      NAME                                                              STATUS   ROLES                  AGE   VERSION
      jiwei-0930-03-rrhmn-master-0.c.openshift-qe-shared-vpc.internal   Ready    control-plane,master   94m   v1.24.0+8c7c967
      jiwei-0930-03-rrhmn-master-1.c.openshift-qe-shared-vpc.internal   Ready    control-plane,master   95m   v1.24.0+8c7c967
      jiwei-0930-03-rrhmn-master-2.c.openshift-qe-shared-vpc.internal   Ready    control-plane,master   95m   v1.24.0+8c7c967
      jiwei-0930-03-rrhmn-worker-a-4b5n4                                Ready    worker                 14m   v1.24.0+8c7c967
      jiwei-0930-03-rrhmn-worker-b-bjzkw                                Ready    worker                 14m   v1.24.0+8c7c967
      [cloud-user@jiwei-0930-02-rhel8-mirror ~]$ ./oc get clusteroperator | grep -v "True        False         False"
      NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.12.0-0.nightly-2022-09-28-204419   False       True          True       92m     WellKnownAvailable: The well-known endpoint is not yet available: kube-apiserver oauth endpoint https://10.0.0.6:6443/.well-known/oauth-authorization-server is not yet served and authentication operator keeps waiting (check kube-apiserver operator, and check that instances roll out successfully, which can take several minutes per instance)
      console                                    4.12.0-0.nightly-2022-09-28-204419   False       False         True       7m26s   RouteHealthAvailable: console route is not admitted
      ingress                                    4.12.0-0.nightly-2022-09-28-204419   True        False         True       13m     The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: CanaryChecksSucceeding=Unknown (CanaryRouteNotAdmitted: Canary route is not admitted by the default ingress controller)
      kube-controller-manager                    4.12.0-0.nightly-2022-09-28-204419   True        False         True       89m     GarbageCollectorDegraded: error fetching rules: Get "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules": dial tcp: lookup thanos-querier.openshift-monitoring.svc on 172.30.0.10:53: no such host
      monitoring                                                                      False       False         True       76m     Rollout of the monitoring stack failed and is degraded. Please investigate the degraded status error.
      [cloud-user@jiwei-0930-02-rhel8-mirror ~]$ 
      
      3. Please see http://virt-openshift-05.lab.eng.nay.redhat.com/jiwei/CORS-2260/ for must-gather and bootstrap logs, and the sample "install-config.yaml".
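
      For reference, removing the bootstrap instance from the load balancers (item 2 above) can look roughly like the following; the instance-group, target-pool, and instance names and the zone are placeholders that depend on the cluster's infra ID and load-balancer topology, and normally "openshift-install destroy bootstrap" performs this cleanup:

      # drop bootstrap from the unmanaged instance group backing the internal API load balancer
      gcloud compute instance-groups unmanaged remove-instances <infra-id>-bootstrap-ig \
          --zone=<zone> --instances=<infra-id>-bootstrap
      # drop bootstrap from the target pool backing the external API load balancer
      gcloud compute target-pools remove-instances <infra-id>-api-target-pool \
          --instances-zone=<zone> --instances=<infra-id>-bootstrap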
      

       

       
