OpenShift Hive / HIVE-2529

4.16 CAPI install fails for STS


    • SREP Team Thor Sprint 254
    • Approved

      Version: 

      lixue@Xue-Lis-MacBook-Pro ~ % ocm get cluster 2bhg6m5cr529447ig6628pjhmdbvgl61|jq -r .version
      {
        "kind": "Version",
        "id": "openshift-v4.16.0-0.nightly-2024-05-28-011151-nightly",
        "href": "/api/clusters_mgmt/v1/versions/openshift-v4.16.0-0.nightly-2024-05-28-011151-nightly",
        "raw_id": "4.16.0-0.nightly-2024-05-28-011151",
        "channel_group": "nightly",
        "end_of_life_timestamp": "2025-10-31T00:00:00Z"
      } 

      Create the cluster with the following command:

      % rosa create cluster -c xuelinightly --channel-group nightly  --sts --version 4.16.0-0.nightly-2024-05-28-011151 

      Wait for the cluster to become ready. Instead, the cluster goes into an error state.

      Here is the install log:

      time="2024-05-28T05:56:44Z" level=debug msg=" > controller=\"awscluster\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AWSCluster\" AWSCluster=\"openshift-cluster-api-guests/xuelinightly-kdgr7\" namespace=\"openshift-cluster-api-guests\" name=\"xuelinightly-kdgr7\" reconcileID=\"5ebca7b1-47b5-4bb6-a004-4d0f9b07f9a8\""
      time="2024-05-28T05:57:02Z" level=debug msg="I0528 05:57:02.924575     152 reflector.go:800] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:105: Watch close - *v1beta1.MachinePool total 9 items received"
      time="2024-05-28T05:57:26Z" level=debug msg="I0528 05:57:26.933474     152 reflector.go:800] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:105: Watch close - *v1beta1.Machine total 9 items received"
      time="2024-05-28T05:57:34Z" level=debug msg="I0528 05:57:34.934053     152 reflector.go:800] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:105: Watch close - *v1beta1.Cluster total 7 items received"
      time="2024-05-28T05:57:42Z" level=debug msg="I0528 05:57:42.911748     152 awscluster_controller.go:309] \"Reconciling AWSCluster\" controller=\"awscluster\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AWSCluster\" AWSCluster=\"openshift-cluster-api-guests/xuelinightly-kdgr7\" namespace=\"openshift-cluster-api-guests\" name=\"xuelinightly-kdgr7\" reconcileID=\"be0f1f22-56d6-41fd-9d2d-58c43836ba66\" cluster=\"openshift-cluster-api-guests/xuelinightly-kdgr7\""
      time="2024-05-28T05:57:42Z" level=debug msg="I0528 05:57:42.911870     152 network.go:31] \"Reconciling network for cluster\" controller=\"awscluster\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AWSCluster\" AWSCluster=\"openshift-cluster-api-guests/xuelinightly-kdgr7\" namespace=\"openshift-cluster-api-guests\" name=\"xuelinightly-kdgr7\" reconcileID=\"be0f1f22-56d6-41fd-9d2d-58c43836ba66\" cluster=\"openshift-cluster-api-guests/xuelinightly-kdgr7\""
      time="2024-05-28T05:57:42Z" level=debug msg="I0528 05:57:42.911888     152 vpc.go:48] \"Reconciling VPC\" controller=\"awscluster\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AWSCluster\" AWSCluster=\"openshift-cluster-api-guests/xuelinightly-kdgr7\" namespace=\"openshift-cluster-api-guests\" name=\"xuelinightly-kdgr7\" reconcileID=\"be0f1f22-56d6-41fd-9d2d-58c43836ba66\" cluster=\"openshift-cluster-api-guests/xuelinightly-kdgr7\""
      time="2024-05-28T05:57:43Z" level=debug msg="failed to get the service provider secret: secrets \"xuelinightly-aws-service-provider-secret\" not foundfailed to get the service provider secret: secrets \"xuelinightly-aws-service-provider-secret\" not foundE0528 05:57:43.074649     152 awscluster_controller.go:327] \"failed to reconcile network\" err=<"
      time="2024-05-28T05:57:43Z" level=debug msg="\tfailed to create new managed VPC: failed to create vpc: ProcessProviderExecutionError: error in credential_process"
      time="2024-05-28T05:57:43Z" level=debug msg="\tcaused by: exit status 1"
      time="2024-05-28T05:57:43Z" level=debug msg=" > controller=\"awscluster\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AWSCluster\" AWSCluster=\"openshift-cluster-api-guests/xuelinightly-kdgr7\" namespace=\"openshift-cluster-api-guests\" name=\"xuelinightly-kdgr7\" reconcileID=\"be0f1f22-56d6-41fd-9d2d-58c43836ba66\" cluster=\"openshift-cluster-api-guests/xuelinightly-kdgr7\""
      time="2024-05-28T05:57:43Z" level=debug msg="I0528 05:57:43.074727     152 recorder.go:104] \"Failed to create new managed VPC: ProcessProviderExecutionError: error in credential_process\\ncaused by: exit status 1\" logger=\"events\" type=\"Warning\" object={\"kind\":\"AWSCluster\",\"namespace\":\"openshift-cluster-api-guests\",\"name\":\"xuelinightly-kdgr7\",\"uid\":\"ecf9ab14-ec4d-46f9-a59a-532997f31983\",\"apiVersion\":\"infrastructure.cluster.x-k8s.io/v1beta2\",\"resourceVersion\":\"311\"} reason=\"FailedCreateVPC\""
      time="2024-05-28T05:57:43Z" level=debug msg="E0528 05:57:43.075579     152 controller.go:329] \"Reconciler error\" err=<"
      time="2024-05-28T05:57:43Z" level=debug msg="\tfailed to create new managed VPC: failed to create vpc: ProcessProviderExecutionError: error in credential_process"
      time="2024-05-28T05:57:43Z" level=debug msg="\tcaused by: exit status 1"
      time="2024-05-28T05:57:43Z" level=debug msg=" > controller=\"awscluster\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AWSCluster\" AWSCluster=\"openshift-cluster-api-guests/xuelinightly-kdgr7\" namespace=\"openshift-cluster-api-guests\" name=\"xuelinightly-kdgr7\" reconcileID=\"be0f1f22-56d6-41fd-9d2d-58c43836ba66\""
      time="2024-05-28T05:58:11Z" level=debug msg="I0528 05:58:11.931009     152 reflector.go:800] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:105: Watch close - *v1beta2.AWSClusterControllerIdentity total 8 items received"
      time="2024-05-28T05:58:14Z" level=debug msg="I0528 05:58:14.923080     152 reflector.go:377] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:105: forcing resync"
      time="2024-05-28T05:58:30Z" level=debug msg="I0528 05:58:30.972755     152 reflector.go:800] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:105: Watch close - *v1beta2.AWSManagedCluster total 0 items received"
      time="2024-05-28T05:58:43Z" level=debug msg="I0528 05:58:43.468148     152 reflector.go:377] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:105: forcing resync"
      time="2024-05-28T06:00:08Z" level=debug msg="I0528 06:00:08.983486     152 reflector.go:800] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:105: Watch close - *v1beta2.AWSCluster total 0 items received"
      time="2024-05-28T06:01:42Z" level=debug msg="I0528 06:01:42.712434     152 reflector.go:377] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:105: forcing resync"
      time="2024-05-28T06:02:04Z" level=debug msg="Collecting applied cluster api manifests..."
      time="2024-05-28T06:02:04Z" level=error msg="failed to fetch Cluster: failed to generate asset \"Cluster\": failed to create cluster: infrastructure was not ready within 15m0s: timed out waiting for the condition"
      time="2024-05-28T06:02:04Z" level=info msg="Shutting down local Cluster API control plane..."
      time="2024-05-28T06:02:04Z" level=info msg="Stopped controller: Cluster API"
      time="2024-05-28T06:02:04Z" level=warning msg="process cluster-api-provider-aws exited with error: signal: killed"
      time="2024-05-28T06:02:04Z" level=info msg="Stopped controller: aws infrastructure provider"
      time="2024-05-28T06:02:06Z" level=info msg="Local Cluster API system has completed operations"
      time="2024-05-28T06:02:07Z" level=error msg="error after waiting for command completion" error="exit status 4" installID=pv5jd866
      time="2024-05-28T06:02:07Z" level=error msg="error provisioning cluster" error="exit status 4" installID=pv5jd866
      time="2024-05-28T06:02:07Z" level=error msg="error running openshift-install, running deprovision to clean up" error="exit status 4" installID=pv5jd866
      time="2024-05-28T06:02:07Z" level=debug msg="OpenShift Installer v4.16.0"
      time="2024-05-28T06:02:07Z" level=debug msg="Built from commit a5eab0a0fea41a38f81b9b3d939898de6046d197"
      time="2024-05-28T06:02:07Z" level=info msg="Waiting up to 20m0s (until 6:22AM UTC) for the Kubernetes API at https://api.xuelinightly.7s8z.s1.devshift.org:6443..."
      time="2024-05-28T06:02:07Z" level=debug msg="Loading Agent Config..."
      time="2024-05-28T06:02:07Z" level=debug msg="Still waiting for the Kubernetes API: Get \"https://api.xuelinightly.7s8z.s1.devshift.org:6443/version\": dial tcp: lookup api.xuelinightly.7s8z.s1.devshift.org on 172.30.0.10:53: no such host"
      time="2024-05-28T06:02:37Z" level=debug msg="Still waiting for the Kubernetes API: Get \"https://api.xuelinightly.7s8z.s1.devshift.org:6443/version\": dial tcp: lookup api.xuelinightly.7s8z.s1.devshift.org on 172.30.0.10:53: no such host"
      time="2024-05-28T06:03:07Z" level=info msg="attempting to gather logs with 'openshift-install gather bootstrap'" installID=pv5jd866
      time="2024-05-28T06:03:07Z" level=info msg="running openshift-install binary" args="[gather bootstrap --key /tmp/ssh-privatekey]" installID=pv5jd866
      time="2024-05-28T06:03:07Z" level=debug msg="OpenShift Installer v4.16.0"
      time="2024-05-28T06:03:07Z" level=debug msg="Built from commit a5eab0a0fea41a38f81b9b3d939898de6046d197"
      time="2024-05-28T06:03:07Z" level=debug msg="Fetching Bootstrap SSH Key Pair..."
      time="2024-05-28T06:03:07Z" level=debug msg="Loading Bootstrap SSH Key Pair..."
      time="2024-05-28T06:03:07Z" level=debug msg="Using Bootstrap SSH Key Pair loaded from state file"
      time="2024-05-28T06:03:07Z" level=debug msg="Reusing previously-fetched Bootstrap SSH Key Pair"
      time="2024-05-28T06:03:07Z" level=debug msg="Fetching Install Config..."
      time="2024-05-28T06:03:07Z" level=debug msg="Loading Install Config..."
      time="2024-05-28T06:03:07Z" level=debug msg="  Loading SSH Key..."
      time="2024-05-28T06:03:07Z" level=debug msg="  Loading Base Domain..."
      time="2024-05-28T06:03:07Z" level=debug msg="    Loading Platform..."
      time="2024-05-28T06:03:07Z" level=debug msg="  Loading Cluster Name..."
      time="2024-05-28T06:03:07Z" level=debug msg="    Loading Base Domain..."
      time="2024-05-28T06:03:07Z" level=debug msg="    Loading Platform..."
      time="2024-05-28T06:03:07Z" level=debug msg="  Loading Pull Secret..."
      time="2024-05-28T06:03:07Z" level=debug msg="  Loading Platform..."
      time="2024-05-28T06:03:07Z" level=debug msg="Using Install Config loaded from state file"
      time="2024-05-28T06:03:07Z" level=debug msg="Reusing previously-fetched Install Config"
      time="2024-05-28T06:03:07Z" level=debug msg="Looking for machine manifests in .clusterapi_output"
      time="2024-05-28T06:03:07Z" level=debug msg="bootstrap manifests found: []"
      time="2024-05-28T06:03:07Z" level=warning msg="Failed to extract host addresses: wrong number of bootstrap manifests found: []. Expected exactly one"
      time="2024-05-28T06:03:07Z" level=fatal msg="must provide bootstrap host address"
      time="2024-05-28T06:03:08Z" level=error msg="error after waiting for command completion" error="exit status 1" installID=pv5jd866
      time="2024-05-28T06:03:08Z" level=error msg="failed to gather logs from bootstrap node" error="exit status 1" installID=pv5jd866
      time="2024-05-28T06:03:08Z" level=warning msg="error fetching logs from bootstrap node" error="exit status 1" installID=pv5jd866
      time="2024-05-28T06:03:08Z" level=info msg="saving installer output" installID=pv5jd866
      time="2024-05-28T06:03:08Z" level=debug msg="installer console log: level=warning msg=imageContentSources is deprecated, please use ImageDigestSource\nlevel=info msg=Credentials loaded from the AWS config using \"ProcessProvider\" provider\nlevel=info msg=Consuming Install Config from target directory\nlevel=info msg=Manifests created in: cluster-api, manifests and openshift\nlevel=warning msg=Found override for release image (registry.ci.openshift.org/ocp/release@sha256:ad0c9c951d604785067934334d6d92555ee783db135c3e3b1ed0185106cc77b7). Please be warned, this is not advised\nlevel=info msg=Consuming Master Machines from target directory\nlevel=info msg=Consuming Worker Machines from target directory\nlevel=info msg=Consuming OpenShift Install (Manifests) from target directory\nlevel=info msg=Consuming User-provided Service Account Signing key from target directory\nlevel=info msg=Consuming Openshift Manifests from target directory\nlevel=info msg=Consuming Common Manifests from target directory\nlevel=info msg=Ignition-Configs created in: . and auth\nlevel=info msg=Consuming Bootstrap Ignition Config from target directory\nlevel=info msg=Consuming Worker Ignition Config from target directory\nlevel=info msg=Consuming Master Ignition Config from target directory\nlevel=info msg=Credentials loaded from the AWS config using \"ProcessProvider\" provider\nlevel=info msg=Skipping quota checks\nlevel=info msg=Creating infrastructure resources...\nlevel=info msg=Reconciling IAM roles for control-plane and compute nodes\nlevel=info msg=Started local control plane with envtest\nlevel=info msg=Stored kubeconfig for envtest in: /output/auth/envtest.kubeconfig\nlevel=info msg=Running process: Cluster API with args [-v=2 --diagnostics-address=0 --health-addr=127.0.0.1:41447 --webhook-port=45447 --webhook-cert-dir=/tmp/envtest-serving-certs-3642014667]\nlevel=info msg=Running process: aws infrastructure provider with args [-v=4 --diagnostics-address=0 --health-addr=127.0.0.1:39231 --webhook-port=46103 --webhook-cert-dir=/tmp/envtest-serving-certs-2076796642 --feature-gates=BootstrapFormatIgnition=true,ExternalResourceGC=true]\nlevel=info msg=Created manifest *v1.Namespace, namespace= name=openshift-cluster-api-guests\nlevel=info msg=Created manifest *v1beta2.AWSClusterControllerIdentity, namespace= name=default\nlevel=info msg=Created manifest *v1beta1.Cluster, namespace=openshift-cluster-api-guests name=xuelinightly-kdgr7\nlevel=info msg=Created manifest *v1beta2.AWSCluster, namespace=openshift-cluster-api-guests name=xuelinightly-kdgr7\nlevel=info msg=Waiting up to 15m0s (until 5:34AM UTC) for network infrastructure to become ready...\nlevel=error msg=failed to fetch Cluster: failed to generate asset \"Cluster\": failed to create cluster: infrastructure was not ready within 15m0s: timed out waiting for the condition\nlevel=info msg=Shutting down local Cluster API control plane...\nlevel=info msg=Stopped controller: Cluster API\nlevel=warning msg=process cluster-api-provider-aws exited with error: signal: killed\nlevel=info msg=Stopped controller: aws infrastructure provider\nlevel=info msg=Local Cluster API system has completed operations\nlevel=warning msg=Failed to extract host addresses: wrong number of bootstrap manifests found: []. Expected exactly one\nlevel=fatal msg=must provide bootstrap host address\n" installID=pv5jd866
      time="2024-05-28T06:03:08Z" level=debug msg="no additional log fields found" installID=pv5jd866
      time="2024-05-28T06:03:08Z" level=error msg="failed due to install error" error="exit status 4" installID=pv5jd866
      time="2024-05-28T06:03:08Z" level=fatal msg="runtime error" error="exit status 4" 
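
      For triage, the log above can be narrowed down with a small script. The following is only a sketch, not part of Hive or the installer: the helper names (`parse_line`, `first_match`, `by_level`) are made up for illustration, and it assumes every line follows the `time="..." level=... msg="..."` layout seen in this log. The sample lines are copied verbatim from the log. Note that the `credential_process` failure is logged at debug level, so filtering on error and fatal alone only surfaces the later 15m0s timeout and the bootstrap gather failure.

```python
import re

# Matches the time/level/msg prefix used by the install log above.
# The msg group allows backslash-escaped characters such as \" inside the quotes.
LINE_RE = re.compile(
    r'time="(?P<time>[^"]+)"\s+level=(?P<level>\w+)\s+msg="(?P<msg>(?:[^"\\]|\\.)*)"'
)


def parse_line(line):
    """Return a dict with time, level and msg, or None if the line does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    # msg is kept exactly as logged (escape sequences are not decoded).
    return {"time": m.group("time"), "level": m.group("level"), "msg": m.group("msg")}


def first_match(lines, needle):
    """Return the first parsed entry whose msg contains needle, or None."""
    for line in lines:
        entry = parse_line(line)
        if entry is not None and needle in entry["msg"]:
            return entry
    return None


def by_level(lines, levels=("error", "fatal")):
    """Return parsed entries whose level is one of levels, in log order."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e is not None and e["level"] in levels]


# Four lines copied verbatim from the install log in this report.
SAMPLE = [
    r'time="2024-05-28T05:57:43Z" level=debug msg="\tfailed to create new managed VPC: failed to create vpc: ProcessProviderExecutionError: error in credential_process"',
    r'time="2024-05-28T05:57:43Z" level=debug msg="\tcaused by: exit status 1"',
    r'time="2024-05-28T06:02:04Z" level=error msg="failed to fetch Cluster: failed to generate asset \"Cluster\": failed to create cluster: infrastructure was not ready within 15m0s: timed out waiting for the condition"',
    r'time="2024-05-28T06:03:07Z" level=fatal msg="must provide bootstrap host address"',
]

if __name__ == "__main__":
    hit = first_match(SAMPLE, "credential_process")
    print(hit["time"], hit["level"])
    for entry in by_level(SAMPLE):
        print(entry["time"], entry["level"], entry["msg"])
```

      To run it against a full log, pass the lines of a saved log file (any path you choose) to `first_match` and `by_level` instead of `SAMPLE`.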

            efried.openshift Eric Fried
            xueli@redhat.com Xue Li
            Mingxia Huang Mingxia Huang