Resolution: Obsolete
Description of problem:
1. The pre-created service account for the ingress operator was granted "roles/compute.networkUser", which includes the permission "compute.firewalls.get", but the operator still reports a 403 error for that permission.
2. "createFirewallRules" is set to "Disabled", so the installer does not create any firewall rules, and the rule named in the error, "k8s-fw-...", does not exist at all.
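Note that a role containing the permission is only part of the picture: the role also has to be bound, in the host project, to whichever account actually issues the call. The error text says it comes from the service-controller component, so it may not be the ingress operator's service account at all; that is an assumption to verify, not something this report confirms. A minimal sketch of both checks follows. `role_has_permission` is a hypothetical helper, the sample text is an illustrative stand-in for real `gcloud` output, and `SA_EMAIL` is a placeholder.

```shell
#!/bin/sh
# role_has_permission PERMISSION  (hypothetical helper, not part of gcloud)
# Reads the YAML printed by `gcloud iam roles describe ROLE` on stdin and
# succeeds only if PERMISSION appears as an exact "- <permission>" line.
role_has_permission() {
  grep -qxF -e "- $1"
}

# Offline sample standing in for real gcloud output (illustrative, truncated).
sample_role='includedPermissions:
- compute.firewalls.get
- compute.networks.use
- compute.subnetworks.use
name: roles/compute.networkUser'

if printf '%s\n' "$sample_role" | role_has_permission compute.firewalls.get; then
  echo "role includes compute.firewalls.get"
fi

# Against the real environment (SA_EMAIL is a placeholder, not from the report):
#   gcloud iam roles describe roles/compute.networkUser \
#     | role_has_permission compute.firewalls.get
#   gcloud projects get-iam-policy openshift-qe-shared-vpc \
#     --flatten='bindings[].members' \
#     --filter='bindings.members:serviceAccount:SA_EMAIL' \
#     --format='value(bindings.role)'
```

The exact-line match avoids false positives from permissions that merely share a prefix. The second `gcloud` command lists the roles bound to one account at the host-project level; if the role was granted only on individual subnets, it would presumably not show up there.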
Version-Release number of selected component (if applicable):
$ openshift-install version
openshift-install 4.12.0-0.nightly-2022-10-25-210451
built from commit 14d496fdaec571fa97604a487f5df6a0433c0c68
release image registry.ci.openshift.org/ocp/release@sha256:d6cc07402fee12197ca1a8592b5b781f9f9a84b55883f126d60a3896a36a9b74
release architecture amd64
How reproducible:
Steps to Reproduce:
1. Try an IPI installation into a shared VPC, with "credentialsMode" set to "Manual".
Actual results:
Installation failed, and the ingress operator became degraded.

$ oc get co ingress
NAME      VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
ingress             False       True          True       49m     The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: error getting load balancer's firewall: googleapi: Error 403: Required 'compute.firewalls.get' permission for 'projects/openshift-qe-shared-vpc/global/firewalls/k8s-fw-a35d52ba3a1c44a2d9fc8449034eb663', forbidden...
$
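For triage it can help to split the 403 message into the missing permission, the project it was checked against, and the firewall rule name. The following is a rough sketch, assuming the message keeps the "Required '...' permission for '...'" shape seen above; `parse_gcp_403` is a hypothetical helper name, not part of any tool.

```shell
#!/bin/sh
# parse_gcp_403 MESSAGE  (hypothetical helper)
# Prints "<permission> <resource>" when MESSAGE contains the pattern
# "Required '<permission>' permission for '<resource>'"; prints nothing otherwise.
parse_gcp_403() {
  printf '%s\n' "$1" |
    sed -n "s/.*Required '\([^']*\)' permission for '\([^']*\)'.*/\1 \2/p"
}

# Error text copied from the operator status in this report.
msg="googleapi: Error 403: Required 'compute.firewalls.get' permission for 'projects/openshift-qe-shared-vpc/global/firewalls/k8s-fw-a35d52ba3a1c44a2d9fc8449034eb663', forbidden"

parsed=$(parse_gcp_403 "$msg")
permission=${parsed%% *}        # text before the first space
resource=${parsed#* }           # text after the first space
project=${resource#projects/}   # drop the leading "projects/"
project=${project%%/*}          # keep only the project ID
rule=${resource##*/}            # last path segment is the rule name

echo "permission=$permission project=$project rule=$rule"
```

In this report the project in the resource path is the host project (openshift-qe-shared-vpc), not the service project (openshift-qe), which suggests the permission is being evaluated in the host project.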
Expected results:
The installation should succeed even with the CCO in manual mode (as stated in https://github.com/openshift/openshift-docs/pull/51171).
Additional info:
1. The pre-configured DNS zones in the service project, and the firewall rules in the host project

$ gcloud dns managed-zones list --filter='name=qe1'
NAME  DNS_NAME                           DESCRIPTION  VISIBILITY
qe1   qe1.gcp.devcluster.openshift.com.               public
$ gcloud dns managed-zones list --filter='name=ipi-xpn-private-zone'
NAME                  DNS_NAME                                       DESCRIPTION                         VISIBILITY
ipi-xpn-private-zone  jiwei-1026a.qe1.gcp.devcluster.openshift.com.  Preserved private zone for IPI XPN  private
$ gcloud --project openshift-qe-shared-vpc compute firewall-rules list --filter='network=installer-shared-vpc AND NOT name~ci-op-xpn' 2> /dev/null
NAME                                NETWORK               DIRECTION  PRIORITY  ALLOW                                                                                                         DENY  DISABLED
preserved-ipi-xpn-api               installer-shared-vpc  INGRESS    1000      tcp:6443,tcp:80,tcp:443                                                                                             False
preserved-ipi-xpn-bastion-access    installer-shared-vpc  INGRESS    1000      tcp:22,tcp:3128-3129,tcp:5000,tcp:6001-6002,tcp:8080                                                                False
preserved-ipi-xpn-control-plane     installer-shared-vpc  INGRESS    1000      tcp:22623,tcp:10257,tcp:10259                                                                                       False
preserved-ipi-xpn-etcd              installer-shared-vpc  INGRESS    1000      tcp:2379-2380                                                                                                       False
preserved-ipi-xpn-health-checks     installer-shared-vpc  INGRESS    1000      tcp:6080,tcp:6443,tcp:22624,tcp:30000-32767                                                                         False
preserved-ipi-xpn-internal-cluster  installer-shared-vpc  INGRESS    1000      tcp:30000-32767,udp:30000-32767,tcp:9000-9999,udp:9000-9999,udp:4789,udp:6081,udp:500,udp:4500,tcp:10250,esp        False
preserved-ipi-xpn-internal-network  installer-shared-vpc  INGRESS    1000      tcp:22,icmp                                                                                                         False
$ gcloud iam roles describe roles/compute.networkUser | grep compute.firewalls.get
- compute.firewalls.get
$

2. The install-config snippet

$ yq-3.3.0 r test4/install-config.yaml platform
gcp:
  projectID: openshift-qe
  region: us-central1
  computeSubnet: installer-shared-vpc-subnet-2
  controlPlaneSubnet: installer-shared-vpc-subnet-1
  createFirewallRules: Disabled
  publicDNSZone:
    id: qe1
  privateDNSZone:
    id: ipi-xpn-private-zone
  network: installer-shared-vpc
  networkProjectID: openshift-qe-shared-vpc
$ yq-3.3.0 r test4/install-config.yaml baseDomain
qe1.gcp.devcluster.openshift.com
$ yq-3.3.0 r test4/install-config.yaml credentialsMode
Manual
$ yq-3.3.0 r test4/install-config.yaml compute
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform:
    gcp:
      tags:
      - preserved-ipi-xpn-compute
  replicas: 2
$ yq-3.3.0 r test4/install-config.yaml controlPlane
architecture: amd64
hyperthreading: Enabled
name: master
platform:
  gcp:
    tags:
    - preserved-ipi-xpn-control-plane
replicas: 3
$ yq-3.3.0 r test4/install-config.yaml metadata
creationTimestamp: null
name: jiwei-1026a
$ openshift-install create manifests --dir test4
INFO Credentials loaded from file "/home/fedora/.gcp/osServiceAccount.json"
INFO Consuming Install Config from target directory
INFO Manifests created in: test4/manifests and test4/openshift
$

3. Manually create the required credentials and then copy the manifests to the installation dir

$ ./gcp_cco_helper.sh registry.ci.openshift.org/ocp/release:4.12.0-0.nightly-2022-10-25-210451 us-central1 test4 pull_secret.json
......
$ cp cco-manifests/* test4/manifests/
$ ls test4/manifests/ -lrt
total 100
-rw-r-----. 1 fedora fedora  4345 Oct 26 12:32 openshift-config-secret-pull-secret.yaml
-rw-r-----. 1 fedora fedora  4086 Oct 26 12:32 machine-config-server-tls-secret.yaml
-rw-r-----. 1 fedora fedora  1304 Oct 26 12:32 kube-system-configmap-root-ca.yaml
-rw-r-----. 1 fedora fedora   118 Oct 26 12:32 kube-cloud-config.yaml
-rw-r-----. 1 fedora fedora   200 Oct 26 12:32 cvo-overrides.yaml
-rw-r-----. 1 fedora fedora   171 Oct 26 12:32 cluster-scheduler-02-config.yml
-rw-r-----. 1 fedora fedora   142 Oct 26 12:32 cluster-proxy-01-config.yaml
-rw-r-----. 1 fedora fedora   273 Oct 26 12:32 cluster-network-02-config.yml
-rw-r-----. 1 fedora fedora 10135 Oct 26 12:32 cluster-network-01-crd.yml
-rw-r-----. 1 fedora fedora   248 Oct 26 12:32 cluster-ingress-02-config.yml
-rw-r-----. 1 fedora fedora   644 Oct 26 12:32 cluster-infrastructure-02-config.yml
-rw-r-----. 1 fedora fedora   216 Oct 26 12:32 cluster-dns-02-config.yml
-rw-r-----. 1 fedora fedora  2314 Oct 26 12:32 cluster-config.yaml
-rw-r-----. 1 fedora fedora   545 Oct 26 12:32 cloud-provider-config.yaml
-rw-r-----. 1 fedora fedora   175 Oct 26 12:32 cloud-controller-uid-config.yml
-rw-rw-r--. 1 fedora fedora  3270 Oct 26 12:48 99_openshift-machine-api_gcp-cloud-credentials-secret.yaml
-rw-rw-r--. 1 fedora fedora  3267 Oct 26 12:48 99_openshift-ingress-operator_cloud-credentials-secret.yaml
-rw-rw-r--. 1 fedora fedora  3283 Oct 26 12:48 99_openshift-image-registry_installer-cloud-credentials-secret.yaml
-rw-rw-r--. 1 fedora fedora  3277 Oct 26 12:48 99_openshift-cluster-csi-drivers_gcp-pd-cloud-credentials-secret.yaml
-rw-rw-r--. 1 fedora fedora  3286 Oct 26 12:48 99_openshift-cloud-network-config-controller_cloud-credentials-secret.yaml
-rw-rw-r--. 1 fedora fedora  3301 Oct 26 12:48 99_openshift-cloud-credential-operator_cloud-credential-operator-gcp-ro-creds-secret.yaml
-rw-rw-r--. 1 fedora fedora  3291 Oct 26 12:48 99_openshift-cloud-controller-manager_gcp-ccm-cloud-credentials-secret.yaml
$

4. Try creating the cluster, which eventually failed because the ingress operator was degraded

$ openshift-install create cluster --dir test4
INFO Consuming Worker Machines from target directory
INFO Consuming Master Machines from target directory
INFO Consuming OpenShift Install (Manifests) from target directory
INFO Consuming Openshift Manifests from target directory
INFO Consuming Common Manifests from target directory
INFO Credentials loaded from file "/home/fedora/.gcp/osServiceAccount.json"
WARNING FeatureSet "TechPreviewNoUpgrade" is enabled. This FeatureSet does not allow upgrades and may affect the supportability of the cluster.
INFO Creating infrastructure resources...
INFO Waiting up to 20m0s (until 1:11PM) for the Kubernetes API at https://api.jiwei-1026a.qe1.gcp.devcluster.openshift.com:6443...
INFO API v1.25.2+4bd0702 up
INFO Waiting up to 30m0s (until 1:23PM) for bootstrapping to complete...
INFO Destroying the bootstrap resources...
INFO Waiting up to 40m0s (until 1:47PM) for the cluster at https://api.jiwei-1026a.qe1.gcp.devcluster.openshift.com:6443 to initialize...
ERROR Cluster operator authentication Degraded is True with OAuthServerRouteEndpointAccessibleController_SyncError: OAuthServerRouteEndpointAccessibleControllerDegraded: Get "https://oauth-openshift.apps.jiwei-1026a.qe1.gcp.devcluster.openshift.com/healthz": dial tcp: lookup oauth-openshift.apps.jiwei-1026a.qe1.gcp.devcluster.openshift.com on no such host (this is likely result of malfunctioning DNS server)
ERROR Cluster operator authentication Available is False with OAuthServerRouteEndpointAccessibleController_EndpointUnavailable: OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.jiwei-1026a.qe1.gcp.devcluster.openshift.com/healthz": dial tcp: lookup oauth-openshift.apps.jiwei-1026a.qe1.gcp.devcluster.openshift.com on no such host (this is likely result of malfunctioning DNS server)
INFO Cluster operator baremetal Disabled is True with UnsupportedPlatform: Nothing to do on this Platform
INFO Cluster operator cloud-controller-manager TrustedCABundleControllerControllerAvailable is True with AsExpected: Trusted CA Bundle Controller works as expected
INFO Cluster operator cloud-controller-manager TrustedCABundleControllerControllerDegraded is False with AsExpected: Trusted CA Bundle Controller works as expected
INFO Cluster operator cloud-controller-manager CloudConfigControllerAvailable is True with AsExpected: Cloud Config Controller works as expected
INFO Cluster operator cloud-controller-manager CloudConfigControllerDegraded is False with AsExpected: Cloud Config Controller works as expected
INFO Cluster operator cloud-controller-manager CloudControllerOwner is True with AsExpected: Cluster Cloud Controller Manager Operator owns cloud controllers at 4.12.0-0.nightly-2022-10-25-210451
INFO Cluster operator cluster-api SecretSyncControllerAvailable is True with AsExpected: User Data Secret Controller works as expected
INFO Cluster operator cluster-api SecretSyncControllerDegraded is False with AsExpected: User Data Secret Controller works as expected
INFO Cluster operator console Progressing is True with SyncLoopRefresh_InProgress: SyncLoopRefreshProgressing: Working toward version 4.12.0-0.nightly-2022-10-25-210451, 0 replicas available
ERROR Cluster operator console Available is False with Deployment_InsufficientReplicas::RouteHealth_FailedGet: DeploymentAvailable: 0 replicas available for console deployment
ERROR RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.jiwei-1026a.qe1.gcp.devcluster.openshift.com): Get "https://console-openshift-console.apps.jiwei-1026a.qe1.gcp.devcluster.openshift.com": dial tcp: lookup console-openshift-console.apps.jiwei-1026a.qe1.gcp.devcluster.openshift.com on no such host
INFO Cluster operator etcd RecentBackup is Unknown with ControllerStarted: The etcd backup controller is starting, and will decide if recent backups are available or if a backup is required
ERROR Cluster operator ingress Available is False with IngressUnavailable: The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: error getting load balancer's firewall: googleapi: Error 403: Required 'compute.firewalls.get' permission for 'projects/openshift-qe-shared-vpc/global/firewalls/k8s-fw-a35d52ba3a1c44a2d9fc8449034eb663', forbidden
ERROR The kube-controller-manager logs may contain more details.)
INFO Cluster operator ingress Progressing is True with Reconciling: Not all ingress controllers are available.
ERROR Cluster operator ingress Degraded is True with IngressDegraded: The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: error getting load balancer's firewall: googleapi: Error 403: Required 'compute.firewalls.get' permission for 'projects/openshift-qe-shared-vpc/global/firewalls/k8s-fw-a35d52ba3a1c44a2d9fc8449034eb663', forbidden
ERROR The kube-controller-manager logs may contain more details.), CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)
INFO Cluster operator ingress EvaluationConditionsDetected is False with AsExpected:
INFO Cluster operator insights ClusterTransferAvailable is False with NoClusterTransfer: no available cluster transfer
INFO Cluster operator insights Disabled is False with AsExpected:
INFO Cluster operator insights SCAAvailable is True with Updated: SCA certs successfully updated in the etc-pki-entitlement secret
INFO Cluster operator network ManagementStateDegraded is False with :
ERROR Cluster initialization failed because one or more operators are not functioning properly.
ERROR The cluster should be accessible for troubleshooting as detailed in the documentation linked below,
ERROR https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html
ERROR The 'wait-for install-complete' subcommand can then be used to continue the installation
ERROR failed to initialize the cluster: Cluster operators authentication, console, ingress are not available
$
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          57m     Unable to apply 4.12.0-0.nightly-2022-10-25-210451: some cluster operators are not available
$ oc get nodes
NAME                                                       STATUS   ROLES                  AGE   VERSION
jiwei-1026a-sx4ph-master-0.c.openshift-qe.internal         Ready    control-plane,master   57m   v1.25.2+4bd0702
jiwei-1026a-sx4ph-master-1.c.openshift-qe.internal         Ready    control-plane,master   57m   v1.25.2+4bd0702
jiwei-1026a-sx4ph-master-2.c.openshift-qe.internal         Ready    control-plane,master   55m   v1.25.2+4bd0702
jiwei-1026a-sx4ph-worker-a-9xhnn.c.openshift-qe.internal   Ready    worker                 44m   v1.25.2+4bd0702
jiwei-1026a-sx4ph-worker-b-ctfw9.c.openshift-qe.internal   Ready    worker                 44m   v1.25.2+4bd0702
$ oc get co | grep -v 'True False False'
NAME             VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication   4.12.0-0.nightly-2022-10-25-210451   False       False         True       53m     OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.jiwei-1026a.qe1.gcp.devcluster.openshift.com/healthz": dial tcp: lookup oauth-openshift.apps.jiwei-1026a.qe1.gcp.devcluster.openshift.com on no such host (this is likely result of malfunctioning DNS server)
console          4.12.0-0.nightly-2022-10-25-210451   False       True          False      43m     DeploymentAvailable: 0 replicas available for console deployment...
ingress                                               False       True          True       43m     The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: error getting load balancer's firewall: googleapi: Error 403: Required 'compute.firewalls.get' permission for 'projects/openshift-qe-shared-vpc/global/firewalls/k8s-fw-a35d52ba3a1c44a2d9fc8449034eb663', forbidden...
$
$ oc get pods -n openshift-ingress-operator
NAME                                READY   STATUS    RESTARTS      AGE
ingress-operator-84d549fd76-nfr4l   2/2     Running   2 (47m ago)   57m
$ oc logs ingress-operator-84d549fd76-nfr4l -n openshift-ingress-operator
......
2022-10-26T13:53:31.279Z ERROR operator.ingress_controller controller/controller.go:121 got retryable error; requeueing{"after": "1m0s", "error": "IngressController is degraded: LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: error getting load balancer's firewall: googleapi: Error 403: Required 'compute.firewalls.get' permission for 'projects/openshift-qe-shared-vpc/global/firewalls/k8s-fw-a35d52ba3a1c44a2d9fc8449034eb663', forbidden\nThe kube-controller-manager logs may contain more details.), CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)"}
2022-10-26T13:53:58.388Z ERROR operator.canary_controller wait/wait.go:157 error performing canary route check {"error": "error sending canary HTTP request: DNS error: Get \"https://canary-openshift-ingress-canary.apps.jiwei-1026a.qe1.gcp.devcluster.openshift.com\": dial tcp: lookup canary-openshift-ingress-canary.apps.jiwei-1026a.qe1.gcp.devcluster.openshift.com on no such host"}
$
$ gcloud --project openshift-qe-shared-vpc compute firewall-rules list --filter='network=installer-shared-vpc AND NOT name~ci-op-xpn' 2> /dev/null
NAME                                NETWORK               DIRECTION  PRIORITY  ALLOW                                                                                                         DENY  DISABLED
preserved-ipi-xpn-api               installer-shared-vpc  INGRESS    1000      tcp:6443,tcp:80,tcp:443                                                                                             False
preserved-ipi-xpn-bastion-access    installer-shared-vpc  INGRESS    1000      tcp:22,tcp:3128-3129,tcp:5000,tcp:6001-6002,tcp:8080                                                                False
preserved-ipi-xpn-control-plane     installer-shared-vpc  INGRESS    1000      tcp:22623,tcp:10257,tcp:10259                                                                                       False
preserved-ipi-xpn-etcd              installer-shared-vpc  INGRESS    1000      tcp:2379-2380                                                                                                       False
preserved-ipi-xpn-health-checks     installer-shared-vpc  INGRESS    1000      tcp:6080,tcp:6443,tcp:22624,tcp:30000-32767                                                                         False
preserved-ipi-xpn-internal-cluster  installer-shared-vpc  INGRESS    1000      tcp:30000-32767,udp:30000-32767,tcp:9000-9999,udp:9000-9999,udp:4789,udp:6081,udp:500,udp:4500,tcp:10250,esp        False
preserved-ipi-xpn-internal-network  installer-shared-vpc  INGRESS    1000      tcp:22,icmp                                                                                                         False
$
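The claim that no "k8s-fw-..." rule exists in the host project can also be checked mechanically rather than by eye. A small sketch, assuming the tabular `gcloud compute firewall-rules list` layout shown in this report (header row first, rule name in the first column); `has_k8s_fw_rule` is a hypothetical helper, and the sample listing is abbreviated from the output above.

```shell
#!/bin/sh
# has_k8s_fw_rule  (hypothetical helper)
# Reads tabular `gcloud compute firewall-rules list` output on stdin and
# succeeds if any rule name (first column, header skipped) starts with "k8s-fw-".
has_k8s_fw_rule() {
  awk 'NR > 1 && $1 ~ /^k8s-fw-/ { found = 1 } END { exit found ? 0 : 1 }'
}

# Abbreviated copy of the listing in this report, used as offline input.
listing='NAME NETWORK DIRECTION PRIORITY ALLOW DENY DISABLED
preserved-ipi-xpn-api installer-shared-vpc INGRESS 1000 tcp:6443,tcp:80,tcp:443 False
preserved-ipi-xpn-etcd installer-shared-vpc INGRESS 1000 tcp:2379-2380 False'

if printf '%s\n' "$listing" | has_k8s_fw_rule; then
  echo "a k8s-fw-* rule is present"
else
  echo "no k8s-fw-* rule found"
fi

# Against the real host project, pipe the command from this report instead:
#   gcloud --project openshift-qe-shared-vpc compute firewall-rules list \
#     --filter='network=installer-shared-vpc' 2> /dev/null | has_k8s_fw_rule
```

With the listing from this report the helper takes the "no k8s-fw-* rule found" branch, consistent with the rule named in the 403 error not existing.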
- is blocked by: OCPBUGS-2966 Set createFirewallRules as Tech Preview (Closed)
- is related to: CORS-2030 QE Tracker (Closed)