-
Bug
-
Resolution: Done
-
Normal
-
None
-
4.13.z, 4.12.z, 4.11.z
-
Critical
-
No
-
Rejected
-
False
-
This is a clone of issue OCPBUGS-10728. The following is the description of the original issue:
—
Solution:
PR posted at https://github.com/openshift/installer/pull/7018
In summary, the time series returned by serviceruntime.googleapis.com/quota/allocation/usage can include project_ids other than the intended one.
With the filters added in the PR, only usage from the intended project is matched against the limits of that same project.
Previously, it was possible by chance to get limits from the intended project and usage from a different project that the account can access. That mismatch can produce an incorrect quota calculation (including a negative remaining quota), which prevents an otherwise capable account from launching a cluster.
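A minimal sketch of the idea behind the fix, assuming the usage time series have already been fetched into memory. The usagePoint type, the filterByProject helper, and the usage values are hypothetical and only for illustration; the actual change is in the linked PR and may apply the filter differently.

```go
package main

import "fmt"

// usagePoint is a hypothetical, simplified stand-in for one usage time
// series. It is not the Cloud Monitoring client type.
type usagePoint struct {
	ProjectID   string // value of the project_id label
	QuotaMetric string // e.g. "compute.googleapis.com/networks"
	Value       int64  // reported usage (illustrative numbers only)
}

// filterByProject keeps only the points whose project_id matches the
// project selected for the installation.
func filterByProject(points []usagePoint, projectID string) []usagePoint {
	var kept []usagePoint
	for _, p := range points {
		if p.ProjectID == projectID {
			kept = append(kept, p)
		}
	}
	return kept
}

func main() {
	// Made-up response containing usage from two projects.
	points := []usagePoint{
		{"openshift-gce-devel-ci", "compute.googleapis.com/networks", 20},
		{"openshift-gce-devel", "compute.googleapis.com/networks", 2},
	}
	for _, p := range filterByProject(points, "openshift-gce-devel") {
		fmt.Printf("%s %s usage=%d\n", p.ProjectID, p.QuotaMetric, p.Value)
	}
}
```

Only the usage that survives this filter should be compared with the limits of the selected project.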
------
Description of problem:
error(MissingQuota): compute.googleapis.com/networks is not available in global because the required number of resources (1) is more than remaining quota of -negativenumber
While debugging the installer master branch, I found that the quota calculated for the networks service can be negative.
Watching the response shows that the limits and the usage come from different projects in the account. If the limits and usage had come from the same project, the result would not have been negative.
LimitFromAccountWithLowerLimit - UsageFromAccountWithBiggerLimits = negativeQuota
When inspected, the usage response for compute.googleapis.com/networks comes from a different Google Cloud project than the one selected and used for limits.
https://github.com/openshift/installer/blob/9b44e67f7a5afb953712a12bc753cbb8fc12c9a4/pkg/quota/gcp/usage.go#L35
The project_id label of the time series has the value "openshift-gce-devel-ci", which differs from "openshift-gce-devel", the project selected for installation. See debug screenshots.
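The arithmetic behind the negative value can be sketched as follows. The remaining and checkQuota helpers and all numbers are hypothetical, chosen only to show how a limit from one project minus usage from another can fall below zero; this is not the installer's code.

```go
package main

import "fmt"

// remaining returns limit minus usage. It is negative when usage
// exceeds the limit, which should not happen within a single project.
func remaining(limit, usage int64) int64 {
	return limit - usage
}

// checkQuota returns an error when the required count exceeds the
// remaining quota, mirroring the shape of the MissingQuota message.
func checkQuota(service string, required, limit, usage int64) error {
	if rem := remaining(limit, usage); required > rem {
		return fmt.Errorf("%s: required number of resources (%d) is more than remaining quota of %d",
			service, required, rem)
	}
	return nil
}

func main() {
	const svc = "compute.googleapis.com/networks"

	// Limit from the selected project, usage from a different project
	// (made-up numbers): the remaining quota goes negative.
	fmt.Println(checkQuota(svc, 1, 5, 20))

	// Limit and usage from the same project: the check passes.
	fmt.Println(checkQuota(svc, 1, 5, 2))
}
```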
Version-Release number of selected component (if applicable):
installer from source (master) 7667b46: https://github.com/openshift/installer/commit/7667b46bd13a6091d8dd7791e77235adfa161935
openshift-install unreleased-master-7903-g7667b46bd13a6091d8dd7791e77235adfa161935
built from commit 7667b46bd13a6091d8dd7791e77235adfa161935
release image registry.ci.openshift.org/origin/release:4.13
release architecture amd64
The same issue also occurred on the 4.12.z release of the installer, so it is not a new regression.
How reproducible:
Once it has happened, it can be reproduced repeatedly by rerunning the same command. Since I have to get work done, I try to get out of it by choosing a different --dir and/or cluster name, which sometimes works.
Steps to Reproduce:
(I can't actually verify this, but these are the last three things I did when this occurred.)
1. openshift-install create cluster --dir clusters/gcp-dev1 --log-level=debug
   FATAL failed to fetch Terraform Variables: failed to fetch dependency of "Terraform Variables": failed to generate asset "Platform Provisioning Check": metadata.name: Invalid value: "tkaovila": record api.tkaovila.gcp.devcluster.openshift.com. already exists in DNS Zone (openshift-gce-devel/devcluster) and might be in use by another cluster, please remove it to continue
2. openshift-install destroy cluster --dir clusters/gcp-dev1 --log-level=debug
   INFO Time elapsed: 27s
   INFO Uninstallation complete!
3. openshift-install create cluster --dir clusters/gcp-dev00 --log-level=debug
   FATAL failed to fetch Cluster: failed to fetch dependency of "Cluster": failed to generate asset "Platform Quota Check": error(MissingQuota): compute.googleapis.com/networks is not available in global because the required number of resources (1) is more than remaining quota of -143
Actual results:
FATAL failed to fetch Cluster: failed to fetch dependency of "Cluster": failed to generate asset "Platform Quota Check": error(MissingQuota): compute.googleapis.com/networks is not available in global because the required number of resources (1) is more than remaining quota of -143
Expected results:
cluster created
Additional info:
Slack thread that links to other threads and info:
https://redhat-internal.slack.com/archives/C68TNFWA2/p1679452931661929
Images from debugging:
https://drive.google.com/file/d/1O-LRZt5cgglkG-AnXWXhKjytyRyLGvgr/view?usp=share_link
https://drive.google.com/file/d/1_D2NpI9SBGGlonCDFx6rctwX_gvM1l36/view?usp=share_link
The general instructions for happy path installations came from https://docs.google.com/document/d/1qm37EKkjgoPtjW4909UClzvsjQO5VSpPUvFO_hW_PEg/edit
Happened to at least one other person
and one more
https://redhat-internal.slack.com/archives/CBUT43E94/p1675196061551869
and one more
- clones
-
OCPBUGS-10728 GCP usage api response include other projects and can causes negative quota calculation
- Closed
- is blocked by
-
OCPBUGS-10728 GCP usage api response include other projects and can causes negative quota calculation
- Closed
- links to