-
Task
-
Resolution: Done
-
Normal
-
None
-
None
-
None
-
Quality / Stability / Reliability
-
False
-
-
False
-
Not Selected
-
-
-
OSDFM Sprint 2, OSDFM Sprint 3
-
None
INT CANARY HC RUNS #310, #312, #313, and [#314|https://ci.int.devshift.net/view/osd-fleet-manager/job/osdfm-hc-creation-integration-canary/314/] failed, with the HC going from VALIDATING to the ERROR state.
DT CS LOG:
2025-12-01T21:26:40.769872156Z ERROR provision_step_runner.go:50 [opid='2e8b8524-40d6-45bf-9f22-349ec17bcb14'] [cid='2mtrifu5t7dem4bt853te22ct43rmj5s'] Error executing pending step: SetupManagementClusterStep completion: failed to select a management cluster: Cluster installation has failed due to unavailable capacity in region 'us-east-1'. While we are bringing on additional capacity, try your cluster installation in another region or try again later.
....
2025-12-01T21:26:40.766428664Z ERROR step_setup_management_cluster.go:60 [opid='2e8b8524-40d6-45bf-9f22-349ec17bcb14'] [cid='2mtrifu5t7dem4bt853te22ct43rmj5s'] No management cluster set for shard '94f6867a-5fdc-11ef-94fe-0a580a83181f' during provisioning
TEST ENV DETAILS:
SC: maintenance, MC: ready
SC:
cr2v9p6hgn8cjsck0060 ServiceCluster hs-sc-a37oeoteg maintenance 2024-08-21 10:08:36 EST 2d9qjgrfcvpmo1se334rferiql6uq7np 94f6867a-5fdc-11ef-94fe-0a580a83181f canary us-east-1 4.19.19 true 467d 7h
MC:
d2ork6mrk70s73fs30ug hs-mc-c73e7ak5g ready 2025-08-29T14:32:58Z 94d us-east-1 canary 2kvmq7jctjsetgq2vpdgvnctut0n1gvf cr2v9p6hgn8cjsck0060
NOTE: Explicitly passing the PS ID and region for HC creation still failed with the same error.
Initial Root Cause Analysis:
- There is an issue with the MC (int:canary hs-mc-c73e7ak5g - d2ork6mrk70s73fs30ug): oc commands against it time out.
E1202 08:12:03.146174 17422 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://api.hs-mc-c73e7ak5g.nvog.i1.devshift.org:6443/api?timeout=32s\": dial tcp 44.214.28.161:6443: i/o timeout"
- Created a new MC for int:canary (d4ncjp595tqc73dupto0) while we investigate and either fix or clean up the old MC.
- The new MC is ready. The old MC has been moved to maintenance.
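
The oc timeout in the root cause analysis is a TCP-level failure (dial tcp ... i/o timeout) against the MC API endpoint on port 6443. A minimal sketch of a reachability probe that could confirm the same symptom without oc is below. The function name tcp_reachable and the 5-second default timeout are assumptions made for illustration, not part of any OSDFM tooling.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper for illustration; not part of any existing tooling.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS resolution failures.
        return False
```

For example, tcp_reachable("api.hs-mc-c73e7ak5g.nvog.i1.devshift.org", 6443) would be expected to return False for as long as the endpoint keeps timing out as in the log above. A True result only shows that the port accepts connections, not that the API server is healthy.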