Type: Task
Resolution: Done
Priority: Normal
Fix Version/s: None
Affects Version/s: None
Component/s: Fleet Manager
Labels:
None

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

None

Ready:
False
Epic Link:
ACM-24897
Color Status:
Not Selected
Intelligence Requested:
Market:

Sprint:
OSDFM Sprint 2, OSDFM Sprint 3

Regression:
None

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

INT canary HC runs #310, #312, #313, and #314 (https://ci.int.devshift.net/view/osd-fleet-manager/job/osdfm-hc-creation-integration-canary/314/) failed with the HC going from VALIDATING to ERROR state.

DT CS LOG:

2025-12-01T21:26:40.769872156Z ERROR provision_step_runner.go:50 [opid='2e8b8524-40d6-45bf-9f22-349ec17bcb14'] [cid='2mtrifu5t7dem4bt853te22ct43rmj5s'] Error executing pending step: SetupManagementClusterStep completion: failed to select a management cluster: Cluster installation has failed due to unavailable capacity in region 'us-east-1'. While we are bringing on additional capacity, try your cluster installation in another region or try again later.
....
2025-12-01T21:26:40.766428664Z ERROR step_setup_management_cluster.go:60 [opid='2e8b8524-40d6-45bf-9f22-349ec17bcb14'] [cid='2mtrifu5t7dem4bt853te22ct43rmj5s'] No management cluster set for shard '94f6867a-5fdc-11ef-94fe-0a580a83181f' during provisioning
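
When triaging runs like these, it can help to pull the operation ID and cluster ID out of the CS log lines so that related errors can be grouped. The following is a minimal sketch, assuming the log lines follow the layout seen in the two entries above (timestamp, level, source file and line, `opid`, `cid`, message). The function name `parse_cs_log_line` and the regular expression are illustrative and not part of any CS tooling.

```python
import re
from typing import Optional

# Assumed layout, inferred only from the two log lines quoted in this ticket:
# <timestamp> <LEVEL> <file.go>:<line> [opid='<id>'] [cid='<id>'] <message>
LOG_LINE = re.compile(
    r"^(?P<timestamp>\S+)\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<source>\S+\.go:\d+)\s+"
    r"\[opid='(?P<opid>[^']*)'\]\s+"
    r"\[cid='(?P<cid>[^']*)'\]\s+"
    r"(?P<message>.*)$"
)


def parse_cs_log_line(line: str) -> Optional[dict]:
    """Return the fields of one log line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = (
        "2025-12-01T21:26:40.766428664Z ERROR step_setup_management_cluster.go:60 "
        "[opid='2e8b8524-40d6-45bf-9f22-349ec17bcb14'] "
        "[cid='2mtrifu5t7dem4bt853te22ct43rmj5s'] "
        "No management cluster set for shard "
        "'94f6867a-5fdc-11ef-94fe-0a580a83181f' during provisioning"
    )
    fields = parse_cs_log_line(sample)
    print(fields["opid"], fields["cid"], fields["source"])
```

Lines that do not match the assumed layout, such as the elided `....` line, return `None` rather than raising, so the helper can be mapped over a whole log excerpt.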

TEST ENV DETAILS:

SC: maintenance, MC: ready

SC:

cr2v9p6hgn8cjsck0060 ServiceCluster hs-sc-a37oeoteg maintenance 2024-08-21 10:08:36 EST 2d9qjgrfcvpmo1se334rferiql6uq7np 94f6867a-5fdc-11ef-94fe-0a580a83181f canary us-east-1 4.19.19 true 467d 7h

MC:

d2ork6mrk70s73fs30ug hs-mc-c73e7ak5g ready 2025-08-29T14:32:58Z 94d us-east-1 canary 2kvmq7jctjsetgq2vpdgvnctut0n1gvf cr2v9p6hgn8cjsck0060

NOTE: Explicitly passing the PS ID and region for HC creation still failed with the same error.

Initial Root Cause Analysis:

There is an issue with the MC (int:canary hs-mc-c73e7ak5g, d2ork6mrk70s73fs30ug): oc commands against it time out.

E1202 08:12:03.146174 17422 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://api.hs-mc-c73e7ak5g.nvog.i1.devshift.org:6443/api?timeout=32s\": dial tcp 44.214.28.161:6443: i/o timeout"
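
The `dial tcp` and `i/o timeout` parts of the error above indicate that the TCP connection to the API server never completed, which is different from an authentication or API-level failure. The following is a minimal sketch of a reachability probe that performs the same connection check without `oc`, assuming only the host and port shown in the error message. The helper name `can_connect` and the 5-second default timeout are illustrative choices.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS resolution failures.
        return False


# Example call (host and port taken from the error message above):
# can_connect("api.hs-mc-c73e7ak5g.nvog.i1.devshift.org", 6443)
```

A `False` result only shows that the endpoint is unreachable from the machine running the probe; it does not by itself distinguish a network or security-group problem from an API server that is down.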

Created a new MC for int:canary (d4ncjp595tqc73dupto0) while we investigate and either fix or clean up the old MC.

The new MC is ready. The old MC has been moved to maintenance.

Assignee: Christopher Doan

Reporter: Anna Francis

QA Contact: Anna Francis

Team: Fleet Management

Votes: 0

Watchers: 2

Created: 2025/12/02 2:15 PM

Updated: 2025/12/12 1:03 PM

Resolved: 2025/12/12 1:03 PM
