ACM-27045 (Red Hat Advanced Cluster Management)

Investigate failing HCP creation in INT CANARY


    • Type: Task
    • Resolution: Done
    • Priority: Normal
    • Fleet Manager
    • Quality / Stability / Reliability
    • OSDFM Sprint 2, OSDFM Sprint 3

      INT CANARY HC runs #310, #312, #313, and #314 (https://ci.int.devshift.net/view/osd-fleet-manager/job/osdfm-hc-creation-integration-canary/314/) failed, with the HC going from the VALIDATING state to the ERROR state.

      DT CS LOG:

       

      2025-12-01T21:26:40.769872156Z ERROR provision_step_runner.go:50 [opid='2e8b8524-40d6-45bf-9f22-349ec17bcb14'] [cid='2mtrifu5t7dem4bt853te22ct43rmj5s'] Error executing pending step: SetupManagementClusterStep completion: failed to select a management cluster: Cluster installation has failed due to unavailable capacity in region 'us-east-1'. While we are bringing on additional capacity, try your cluster installation in another region or try again later.
      ....
      2025-12-01T21:26:40.766428664Z ERROR step_setup_management_cluster.go:60 [opid='2e8b8524-40d6-45bf-9f22-349ec17bcb14'] [cid='2mtrifu5t7dem4bt853te22ct43rmj5s'] No management cluster set for shard '94f6867a-5fdc-11ef-94fe-0a580a83181f' during provisioning 
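
      The two log lines above share the same opid and cid, and the excerpt lists the later line first. As a rough sketch for triaging longer excerpts, the following groups lines by operation ID and orders them by timestamp. The regular expression is inferred only from the two lines shown here and is an assumption, not a documented log format; `parse_log_line` and `group_by_operation` are hypothetical helper names.

```python
import re

# Pattern inferred from the two log lines in this ticket; it is an assumption,
# not an official description of the service's log format.
LOG_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<src>\S+:\d+)\s+"
    r"\[opid='(?P<opid>[^']*)'\]\s+\[cid='(?P<cid>[^']*)'\]\s+(?P<msg>.*)$"
)


def parse_log_line(line):
    """Return a dict of fields for a matching line, or None otherwise."""
    match = LOG_RE.match(line.strip())
    return match.groupdict() if match else None


def group_by_operation(lines):
    """Group parsed entries by opid, each group sorted by timestamp."""
    ops = {}
    for line in lines:
        entry = parse_log_line(line)
        if entry is None:
            continue  # skip blanks and elisions such as "...."
        ops.setdefault(entry["opid"], []).append(entry)
    for entries in ops.values():
        # Fixed-width ISO-8601 UTC timestamps sort correctly as strings.
        entries.sort(key=lambda e: e["ts"])
    return ops
```

      Applied to the excerpt above, this would place the step_setup_management_cluster.go:60 entry ("No management cluster set for shard ...") before the provision_step_runner.go:50 entry that reports the capacity error.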
      
      
      

      TEST ENV DETAILS:

       

      SC: maintenance, MC: ready

      SC: 

      cr2v9p6hgn8cjsck0060 ServiceCluster hs-sc-a37oeoteg maintenance 2024-08-21 10:08:36 EST 2d9qjgrfcvpmo1se334rferiql6uq7np 94f6867a-5fdc-11ef-94fe-0a580a83181f canary us-east-1 4.19.19 true 467d 7h

      MC:

      d2ork6mrk70s73fs30ug hs-mc-c73e7ak5g ready 2025-08-29T14:32:58Z 94d us-east-1 canary 2kvmq7jctjsetgq2vpdgvnctut0n1gvf cr2v9p6hgn8cjsck0060

       

      NOTE: Explicitly passing the PS ID and region for HC creation still failed with the same error.

       

      Initial Root Cause Analysis:

      • There is an issue with the MC (int:canary hs-mc-c73e7ak5g - d2ork6mrk70s73fs30ug): oc commands against it time out.
        E1202 08:12:03.146174 17422 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://api.hs-mc-c73e7ak5g.nvog.i1.devshift.org:6443/api?timeout=32s\": dial tcp 44.214.28.161:6443: i/o timeout"
      • Created a new MC for int:canary (d4ncjp595tqc73dupto0) while we investigate and either fix or clean up the old MC.
      • The new MC is ready. The old MC has been moved to maintenance.
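
      The oc error above is a TCP dial timeout on port 6443, so a plain connection check can separate "API endpoint unreachable" from authentication or API-level problems. The following is a minimal sketch using only the Python standard library; `api_port_reachable` is a hypothetical helper, and the 5-second timeout is an arbitrary assumption. The hostname is the one from the error message.

```python
import socket


def api_port_reachable(host, port=6443, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS failures alike.
        return False


if __name__ == "__main__":
    # Hostname taken from the oc error in this ticket.
    host = "api.hs-mc-c73e7ak5g.nvog.i1.devshift.org"
    state = "reachable" if api_port_reachable(host) else "unreachable"
    print(f"{host}:6443 is {state}")
```

      A False result only shows that the TCP connection could not be established from the machine running the check; it does not by itself distinguish a network path issue from a cluster-side outage.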
         
         

              cdoan@redhat.com Christopher Doan
              rh-ee-anfranci Anna Francis
              Anna Francis Anna Francis
              Fleet Management