OpenShift Hive / HIVE-2427

[rosa] Clusters with new 6th-gen .metal instances cannot provision successfully

    • Quality / Stability / Reliability
    • Proposed

      Description of problem:

      ROSA clusters that use any of the following new instance types cannot provision successfully:

      m6id.metal
      c6id.metal
      r6id.metal
      r5b.metal

      Version-Release number of selected component (if applicable):

      Not version-specific

      How reproducible:

      always

      Steps to Reproduce:

      1. rosa create cluster -c tzhou --compute-machine-type c6id.metal
      

      Actual results:

      The cluster ends up in the error state

      Expected results:

      The cluster reaches the ready state

      Additional info:

      Logs and debug info:
      $ ocm get cluster 266n4fjprsjs423bb1vufkt2cj8l56b9 | jq -r .status
      {
        "state": "error",
        "description": "GeneralOperatorDegraded",
        "dns_ready": true,
        "oidc_ready": false,
        "provision_error_message": "",
        "provision_error_code": "",
        "configuration_mode": "full",
        "limited_support_reason_count": 0
      }
      $ ocm get /api/clusters_mgmt/v1/clusters/266n4fjprsjs423bb1vufkt2cj8l56b9/resources/live
      ...
          "conditions": [
            {
              "type": "ProvisionFailed",
              "status": "True",
              "lastProbeTime": "2023-09-12T02:35:58Z",
              "lastTransitionTime": "2023-09-12T02:35:58Z",
              "reason": "GeneralOperatorDegraded",
              "message": "Timeout waiting for an operator to become ready"
            },
      ...
      $ ocm get /api/clusters_mgmt/v1/clusters/266njkqad4ipndraac0ei4u07nuso18s/logs/install
      ...
      time="2023-09-12T03:06:12Z" level=error msg="Cluster operator authentication Available is False with WellKnown_NotReady: WellKnownAvailable: The well-known endpoint is not yet available: kube-apiserver oauth endpoint https://10.0.156.181:6443/.well-known/oauth-authorization-server is not yet served and authentication operator keeps waiting (check kube-apiserver operator, and check that instances roll out successfully, which can take several minutes per instance)"
      ...
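
      To narrow the install log down to the failing entries, a filter like the one below may help. This is only a sketch: `install_errors` is a hypothetical helper name, and it assumes the log has already been obtained as plain text with one entry per line, in the `level=...` format shown in the excerpt above.

```shell
# install_errors: read installer log lines on stdin and print only the
# error-level entries (lines containing the literal text "level=error").
# Assumption: plain-text log, one entry per line.
install_errors() {
  # "|| true" keeps the exit status at 0 when no error lines are present.
  grep -F 'level=error' || true
}
```

      Example usage, assuming the log was saved to a local file named install.log: `install_errors < install.log`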
      // It is possible to log in to the cluster, and the cluster operators (co) look healthy
      $ oc get co -A --kubeconfig=kubeconfig
      NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.12.31   True        False         False      48m     
      baremetal                                  4.12.31   True        False         False      101m    
      cloud-controller-manager                   4.12.31   True        False         False      105m    
      cloud-credential                           4.12.31   True        False         False      105m    
      cluster-autoscaler                         4.12.31   True        False         False      102m    
      config-operator                            4.12.31   True        False         False      103m    
      console                                    4.12.31   True        False         False      55m     
      control-plane-machine-set                  4.12.31   True        False         False      99m     
      csi-snapshot-controller                    4.12.31   True        False         False      102m    
      dns                                        4.12.31   True        False         False      102m    
      etcd                                       4.12.31   True        False         False      100m    
      image-registry                             4.12.31   True        False         False      60m     
      ingress                                    4.12.31   True        False         False      59m     
      insights                                   4.12.31   True        False         False      89m     
      kube-apiserver                             4.12.31   True        False         False      87m     
      kube-controller-manager                    4.12.31   True        False         False      100m    
      kube-scheduler                             4.12.31   True        False         False      100m    
      kube-storage-version-migrator              4.12.31   True        False         False      102m    
      machine-api                                4.12.31   True        False         False      60m     
      machine-approver                           4.12.31   True        False         False      102m    
      machine-config                             4.12.31   True        False         False      100m    
      marketplace                                4.12.31   True        False         False      102m    
      monitoring                                 4.12.31   True        False         False      58m     
      network                                    4.12.31   True        False         False      104m    
      node-tuning                                4.12.31   True        False         False      102m    
      openshift-apiserver                        4.12.31   True        False         False      87m     
      openshift-controller-manager               4.12.31   True        False         False      99m     
      openshift-samples                          4.12.31   True        False         False      94m     
      operator-lifecycle-manager                 4.12.31   True        False         False      102m    
      operator-lifecycle-manager-catalog         4.12.31   True        False         False      102m    
      operator-lifecycle-manager-packageserver   4.12.31   True        False         False      94m     
      service-ca                                 4.12.31   True        False         False      102m    
      storage                                    4.12.31   True        False         False      102m    
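
      With this many rows it is easy to miss a single unhealthy operator by eye. The check can be automated with a small filter over the table output; this is a minimal sketch, where `unhealthy_operators` is a hypothetical helper name and the column order (NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED) is assumed to match the listing above.

```shell
# unhealthy_operators: read `oc get co` table output on stdin and print the
# name of every operator that is not Available=True, Progressing=False and
# Degraded=False. Prints nothing when all operators look healthy.
unhealthy_operators() {
  # NR > 1 skips the header row; NF >= 5 skips blank or truncated lines.
  awk 'NR > 1 && NF >= 5 && ($3 != "True" || $4 != "False" || $5 != "False") { print $1 }'
}
```

      Example usage: `oc get co --kubeconfig=kubeconfig | unhealthy_operators`. For the listing above this would print nothing, since every operator reports True/False/False.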
      
      

      cc yunjiang-1: could you please check whether this bug is filed under the correct component? Thanks in advance.

              Unassigned Unassigned
              tzhou5 Tongtong Zhou