OCPBUGS-2248

[alibabacloud] IPI installation failed with master nodes being NotReady and CCM error "alicloud: unable to split instanceid and region from providerID"


      Description of problem:

      IPI installation failed with master nodes being NotReady and CCM error "alicloud: unable to split instanceid and region from providerID".

      Version-Release number of selected component (if applicable):

      4.12.0-0.nightly-2022-10-05-053337

      How reproducible:

      Always

      Steps to Reproduce:

      1. Attempt an IPI installation on Alibaba Cloud with credentialsMode set to "Manual".
      

      Actual results:

      Installation failed.

      Expected results:

      Installation should succeed.

      Additional info:

      $ oc get clusterversion
      NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
      version             False       True          34m     Unable to apply 4.12.0-0.nightly-2022-10-05-053337: an unknown error has occurred: MultipleErrors
      $ 
      $ oc get nodes
      NAME                           STATUS     ROLES                  AGE   VERSION
      jiwei-1012-02-9jkj4-master-0   NotReady   control-plane,master   30m   v1.25.0+3ef6ef3
      jiwei-1012-02-9jkj4-master-1   NotReady   control-plane,master   30m   v1.25.0+3ef6ef3
      jiwei-1012-02-9jkj4-master-2   NotReady   control-plane,master   30m   v1.25.0+3ef6ef3
      $ 
      
      CCM logs:
      E1012 03:46:45.223137       1 node_controller.go:147] node-controller "msg"="fail to find ecs" "error"="cloud instance api fail, alicloud: unable to split instanceid and region from providerID, error unexpected providerID="  "providerId"="alicloud://"
      E1012 03:46:45.223174       1 controller.go:317] controller/node-controller "msg"="Reconciler error" "error"="find ecs: cloud instance api fail, alicloud: unable to split instanceid and region from providerID, error unexpected providerID=" "name"="jiwei-1012-02-9jkj4-master-0" "namespace"="" 
      
      https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/145768/ (Finished: FAILURE)
      10-12 10:55:15.987  ./openshift-install 4.12.0-0.nightly-2022-10-05-053337
      10-12 10:55:15.987  built from commit 84aa8222b622dee71185a45f1e0ba038232b114a
      10-12 10:55:15.987  release image registry.ci.openshift.org/ocp/release@sha256:41fe173061b00caebb16e2fd11bac19980d569cd933fdb4fab8351cdda14d58e
      10-12 10:55:15.987  release architecture amd64
      
      FYI the installation could succeed with 4.12.0-0.nightly-2022-09-28-204419:
      https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/145756/ (Finished: SUCCESS)
      10-12 09:59:19.914  ./openshift-install 4.12.0-0.nightly-2022-09-28-204419
      10-12 09:59:19.914  built from commit 9eb0224926982cdd6cae53b872326292133e532d
      10-12 09:59:19.914  release image registry.ci.openshift.org/ocp/release@sha256:2c8e617830f84ac1ee1bfcc3581010dec4ae5d9cad7a54271574e8d91ef5ecbc
      10-12 09:59:19.914  release architecture amd64
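
A minimal sketch, in Python, of pulling the quoted key/value fields out of the CCM log lines quoted in the description, which makes the empty providerID easier to spot across many nodes. The helper name `parse_ccm_fields` is hypothetical, and the regular expression assumes the `"key"="value"` layout seen in the two log lines in this report; it is not part of any OpenShift tooling.

```python
import re

# Matches "key"="value" pairs as they appear in the CCM log lines in this report.
# Assumption: values never contain an embedded double quote.
PAIR_RE = re.compile(r'"([^"]+)"="([^"]*)"')


def parse_ccm_fields(line: str) -> dict:
    """Return the quoted key/value pairs found in one CCM log line."""
    return dict(PAIR_RE.findall(line))


# First CCM log line from the report, split across string literals for readability.
line = (
    'E1012 03:46:45.223137       1 node_controller.go:147] node-controller '
    '"msg"="fail to find ecs" "error"="cloud instance api fail, alicloud: unable to '
    'split instanceid and region from providerID, error unexpected providerID="  '
    '"providerId"="alicloud://"'
)

fields = parse_ccm_fields(line)
print(fields["providerId"])  # alicloud://
```

In this report the extracted value is the bare `alicloud://` scheme with nothing after it, which matches the "unable to split instanceid and region" error.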
      

       

       


            OpenShift Jira Automation Bot added a comment -

            Per the announcement sent regarding the removal of "Blocker" as an option in the Priority field, this issue (which was already closed at the time of the bulk update) had Priority = "Blocker." It is being updated to Priority = Critical. No additional fields were changed.

            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (Important: OpenShift Container Platform 4.13.0 security update), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHSA-2023:1326

            Jianli Wei added a comment -

            Tested with a build that includes the PR https://github.com/openshift/machine-config-operator/pull/3449 (see https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-aws-modern/1602111849178861568 ). IPI installation on Alibaba Cloud (CCO in manual mode) now succeeds. FYI, the QE flexy-install job: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/163718/ (SUCCESS)

            Cheng Zhang added a comment -

            Thanks for your update. ffranz@redhat.com 

             


            Fabiano Franz added a comment -

            I changed this bug to cover 4.13; we'll track 4.12 in https://issues.redhat.com/browse/OCPBUGS-3311

            FYI chezhang@redhat.com

            Jianli Wei added a comment -

            twiest.hive Would you please help? It's said the root cause is similar to what you've fixed in https://github.com/openshift/machine-config-operator/pull/3338. Thanks!

             


            Cheng Zhang added a comment -

            Cloned this bug as https://issues.redhat.com/browse/OCPBUGS-3311 to track the issue for the 4.12 release, and updated the target version of https://issues.redhat.com/browse/OCPBUGS-2248 to 4.13.0.
            Thanks. rhn-support-xtian

            Cheng Zhang added a comment - - edited

            To track this TestBlocker in 4.12, I updated the target version to 4.12.0. Please feel free to correct me. Thanks.
            cc: beth.white rhn-support-mdineen julim rhn-support-mfiedler sunweiadai rhn-support-jiwei gpei@redhat.com

            Jing Gu (Inactive) added a comment - - edited

            The providerID that the kubelet set on this node is wrong.

            `node.spec.providerID` is set through the --provider-id parameter in the kubelet configuration. It should be in the format "${regionID}:${instanceID}".

            related PRs:

            https://github.com/openshift/machine-config-operator/pull/2777/files

            https://github.com/openshift/machine-config-operator/pull/2814/files

            https://github.com/openshift/machine-config-operator/pull/3338/files
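
The check described in this comment can be sketched roughly as follows, in Python. This is an illustration only, not the CCM's actual implementation: the function name `split_provider_id` is hypothetical, the `alicloud://` prefix is taken from the CCM log in the description, and the default separator is taken from the format stated in this comment and has not been verified against the CCM source, so it is left as a parameter. The region and instance ID in the usage line are made-up values.

```python
from typing import Tuple

# Scheme prefix as seen in the CCM log ("providerId"="alicloud://").
PREFIX = "alicloud://"


def split_provider_id(provider_id: str, sep: str = ":") -> Tuple[str, str]:
    """Split a providerID into (region, instance_id) or raise ValueError.

    Assumption: `sep` defaults to the separator given in the comment above;
    pass a different one if the real format differs.
    """
    # Drop the scheme prefix first so its own ":" is never treated as the separator.
    body = provider_id[len(PREFIX):] if provider_id.startswith(PREFIX) else provider_id
    region, found, instance_id = body.partition(sep)
    if not found or not region or not instance_id:
        raise ValueError(f"unexpected providerID={provider_id!r}")
    return region, instance_id


# Hypothetical well-formed value: splits into region and instance ID.
print(split_provider_id("alicloud://cn-hangzhou:i-abc123"))

# The value from the CCM log in this report has nothing after the prefix.
try:
    split_provider_id("alicloud://")
except ValueError as err:
    print(err)
```

With the value from the log, the body after the prefix is empty, so neither a region nor an instance ID can be recovered, which is consistent with the node controller failing to find the ECS instance.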

             

             

             


            Cheng Zhang added a comment - - edited

            Yes, I agree with mifiedle@redhat.com's opinion. This bug should target 4.12.0 and be included in the filter of 4.12 test blockers.

              gausingh@redhat.com Gaurav Singh
              rhn-support-jiwei Jianli Wei
              Jianli Wei Jianli Wei