  OpenShift Bugs / OCPBUGS-2387

[2047693] [IPI on Alibabacloud][China-site testing] one worker machine failed due to "error getting security groups ID: Unable to determine resource group ID for machine"


    • Bug
    • Resolution: Won't Do
    • 4.10
    • Quality / Stability / Reliability
    • Moderate

      Version:
      ./openshift-install 4.10.0-0.nightly-2022-01-27-221656
      built from commit f2cbbed3749d281478fac59f20a6eea9c91b1e75
      release image registry.ci.openshift.org/ocp/release@sha256:3d31be3393d641e8e035edf786606471407dd3d279a8e066f3cd46ca3576dd38
      release architecture amd64
      Platform: alibabacloud

      Please specify:

      • IPI

      What happened?
      One worker machine failed to launch, although the other worker and all masters are healthy.

      What did you expect to happen?
      All worker machines should be launched successfully.

      How to reproduce it (as minimally and precisely as possible)?
      Not sure how to trigger the issue; we have hit it only once so far.

      Anything else we need to know?
      For reference, the QE flexy-install job: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/71727/

      $ oc get nodes
      NAME                                           STATUS   ROLES    AGE   VERSION
      jiwei-505-5bszr-master-0                       Ready    master   90m   v1.23.0+d30ebbc
      jiwei-505-5bszr-master-1                       Ready    master   88m   v1.23.0+d30ebbc
      jiwei-505-5bszr-master-2                       Ready    master   88m   v1.23.0+d30ebbc
      jiwei-505-5bszr-worker-ap-northeast-1b-w6ndf   Ready    worker   75m   v1.23.0+d30ebbc
      $ oc get machines -n openshift-machine-api -o wide
      NAME                                           PHASE     TYPE            REGION           ZONE              AGE   NODE                                           PROVIDERID                                         STATE
      jiwei-505-5bszr-master-0                       Running   ecs.g6.xlarge   ap-northeast-1   ap-northeast-1b   91m   jiwei-505-5bszr-master-0                       alicloud://ap-northeast-1.i-6we5ndsrd5pgic9awzri   Running
      jiwei-505-5bszr-master-1                       Running   ecs.g6.xlarge   ap-northeast-1   ap-northeast-1a   91m   jiwei-505-5bszr-master-1                       alicloud://ap-northeast-1.i-6we17hhjtgdz21rct24s   Running
      jiwei-505-5bszr-master-2                       Running   ecs.g6.xlarge   ap-northeast-1   ap-northeast-1b   91m   jiwei-505-5bszr-master-2                       alicloud://ap-northeast-1.i-6we5ndsrd5pgic9awzrh   Running
      jiwei-505-5bszr-worker-ap-northeast-1a-5cxxf   Failed                                                       85m                                                                                                     Unknown
      jiwei-505-5bszr-worker-ap-northeast-1b-w6ndf   Running   ecs.g6.large    ap-northeast-1   ap-northeast-1b   85m   jiwei-505-5bszr-worker-ap-northeast-1b-w6ndf   alicloud://ap-northeast-1.i-6we3gn9xd4oxogozkk1x   Running
      $ oc describe machines jiwei-505-5bszr-worker-ap-northeast-1a-5cxxf -n openshift-machine-api | grep failed
      Error Message: failed to reconcile machine "jiwei-505-5bszr-worker-ap-northeast-1a-5cxxf": failed to create instance: error getting security groups ID: Unable to determine resource group ID for machine: "jiwei-505-5bszr-worker-ap-northeast-1a-5cxxf"
      Warning FailedCreate 84m alibabacloud-controller InvalidConfiguration: failed to reconcile machine "jiwei-505-5bszr-worker-ap-northeast-1a-5cxxf": failed to create instance: error getting security groups ID: Unable to determine resource group ID for machine: "jiwei-505-5bszr-worker-ap-northeast-1a-5cxxf"
      $
      $ oc get co | grep -Ev 'True False False'
      NAME             VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      image-registry   4.10.0-0.nightly-2022-01-27-221656   True        True          True       76m     Degraded: Registry deployment has timed out progressing: ReplicaSet "image-registry-64d996f79f" has timed out progressing.
      ingress          4.10.0-0.nightly-2022-01-27-221656   True        False         True       75m     The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: PodsScheduled=False (PodsNotScheduled: Some pods are not scheduled: Pod "router-default-bfc7f578-2qjjb" cannot be scheduled: 0/4 nodes are available: 1 node(s) didn't match pod anti-affinity rules, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate. Make sure you have sufficient worker nodes.), DeploymentReplicasAllAvailable=False (DeploymentReplicasNotAvailable: 1/2 of replicas are available)
      monitoring       4.10.0-0.nightly-2022-01-27-221656   False       True          True       71m     Rollout of the monitoring stack failed and is degraded. Please investigate the degraded status error.
      $ oc describe co image-registry | grep Degraded:
      Message: Degraded: Registry deployment has timed out progressing: ReplicaSet "image-registry-64d996f79f" has timed out progressing.
      $ oc -n openshift-image-registry get pods -o wide
      NAME                                               READY   STATUS    RESTARTS   AGE   IP            NODE                                           NOMINATED NODE   READINESS GATES
      cluster-image-registry-operator-796fccb646-p5jv2   1/1     Running   0          96m   10.128.0.22   jiwei-505-5bszr-master-0                       <none>           <none>
      image-registry-64d996f79f-7bbxt                    0/1     Pending   0          85m   <none>        <none>                                         <none>           <none>
      image-registry-64d996f79f-hdm85                    1/1     Running   0          86m   10.131.0.9    jiwei-505-5bszr-worker-ap-northeast-1b-w6ndf   <none>           <none>
      node-ca-cmhh9                                      1/1     Running   0          86m   10.0.159.10   jiwei-505-5bszr-master-1                       <none>           <none>
      node-ca-hn5s2                                      1/1     Running   0          86m   10.0.98.55    jiwei-505-5bszr-master-0                       <none>           <none>
      node-ca-qkns6                                      1/1     Running   0          81m   10.0.98.57    jiwei-505-5bszr-worker-ap-northeast-1b-w6ndf   <none>           <none>
      node-ca-z9cgl                                      1/1     Running   0          86m   10.0.98.54    jiwei-505-5bszr-master-2                       <none>           <none>
      $
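
      The failed machine in the session above was found by reading the `oc get machines` table. A minimal sketch of filtering that table for failed machines could look like the following. The function name `failed_machines` is hypothetical, and the sketch assumes the default column layout shown above, where NAME is the first column and PHASE is the second.

```shell
# Hypothetical helper (not part of oc): print the names of machines whose
# PHASE column is "Failed". Reads `oc get machines` table output on stdin.
# Assumption: NAME is column 1 and PHASE is column 2, as in the listing above.
failed_machines() {
  # NR > 1 skips the header row; $2 is the PHASE column.
  awk 'NR > 1 && $2 == "Failed" { print $1 }'
}

# Usage (requires access to the cluster):
#   oc get machines -n openshift-machine-api | failed_machines
```

      Each name it prints can then be passed to `oc describe machines <name> -n openshift-machine-api` to read the error message, as was done above.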

      $ aliyun --config-path ${ALI_CN_CONFIG} vpc DescribeVpcs --VpcName jiwei-505-5bszr-vpc --RegionId ap-northeast-1 --endpoint vpc.ap-northeast-1.aliyuncs.com --output cols=CreationTime,VpcId,CidrBlock rows=Vpcs.Vpc[]
      CreationTime | VpcId | CidrBlock
      ------------ | ----- | ---------
      2022-01-28T05:09:00Z | vpc-6wesmdl6y18kboddrga3o | 10.0.0.0/16

      $ aliyun --config-path ${ALI_CN_CONFIG} vpc DescribeVSwitches --RegionId ap-northeast-1 --endpoint vpc.ap-northeast-1.aliyuncs.com --VpcId vpc-6wesmdl6y18kboddrga3o --output cols=ZoneId,VSwitchName,VSwitchId rows=VSwitches.VSwitch[]
      ZoneId | VSwitchName | VSwitchId
      ------ | ----------- | ---------
      ap-northeast-1b | jiwei-505-5bszr-vswitch-ap-northeast-1b | vsw-6we89wumalxja22jwh9wr
      ap-northeast-1a | jiwei-505-5bszr-vswitch-nat-gateway | vsw-6weka2ijza6c9kd06sg59
      ap-northeast-1a | jiwei-505-5bszr-vswitch-ap-northeast-1a | vsw-6wehbfok0p6fya2nkfuju

      $ aliyun --config-path ${ALI_CN_CONFIG} ecs DescribeSecurityGroups --RegionId ap-northeast-1 --endpoint ecs.ap-northeast-1.aliyuncs.com --VpcId vpc-6wesmdl6y18kboddrga3o --output cols=SecurityGroupName,SecurityGroupId,ResourceGroupId rows=SecurityGroups.SecurityGroup[]
      SecurityGroupName | SecurityGroupId | ResourceGroupId
      ----------------- | --------------- | ---------------
      ngw-6we6dbyqf65mgtuk7ut5l_security_group | sg-6wefafbq011w5axhm0jf |
      jiwei-505-5bszr-sg-master | sg-6wec42vu65t6jajlmqjs | rg-aek2wcax72uz6sy
      jiwei-505-5bszr-sg-worker | sg-6wec42vu65t6jajlmqjt | rg-aek2wcax72uz6sy

      $ aliyun --config-path ${ALI_CN_CONFIG} ecs DescribeInstances --RegionId ap-northeast-1 --endpoint ecs.ap-northeast-1.aliyuncs.com --VpcId vpc-6wesmdl6y18kboddrga3o --output cols=CreationTime,ZoneId,InstanceName,Status,VpcAttributes.VSwitchId,SecurityGroupIds.SecurityGroupId[] rows=Instances.Instance[]
      CreationTime | ZoneId | InstanceName | Status | VpcAttributes.VSwitchId | SecurityGroupIds.SecurityGroupId[]
      ------------ | ------ | ------------ | ------ | ----------------------- | ----------------------------------
      2022-01-28T05:23Z | ap-northeast-1b | jiwei-505-5bszr-worker-ap-northeast-1b-w6ndf | Running | vsw-6we89wumalxja22jwh9wr | [sg-6wec42vu65t6jajlmqjt]
      2022-01-28T05:09Z | ap-northeast-1b | jiwei-505-5bszr-master-0 | Running | vsw-6we89wumalxja22jwh9wr | [sg-6wec42vu65t6jajlmqjs]
      2022-01-28T05:09Z | ap-northeast-1b | jiwei-505-5bszr-master-2 | Running | vsw-6we89wumalxja22jwh9wr | [sg-6wec42vu65t6jajlmqjs]
      2022-01-28T05:09Z | ap-northeast-1a | jiwei-505-5bszr-master-1 | Running | vsw-6wehbfok0p6fya2nkfuju | [sg-6wec42vu65t6jajlmqjs]

      $
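
      In the `DescribeSecurityGroups` output above, one row has an empty ResourceGroupId column. Whether that is related to the failure is not established in this report. As a sketch only, rows like that can be picked out of the table mechanically. The function name `sg_without_resource_group` is hypothetical, and the sketch assumes the pipe-separated `cols=SecurityGroupName,SecurityGroupId,ResourceGroupId` layout shown above, with a header line and a dashed separator line.

```shell
# Hypothetical helper (not part of the aliyun CLI): print the names of
# security groups whose ResourceGroupId column is empty.
# Assumption: stdin is the pipe-separated table shown above, with columns
# SecurityGroupName | SecurityGroupId | ResourceGroupId.
sg_without_resource_group() {
  awk -F'|' 'NR > 2 {
    # Trim surrounding whitespace from every field.
    for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
    # Skip blank lines; report rows with no resource group ID.
    if ($1 != "" && $3 == "") print $1
  }'
}

# Usage: pipe the DescribeSecurityGroups command shown above into the helper,
#   <DescribeSecurityGroups command> | sg_without_resource_group
```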

              Unassigned
              Beth White (beth.white)
              Gaoyun Pei