Bug
Resolution: Won't Do
4.10
Quality / Stability / Reliability
Moderate
Version:
./openshift-install 4.10.0-0.nightly-2022-01-27-221656
built from commit f2cbbed3749d281478fac59f20a6eea9c91b1e75
release image registry.ci.openshift.org/ocp/release@sha256:3d31be3393d641e8e035edf786606471407dd3d279a8e066f3cd46ca3576dd38
release architecture amd64
Platform: alibabacloud
Please specify:
- IPI
What happened?
One worker machine failed to launch, although the other worker and all masters are running normally.
What did you expect to happen?
All worker machines should be launched successfully.
How to reproduce it (as minimally and precisely as possible)?
Not sure how to trigger the issue; we have hit it only once so far.
Anything else we need to know?
For reference, the QE Flexy-install job: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/71727/
$ oc get nodes
NAME STATUS ROLES AGE VERSION
jiwei-505-5bszr-master-0 Ready master 90m v1.23.0+d30ebbc
jiwei-505-5bszr-master-1 Ready master 88m v1.23.0+d30ebbc
jiwei-505-5bszr-master-2 Ready master 88m v1.23.0+d30ebbc
jiwei-505-5bszr-worker-ap-northeast-1b-w6ndf Ready worker 75m v1.23.0+d30ebbc
$ oc get machines -n openshift-machine-api -o wide
NAME PHASE TYPE REGION ZONE AGE NODE PROVIDERID STATE
jiwei-505-5bszr-master-0 Running ecs.g6.xlarge ap-northeast-1 ap-northeast-1b 91m jiwei-505-5bszr-master-0 alicloud://ap-northeast-1.i-6we5ndsrd5pgic9awzri Running
jiwei-505-5bszr-master-1 Running ecs.g6.xlarge ap-northeast-1 ap-northeast-1a 91m jiwei-505-5bszr-master-1 alicloud://ap-northeast-1.i-6we17hhjtgdz21rct24s Running
jiwei-505-5bszr-master-2 Running ecs.g6.xlarge ap-northeast-1 ap-northeast-1b 91m jiwei-505-5bszr-master-2 alicloud://ap-northeast-1.i-6we5ndsrd5pgic9awzrh Running
jiwei-505-5bszr-worker-ap-northeast-1a-5cxxf Failed 85m Unknown
jiwei-505-5bszr-worker-ap-northeast-1b-w6ndf Running ecs.g6.large ap-northeast-1 ap-northeast-1b 85m jiwei-505-5bszr-worker-ap-northeast-1b-w6ndf alicloud://ap-northeast-1.i-6we3gn9xd4oxogozkk1x Running
$ oc describe machines jiwei-505-5bszr-worker-ap-northeast-1a-5cxxf -n openshift-machine-api | grep failed
Error Message: failed to reconcile machine "jiwei-505-5bszr-worker-ap-northeast-1a-5cxxf": failed to create instance: error getting security groups ID: Unable to determine resource group ID for machine: "jiwei-505-5bszr-worker-ap-northeast-1a-5cxxf"
Warning FailedCreate 84m alibabacloud-controller InvalidConfiguration: failed to reconcile machine "jiwei-505-5bszr-worker-ap-northeast-1a-5cxxf": failed to create instance: error getting security groups ID: Unable to determine resource group ID for machine: "jiwei-505-5bszr-worker-ap-northeast-1a-5cxxf"
$
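The check above can be scripted when looking for the same symptom on other clusters. The following is a minimal sketch, not part of the original report: `failed_machines` is a hypothetical helper name, and it assumes the column layout of `oc get machines -o wide` shown above (machine name in the first column, PHASE in the second).

```shell
# Hypothetical helper: print the names of machines whose PHASE column is "Failed".
# Reads the output of `oc get machines -n openshift-machine-api -o wide` on stdin.
# Assumes NAME is column 1 and PHASE is column 2, as in the output above.
failed_machines() {
  # NR > 1 skips the header row; $2 is the PHASE column.
  awk 'NR > 1 && $2 == "Failed" { print $1 }'
}

# Usage sketch against a live cluster (requires oc and cluster access):
#   oc get machines -n openshift-machine-api -o wide | failed_machines
#
# To print the recorded error for each failed machine (assumes the Machine
# status exposes it as .status.errorMessage):
#   for m in $(oc get machines -n openshift-machine-api -o wide | failed_machines); do
#     oc get machine "$m" -n openshift-machine-api -o jsonpath='{.status.errorMessage}{"\n"}'
#   done
```

Run against the machine list in this report, the helper would be expected to print only jiwei-505-5bszr-worker-ap-northeast-1a-5cxxf.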
$ oc get co | grep -Ev 'True False False'
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
image-registry 4.10.0-0.nightly-2022-01-27-221656 True True True 76m Degraded: Registry deployment has timed out progressing: ReplicaSet "image-registry-64d996f79f" has timed out progressing.
ingress 4.10.0-0.nightly-2022-01-27-221656 True False True 75m The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: PodsScheduled=False (PodsNotScheduled: Some pods are not scheduled: Pod "router-default-bfc7f578-2qjjb" cannot be scheduled: 0/4 nodes are available: 1 node(s) didn't match pod anti-affinity rules, 3 node(s) had taint, that the pod didn't tolerate. Make sure you have sufficient worker nodes.), DeploymentReplicasAllAvailable=False (DeploymentReplicasNotAvailable: 1/2 of replicas are available)
monitoring False True True 71m Rollout of the monitoring stack failed and is degraded. Please investigate the degraded status error.
$ oc describe co image-registry | grep Degraded:
Message: Degraded: Registry deployment has timed out progressing: ReplicaSet "image-registry-64d996f79f" has timed out progressing.
$ oc -n openshift-image-registry get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cluster-image-registry-operator-796fccb646-p5jv2 1/1 Running 0 96m 10.128.0.22 jiwei-505-5bszr-master-0 <none> <none>
image-registry-64d996f79f-7bbxt 0/1 Pending 0 85m <none> <none> <none> <none>
image-registry-64d996f79f-hdm85 1/1 Running 0 86m 10.131.0.9 jiwei-505-5bszr-worker-ap-northeast-1b-w6ndf <none> <none>
node-ca-cmhh9 1/1 Running 0 86m 10.0.159.10 jiwei-505-5bszr-master-1 <none> <none>
node-ca-hn5s2 1/1 Running 0 86m 10.0.98.55 jiwei-505-5bszr-master-0 <none> <none>
node-ca-qkns6 1/1 Running 0 81m 10.0.98.57 jiwei-505-5bszr-worker-ap-northeast-1b-w6ndf <none> <none>
node-ca-z9cgl 1/1 Running 0 86m 10.0.98.54 jiwei-505-5bszr-master-2 <none> <none>
$
$ aliyun --config-path ${ALI_CN_CONFIG} vpc DescribeVpcs --VpcName jiwei-505-5bszr-vpc --RegionId ap-northeast-1 --endpoint vpc.ap-northeast-1.aliyuncs.com --output cols=CreationTime,VpcId,CidrBlock rows=Vpcs.Vpc[]
CreationTime | VpcId | CidrBlock
------------ | ----- | ---------
2022-01-28T05:09:00Z | vpc-6wesmdl6y18kboddrga3o | 10.0.0.0/16
$ aliyun --config-path ${ALI_CN_CONFIG} vpc DescribeVSwitches --RegionId ap-northeast-1 --endpoint vpc.ap-northeast-1.aliyuncs.com --VpcId vpc-6wesmdl6y18kboddrga3o --output cols=ZoneId,VSwitchName,VSwitchId rows=VSwitches.VSwitch[]
ZoneId | VSwitchName | VSwitchId
------ | ----------- | ---------
ap-northeast-1b | jiwei-505-5bszr-vswitch-ap-northeast-1b | vsw-6we89wumalxja22jwh9wr
ap-northeast-1a | jiwei-505-5bszr-vswitch-nat-gateway | vsw-6weka2ijza6c9kd06sg59
ap-northeast-1a | jiwei-505-5bszr-vswitch-ap-northeast-1a | vsw-6wehbfok0p6fya2nkfuju
$ aliyun --config-path ${ALI_CN_CONFIG} ecs DescribeSecurityGroups --RegionId ap-northeast-1 --endpoint ecs.ap-northeast-1.aliyuncs.com --VpcId vpc-6wesmdl6y18kboddrga3o --output cols=SecurityGroupName,SecurityGroupId,ResourceGroupId rows=SecurityGroups.SecurityGroup[]
SecurityGroupName | SecurityGroupId | ResourceGroupId
----------------- | --------------- | ---------------
ngw-6we6dbyqf65mgtuk7ut5l_security_group | sg-6wefafbq011w5axhm0jf |
jiwei-505-5bszr-sg-master | sg-6wec42vu65t6jajlmqjs | rg-aek2wcax72uz6sy
jiwei-505-5bszr-sg-worker | sg-6wec42vu65t6jajlmqjt | rg-aek2wcax72uz6sy
$ aliyun --config-path ${ALI_CN_CONFIG} ecs DescribeInstances --RegionId ap-northeast-1 --endpoint ecs.ap-northeast-1.aliyuncs.com --VpcId vpc-6wesmdl6y18kboddrga3o --output cols=CreationTime,ZoneId,InstanceName,Status,VpcAttributes.VSwitchId,SecurityGroupIds.SecurityGroupId[] rows=Instances.Instance[]
CreationTime | ZoneId | InstanceName | Status | VpcAttributes.VSwitchId | SecurityGroupIds.SecurityGroupId[]
------------ | ------ | ------------ | ------ | ----------------------- | ----------------------------------
2022-01-28T05:23Z | ap-northeast-1b | jiwei-505-5bszr-worker-ap-northeast-1b-w6ndf | Running | vsw-6we89wumalxja22jwh9wr | [sg-6wec42vu65t6jajlmqjt]
2022-01-28T05:09Z | ap-northeast-1b | jiwei-505-5bszr-master-0 | Running | vsw-6we89wumalxja22jwh9wr | [sg-6wec42vu65t6jajlmqjs]
2022-01-28T05:09Z | ap-northeast-1b | jiwei-505-5bszr-master-2 | Running | vsw-6we89wumalxja22jwh9wr | [sg-6wec42vu65t6jajlmqjs]
2022-01-28T05:09Z | ap-northeast-1a | jiwei-505-5bszr-master-1 | Running | vsw-6wehbfok0p6fya2nkfuju | [sg-6wec42vu65t6jajlmqjs]
$