Bug
Resolution: Done
Critical
Quality / Stability / Reliability
Description:
An AWS PrivateLink cluster cannot be installed successfully because, starting with version 4.16, OCP installation via CAPI changes the security groups from the original [infraId-master-sg, infraId-worker-sg] to [infraId-node, infraId-lb, infraId-apiserver-lb, infraId-controlplane].
Security groups created via Terraform:
- mihuang711a-p2x9x-master-sg
- mihuang711a-p2x9x-worker-sg
- Security group for Kubernetes ELB xxxxxxxx (openshift-ingress/router-default)
Security groups created via CAPI:
- mihuang711c-dxgrm-node
- mihuang711c-dxgrm-lb
- mihuang711c-dxgrm-controlplane
- mihuang711c-dxgrm-apiserver-lb
- Security group for Kubernetes ELB xxxxxxxx (openshift-ingress/router-default)
In the Hive code, generating the resources needed for AWS PrivateLink (with hiveutil or manually) configures the security groups by looking up the worker security group (Values: []*string{aws.String(infraID + "-worker-sg")}). This lookup works for versions 4.15 and earlier. For 4.16+ clusters installed via CAPI, it cannot retrieve any SG because no group with that name exists.
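One possible direction, sketched here only to illustrate the naming mismatch: try both naming schemes when resolving the worker security group. This is not the Hive implementation. The function names `workerSGNameCandidates` and `pickWorkerSG` are hypothetical, and the mapping of `<infraID>-node` to the old `<infraID>-worker-sg` is an assumption that this report does not confirm.

```go
package main

import "fmt"

// workerSGNameCandidates lists the security group names to try for a
// cluster's worker nodes, newest naming scheme first.
// Assumption (hypothetical mapping): "<infraID>-node" is the CAPI-era
// counterpart of the Terraform-era "<infraID>-worker-sg".
func workerSGNameCandidates(infraID string) []string {
	return []string{
		infraID + "-node",      // CAPI-based installs (4.16+)
		infraID + "-worker-sg", // Terraform-based installs (4.15 and earlier)
	}
}

// pickWorkerSG returns the first candidate name that appears in the list of
// security group names that actually exist in the VPC.
func pickWorkerSG(infraID string, existing []string) (string, bool) {
	present := make(map[string]bool, len(existing))
	for _, name := range existing {
		present[name] = true
	}
	for _, candidate := range workerSGNameCandidates(infraID) {
		if present[candidate] {
			return candidate, true
		}
	}
	return "", false
}

func main() {
	// Group names taken from the CAPI example in this report.
	existing := []string{
		"mihuang711c-dxgrm-node",
		"mihuang711c-dxgrm-lb",
		"mihuang711c-dxgrm-controlplane",
		"mihuang711c-dxgrm-apiserver-lb",
	}
	name, ok := pickWorkerSG("mihuang711c-dxgrm", existing)
	fmt.Println(name, ok)
}
```

In a real fix, the candidate names would feed the same security group lookup that currently receives only `infraID + "-worker-sg"`.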
Version:
4.16+
How reproducible:
Always
Steps to Reproduce:
1. Use `hiveutil` (or manual steps) to generate the resources needed for AWS PrivateLink. The security group cannot be configured, which causes the AWS PrivateLink cluster installation to fail.
Actual results:
1. The security group cannot be configured:
./bin/hiveutil awsprivatelink endpointvpc add $endpointVPC2 --region us-east-2 --subnet-ids $endpointVPC2Subnets -d
…
FATA[0016] Failed to get worker SG of the associated VPC error="default SG not found for VPC 0xc000b46010"
2. Configuring the resources in the original way and then installing the cluster results in the following error:
time="2024-07-11T06:05:59Z" level=debug msg="E0711 06:05:59.556306 96 controller.go:329] \"Reconciler error\" err=\"expected at least 1 public subnet but got 0\" controller=\"awscluster\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AWSCluster\" AWSCluster=\"openshift-cluster-api-guests/mihuanghive416-89h2v\" namespace=\"openshift-cluster-api-guests\" name=\"mihuanghive416-89h2v\" reconcileID=\"2c0ccbb6-2f19-481d-b5a2-ec362ff31e0e\""
Expected results:
1. The security group is configured successfully. (The output below is from a 4.15 cluster, shown as an example.)
$ ./bin/hiveutil awsprivatelink endpointvpc add $endpointVPC --region us-east-2 --subnet-ids $endpointVPCSubnets -d
…
DEBU[0011] Found worker SG sg-00df18cfb1aa56983 of the associated Hive cluster
INFO[0011] Authorizing traffic from the associated VPC's worker SG to the endpoint VPC's default SG
INFO[0012] Authorizing traffic from the endpoint VPC's default SG to the associated VPC's worker SG
INFO[0012] Adding endpoint VPC vpc-0133317c4cfe4168c to HiveConfig
DEBU[0013] Endpoint VPC added to HiveConfig
2. The AWS PrivateLink cluster installs successfully on version 4.16 and later.
Note: We did not encounter this issue when installing private clusters on version 4.15 and earlier.
Collect logs from the hive-controller pod and the 4.16 provisioning pod.