Details
Bug
Resolution: Unresolved
Undefined
None
4.14.z, 4.15.z, 4.16.0
None
No
False
Description
Description of problem:
This problem was found in ROSA HCP while integrating the Cilium CNI for native routing. Setting ENI (Elastic Network Interface) mode in the Cilium configuration enables native routing over ENIs in AWS. In this configuration, the SAN of the CSRs created by the kubelet contains multiple IPs. The list of IPs changes over time, which prevents auto-approval of the CSRs and therefore keeps the nodes from becoming available. The --node-ip parameter of the kubelet is not set, so the kubelet relies on its own address discovery mechanism, which is known to be fragile when multiple IPs are available, and more so when they are added and removed dynamically.
Additional info: Standalone OCP supports this feature from Cilium. (TBC) ROSA Classic does not support BYO CNI, so this is not applicable there either.
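To illustrate why a changing SAN list can block auto-approval, here is a minimal sketch of a subset check. It is a simplified model only: this report does not describe the actual approver logic in HCP, so the function name san_matches_node, the rule "every SAN IP must be a known node address", and all IP addresses below are assumptions made for illustration.

```python
import ipaddress


def san_matches_node(csr_san_ips, node_ips):
    """Hypothetical check: True only if the CSR lists at least one IP
    and every IP in its SAN is an address already known for the node."""
    requested = {ipaddress.ip_address(ip) for ip in csr_san_ips}
    known = {ipaddress.ip_address(ip) for ip in node_ips}
    return bool(requested) and requested <= known


# Addresses recorded for the node before extra ENI IPs appeared (made-up values).
known_node_ips = ["10.0.1.10"]

# CSR created while the node had a single address: the SAN matches.
print(san_matches_node(["10.0.1.10"], known_node_ips))

# CSR created after additional ENI addresses were attached: the SAN now
# contains addresses the approver has never seen, so the check fails.
print(san_matches_node(["10.0.1.10", "10.0.1.57", "10.0.1.93"], known_node_ips))
```

Under this assumed rule, any CSR issued after the address list changes would stay pending. Pinning the kubelet to a single address with --node-ip would keep the SAN stable in this model.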
Version-Release number of selected component (if applicable):
tested from 4.14
How reproducible:
100%
Steps to Reproduce:
1. Configure an AWS VPC.
2. Deploy a ROSA HCP cluster in no-CNI mode.
3. Deploy Cilium in ENI mode.
Command used to create the cluster:
rosa create cluster --cluster-name <your-cluster-name> \
  --sts \
  --role-arn arn:aws:iam::<aws-account-id>:role/ManagedOpenShift-HCP-ROSA-Installer-Role \
  --support-role-arn arn:aws:iam::<aws-account-id>:role/ManagedOpenShift-HCP-ROSA-Support-Role \
  --worker-iam-role arn:aws:iam::<aws-account-id>:role/ManagedOpenShift-HCP-ROSA-Worker-Role \
  --external-id <your-name> \
  --operator-roles-prefix <your-name> \
  --oidc-config-id <oidc-config-id> \
  --tags "owner:<your-name>" \
  --region eu-central-1 \
  --version <ocp-version,e.g.:4.14.13> \
  --replicas 3 \
  --compute-machine-type m5.xlarge \
  --machine-cidr 10.0.0.0/16 \
  --service-cidr 172.30.0.0/16 \
  --pod-cidr 10.1.0.0/16 \
  --host-prefix 23 \
  --subnet-ids ${SUBNET_IDS} \
  --disable-workload-monitoring \
  --hosted-cp \
  --billing-account <aws-account-id> \
  --no-cni \
  --watch
Actual results:
Nodes do not become Ready because their CSRs are not approved.
Expected results:
Nodes join the cluster and become Ready.
Additional info:
I have discussed this with several TSEs, SREs and ROSA developers, as well as with our HCP team. AFAIK this is a feature that has not been implemented, but since I am not 100% sure, I have filed it as a bug; it can be moved if appropriate. The request comes from Cilium and ultimately originates from Adobe. The main feature driving this change is native routing, as opposed to overlay-only plugins such as OpenShiftSDN or OVN. (The contact at Isovalent for this is Frederic Giloux.)
ENI:
- https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html
- https://aws.github.io/aws-eks-best-practices/networking/vpc-cni/
- ENI + Cilium: https://docs.cilium.io/en/v1.12/concepts/networking/ipam/eni/#ipam-eni
Relevant Slack threads: