OpenShift Bugs / OCPBUGS-31923

Cilium ENI mode does not work with OCP


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Undefined
    • Affects Version/s: 4.14.z, 4.15.z, 4.16.0

    Description

      Description of problem:

          This problem was found in ROSA HCP when integrating the Cilium CNI for native routing.
      
      Enabling ENI (Elastic Network Interface) mode in the Cilium configuration allows native routing with ENIs in AWS.
      In this configuration, the Subject Alternative Name (SAN) of the CSRs created by the kubelet contains multiple IPs. This list of IPs changes over time, which prevents the CSRs from being auto-approved and therefore keeps the nodes from becoming available. The kubelet's --node-ip parameter is not set, so the kubelet relies on its own address discovery, which is known to be fragile when multiple IPs are available, especially when they are added and removed dynamically.
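
      The approval failure described above can be illustrated with a simplified model. The sketch below is a hypothetical stand-in, not the actual approver implementation: it assumes that a kubelet serving CSR is approved only when every IP in its SAN is already known for the node (for example, from the addresses recorded for its Machine). The function name san_ips_approvable and the sample IPs are illustrative only.

```shell
#!/bin/sh
# Hypothetical, simplified model of an IP-based CSR approval check.
# Assumption: a serving CSR is approvable only if every IP in its SAN
# is already known for the node. This is NOT the real approver code.

# san_ips_approvable KNOWN_IPS SAN_IPS
#   Both arguments are space-separated IP lists.
#   Prints the SAN IPs that are not known and returns 1 if there are any.
san_ips_approvable() {
  known=$1
  san=$2
  unknown=""
  for ip in $san; do
    case " $known " in
      *" $ip "*) ;;                  # IP is already known for the node
      *) unknown="$unknown $ip" ;;   # IP the check does not expect
    esac
  done
  if [ -n "$unknown" ]; then
    echo "unexpected SAN IPs:$unknown"
    return 1
  fi
  return 0
}

# Illustrative scenario: only the primary IP is known, but an extra
# ENI address also ends up in the kubelet's second CSR.
known_ips="10.0.1.5"
if san_ips_approvable "$known_ips" "10.0.1.5"; then
  echo "CSR 1: approvable"
fi
if ! san_ips_approvable "$known_ips" "10.0.1.5 10.0.2.17"; then
  echo "CSR 2: rejected"
fi
```

      In this model the first CSR passes and the second is rejected because 10.0.2.17 is not among the known addresses; as ENIs are attached and detached, the SAN list keeps drifting away from what the check expects.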
      
      Additional info:
      Standalone OCP supports this Cilium feature (to be confirmed).
      ROSA Classic does not support bring-your-own (BYO) CNI, so the issue does not apply there either.

      Version-Release number of selected component (if applicable):

          Tested from 4.14 onward.

      How reproducible:

          100%

      Steps to Reproduce:

          1. Configure an AWS VPC.
          2. Deploy a ROSA HCP cluster with no CNI (--no-cni).
          3. Deploy Cilium in ENI mode.
          
      
      rosa create cluster --cluster-name <your-cluster-name> --sts \
        --role-arn arn:aws:iam::<aws-account-id>:role/ManagedOpenShift-HCP-ROSA-Installer-Role \
        --support-role-arn arn:aws:iam::<aws-account-id>:role/ManagedOpenShift-HCP-ROSA-Support-Role \
        --worker-iam-role arn:aws:iam::<aws-account-id>:role/ManagedOpenShift-HCP-ROSA-Worker-Role \
        --external-id <your-name> --operator-roles-prefix <your-name> \
        --oidc-config-id <oidc-config-id> --tags "owner:<your-name>" \
        --region eu-central-1 --version <ocp-version,e.g.:4.14.13> \
        --replicas 3 --compute-machine-type m5.xlarge \
        --machine-cidr 10.0.0.0/16 --service-cidr 172.30.0.0/16 \
        --pod-cidr 10.1.0.0/16 --host-prefix 23 \
        --subnet-ids ${SUBNET_IDS} --disable-workload-monitoring \
        --hosted-cp --billing-account <aws-account-id> --no-cni --watch
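
      For step 3, a Helm-based install of Cilium in ENI mode could use flags along the lines of the Cilium AWS ENI documentation. This is a sketch under assumptions, not the exact configuration used for this report: it assumes Helm, the cilium/cilium chart from https://helm.cilium.io/, and Cilium 1.14 or later (where routingMode=native replaces the older tunnel=disabled setting). The helper name cilium_eni_helm_args and the kube-system namespace are illustrative, and the OpenShift-specific values a real install needs are omitted.

```shell
#!/bin/sh
# Sketch: core Helm flags for Cilium ENI mode on AWS (assumes Cilium >= 1.14).
# OpenShift-specific values (CNI paths, security settings, etc.) are omitted.

# cilium_eni_helm_args: hypothetical helper that prints the --set flags,
# one token per line, so they can be word-split into a helm command line.
cilium_eni_helm_args() {
  printf '%s\n' \
    "--set" "eni.enabled=true" \
    "--set" "ipam.mode=eni" \
    "--set" "routingMode=native"
}

# Usage against a real cluster (requires helm and cluster access):
#   helm repo add cilium https://helm.cilium.io/
#   helm install cilium cilium/cilium --namespace kube-system $(cilium_eni_helm_args)
```

      Keeping the flags in one helper makes it easy to add the remaining values from the Cilium AWS ENI guide (for example, the egress masquerade interface pattern, which may need adjusting for RHCOS interface names) without editing the install command itself.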

      Actual results:

      Nodes do not become Ready because their CSRs are not approved.
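
      To confirm this symptom, the Pending CSRs and the IPs in their SANs can be inspected with oc and openssl. The helper names pending_csrs and csr_san_ips are hypothetical; the sketch assumes the default `oc get csr` table layout, in which the last column is the CSR condition, and that openssl is installed.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting kubelet CSRs that stay Pending.

# pending_csrs: read `oc get csr --no-headers` output on stdin and print
# the names of CSRs whose last column (CONDITION) is "Pending".
pending_csrs() {
  awk '$NF == "Pending" { print $1 }'
}

# csr_san_ips: read a PEM-encoded CSR on stdin and print each IP address
# from its Subject Alternative Name extension, one per line.
csr_san_ips() {
  openssl req -noout -text |
    grep -o 'IP Address:[0-9A-Fa-f.:]*' |
    sed 's/^IP Address://'
}

# Usage against a live cluster (requires oc login):
#   for csr in $(oc get csr --no-headers | pending_csrs); do
#     echo "== $csr"
#     oc get csr "$csr" -o jsonpath='{.spec.request}' | base64 -d | csr_san_ips
#   done
# Approving manually (oc adm certificate approve <name>) only unblocks
# that particular CSR; it does not stop the SAN IP list from changing.
```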

      Expected results:

      Nodes join the cluster and work normally.

      Additional info:

          I've discussed this with several TSEs, SREs, and ROSA developers, as well as with our HCP team. As far as I know, this is an unimplemented feature, but since I'm not 100% sure, I've filed it as a bug and will move it if appropriate.
      
      This request comes from Cilium, and ultimately from Adobe. The main capability behind it is native routing, as opposed to overlay-only plugins such as OpenShift SDN or OVN. (The Isovalent contact for this is Frederic Giloux.)

      ENI:

      Relevant slack threads:


          People

            Joel Speed (joelspeed)
            Juan Manuel Parrilla Madrid (jparrill@redhat.com)
            Sunil Choudhary