-
Feature Request
-
Resolution: Done
1. Proposed title of this feature request
Multi-NIC CNI Support in OpenShift
2. What is the nature and description of the request?
Multi-NIC CNI (https://github.com/foundation-model-stack/multi-nic-cni) is a CNI operator that streamlines secondary network configuration through a unified network definition. It benefits both cluster admins and users.
For cluster admins, Multi-NIC CNI automates configuration steps in the cluster such as preparing network attachment definitions, defining IPAM ranges, and checking network device health. Doing these steps by hand requires deep knowledge of the underlying networks; Multi-NIC CNI frees the cluster admin from those complex tasks.
For users, Multi-NIC CNI provides a unified network attachment definition to the cluster and automatically attaches the appropriate interfaces based on policies. Users get multi-interface capabilities without knowing which network interfaces exist, and without changing their deployments whether they run on-prem, on IBM Cloud, on AWS, etc.
We have been using Multi-NIC CNI for two years in Vela, an AI supercomputer in IBM Cloud, to scale AI training workloads to thousands of GPUs with RoCE/GDR capabilities. Now is a good time to bring this Multi-NIC capability to the many users who want to build OpenShift clusters for AI.
A couple of blog posts describe Multi-NIC CNI and its use cases in more detail.
3. Why does the customer need this? (List the business requirements here)
OpenShift AI and InstructLab are key platforms for running large-scale AI training and fine-tuning jobs, so it is important for customers to be able to accelerate these workloads easily. Utilizing multiple high-speed networks in the cluster is essential for this purpose. In addition, customers who want to use multi-cloud environments spanning on-prem and public clouds may struggle with network configuration. Multi-NIC CNI helps these users by eliminating complicated multi-network configuration tasks.
4. List any affected packages or components.
Multi-NIC CNI sits between Multus CNI and other CNIs (such as ipvlan CNI, macvlan CNI, aws-vpc CNI, host-device CNI, etc.). Its main role is to automate the configuration of each CNI, plus other network configuration where necessary. It therefore does not change any packages or components, but it does change the operational flow for acquiring multiple interfaces for a pod.
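To make the "automated configuration for each CNI" idea concrete, the following is a minimal sketch of the kind of per-interface work an admin would otherwise do by hand: expanding one unified definition (a name, an address pool, and a list of secondary interfaces) into one ipvlan CNI configuration per interface, each with its own IPAM range. This is not Multi-NIC CNI's actual code or API. The function name `per_interface_configs`, the equal-split allocation policy, the interface names `eth1`/`eth2`, and the use of `host-local` IPAM are all assumptions made for illustration.

```python
import ipaddress
import json


def per_interface_configs(name, subnet, interfaces, mode="l3"):
    """Expand one unified definition into one ipvlan CNI config per interface.

    Hypothetical helper for illustration only; not Multi-NIC CNI code.
    """
    network = ipaddress.ip_network(subnet)
    # Assumed policy: split the pool into equal power-of-two blocks,
    # enough to give each secondary interface its own block.
    extra_bits = max(1, (len(interfaces) - 1).bit_length())
    blocks = network.subnets(prefixlen_diff=extra_bits)
    configs = []
    for master, block in zip(interfaces, blocks):
        configs.append({
            "cniVersion": "0.3.1",
            "name": f"{name}-{master}",
            "type": "ipvlan",          # delegate CNI plugin
            "master": master,          # host interface to attach to
            "mode": mode,
            "ipam": {"type": "host-local", "subnet": str(block)},
        })
    return configs


configs = per_interface_configs(
    "multi-nic-sample", "192.168.0.0/16", ["eth1", "eth2"]
)
print(json.dumps(configs, indent=2))
```

With two interfaces, the sketch assigns 192.168.0.0/17 to `eth1` and 192.168.128.0/17 to `eth2`. In a real cluster the interface list differs per host and per environment, which is the discovery and bookkeeping that the request describes Multi-NIC CNI as automating.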