RHEL-71567

How to configure IPoIB with nmstate on OpenShift 4.15.x nodes?

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Blocker
    • rhel-9.2.0
    • nmstate
    • rhel-sst-network-management
    • ssg_networking

      What were you trying to do that didn't work?

      Integrate OpenShift 4.15 worker nodes running on Dell HGX (XE9680) hardware with DataDirect Networks (DDN) EXAScaler appliances over NVIDIA/Mellanox InfiniBand cards (https://catalog.redhat.com/hardware/components/detail/243877).

      DDN has its CSI driver certified for OpenShift and bundled as an Operator (https://catalog.redhat.com/software/container-stacks/detail/64a84303aa534dcfb968dffa / https://github.com/redhat-openshift-ecosystem/certified-operators/blob/main/operators/exascaler-csi-driver-operator/2.2.6/metadata/annotations.yaml), and they claim that it has also been tested with InfiniBand cards on OpenShift.

      According to DDN field engineers, for the DDN CSI driver to work with IB and DDN storage, each OpenShift worker node needs 1) at least two IB interfaces, 2) one IP address on each interface, and 3) the DDN storage IP reachable simultaneously from both IB interfaces.

      As the Kubernetes NMState Operator is the default, declarative way to manage secondary interfaces on OpenShift, we are trying to use it to manage the IB interfaces as well.

      nmstate does support IB interfaces (https://nmstate.io/devel/yaml_api.html#ip-over-infiniband-interface), but downstream documentation for this use case on RHEL and OpenShift appears to be lacking.

      What is the impact of this issue to you?

      Protect the investment in existing hardware, which already works on other Linux and K8s distributions: https://www.ddn.com/blog/ddn-expands-support-for-nvidia-technology-to-enable-ai-application-acceleration-for-data-center-infrastructure/
      Deliver an AI/HPC environment for strategic end customers.
      Expand the RH ecosystem with specialized hardware for AI/HPC workloads.

      Please provide the package NVR for which the bug is seen:

      nmstate-2.2.24-1.el9_2.x86_64

      How reproducible is this bug?

      Under investigation. 

      Steps to reproduce

      1. Have specialized hardware in place, such as Dell HGX (XE9680) servers (spec sheet: https://www.delltechnologies.com/asset/en-in/products/servers/technical-support/poweredge-xe9680-spec-sheet.pdf) and DataDirect Networks (DDN) EXAScaler appliances.
      2. Install the NVIDIA Network Operator (https://docs.nvidia.com/networking/display/kubernetes2410/nvidia+network+operator) to manage the specialized drivers for the NVIDIA/Mellanox IB interfaces (https://catalog.redhat.com/hardware/components/detail/243877).
      3. Install the DDN CSI driver: https://github.com/DDNStorage/exa-csi-driver/tree/master?tab=readme-ov-file#openshift
      4. Have the IB switches in place and connectivity established between the InfiniBand interfaces and the target DDN storage.
      5. Set a static IP and routing per OpenShift node, as listed below:
      apiVersion: nmstate.io/v1
      kind: NodeNetworkConfigurationPolicy
      metadata:
        name: worker06
      spec:
        desiredState:
          interfaces:
          - description: ""
            infiniband:
              mode: datagram
              pkey: "0xffff"
            ipv4:
              address:
              - ip: 100.125.3.4
                prefix-length: 16
              dhcp: false
              enabled: true
            ipv6:
              enabled: false
            name: ibp27s0
            state: up
            type: infiniband
          - description: ""
            infiniband:
              mode: datagram
              pkey: "0xffff"
            ipv4:
              address:
              - ip: 100.125.2.4
                prefix-length: 16
              dhcp: false
              enabled: true
            ipv6:
              enabled: false
            name: ibp157s0
            state: up
            type: infiniband
            routes:
              config:
              - destination: 0.0.0.0/0
                metric: 100
                next-hop-address: 100.125.3.4
                next-hop-interface: ibp27s0
                table-id: 100
              - destination: 0.0.0.0/0
                metric: 101
                next-hop-address: 100.125.2.4
                next-hop-interface: ibp157s0
                table-id: 101
            route-rules:
              config:
                - ip-to: 100.125.0.0/16
                  ip-from: 100.125.3.4
                  priority: 100
                  route-table: 100
                - ip-to: 100.125.0.0/16
                  ip-from: 100.125.2.4
                  priority: 200
                  route-table: 101
        nodeSelector:
          kubernetes.io/hostname: worker06.analog.hgx.core42.hpc 
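
      One detail of the policy above may be worth checking: routes and route-rules are indented as fields of the second interface entry, whereas the nmstate state schema defines them as top-level keys next to interfaces. The following is an untested sketch of the same policy with only that placement changed. Every value is copied verbatim from the policy above; none of them (including the next-hop addresses, which equal the interfaces' own IPs) has been validated against this hardware, and this is not a documented Red Hat recommendation.

```yaml
# Sketch only, not validated on OpenShift 4.15 or on this hardware.
# Same values as the reported policy; the single change is that
# routes and route-rules are siblings of interfaces under desiredState.
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: worker06
spec:
  nodeSelector:
    kubernetes.io/hostname: worker06.analog.hgx.core42.hpc
  desiredState:
    interfaces:
    - name: ibp27s0
      type: infiniband
      state: up
      infiniband:
        mode: datagram
        pkey: "0xffff"
      ipv4:
        enabled: true
        dhcp: false
        address:
        - ip: 100.125.3.4
          prefix-length: 16
      ipv6:
        enabled: false
    - name: ibp157s0
      type: infiniband
      state: up
      infiniband:
        mode: datagram
        pkey: "0xffff"
      ipv4:
        enabled: true
        dhcp: false
        address:
        - ip: 100.125.2.4
          prefix-length: 16
      ipv6:
        enabled: false
    # One routing table per IB interface, as in the reported policy.
    routes:
      config:
      - destination: 0.0.0.0/0
        metric: 100
        next-hop-address: 100.125.3.4
        next-hop-interface: ibp27s0
        table-id: 100
      - destination: 0.0.0.0/0
        metric: 101
        next-hop-address: 100.125.2.4
        next-hop-interface: ibp157s0
        table-id: 101
    # Source-based rules selecting the table that matches the source IP.
    route-rules:
      config:
      - ip-from: 100.125.3.4
        ip-to: 100.125.0.0/16
        priority: 100
        route-table: 100
      - ip-from: 100.125.2.4
        ip-to: 100.125.0.0/16
        priority: 200
        route-table: 101
```

      Whether the original indentation reflects the applied policy or is only a paste artifact in this report is not known.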

      Expected results

      Use RH's recommended approach to manage network interfaces with OpenShift 4.x. 

      Actual results

      There is no clear documentation for this use case.

      It is also worth noting that 1) these InfiniBand cards are certified (https://catalog.redhat.com/hardware/components/detail/243877), and 2) the only documented scenario with InfiniBand interfaces on OpenShift that I can find uses the SR-IOV Operator, which is common with specialized hardware for AI and HPC use cases, but in that scenario no IPoIB is configured directly on the OCP nodes: https://docs.nvidia.com/networking/display/public/sol/rdg+for+accelerating+ai+workloads+in+red+hat+ocp+with+nvidia+dgx+a100+servers+and+nvidia+infiniband+fabric#src-99399137_RDGforAcceleratingAIWorkloadsinRedHatOCPwithNVIDIADGXA100ServersandNVIDIAInfiniBandFabric-LogicalDesign

              nm-team Network Management Team
              rhn-support-arolivei Arthur Oliveira
              Network Management Team Network Management Team
              Mingyu Shi Mingyu Shi