Performance and Scale for AI Platforms / PSAP-525

Build PerfConf Presentation for NFD


    • Type: Story
    • Resolution: Done
    • Priority: Normal
    • July Release for PSAP
    • None
    • NFD
    • False
    • False
    • PSAP Sprint 210, PSAP Sprint 211

      Slides for the talk “The NFD operator is boring, and everyone loves it!”
      Author
      Carlos Eduardo Arango [PSAP]

      Abstract
      >>> Intro
      Node Feature Discovery (NFD) and its operator have come a long way since becoming an upstream project a year ago. NFD has grown from a key component of the GPU Operator into an enabler of all sorts of workflows that need an understanding of the underlying hardware to orchestrate pod allocation.
      With the advent of Red Hat OpenShift Data Science (RHODS), NFD has undergone a major transformation into an operator that can be managed by the Red Hat Site Reliability Engineering (SRE) team. It now exposes alerts and metrics that make it a more stable and reliable component, as is expected of a managed service.
      We can’t talk about Node Feature Discovery in 2021 without mentioning the buzzword of the moment: ARM. NFD is becoming a key element for orchestrating multi-arch deployments, as shown by contributions to the NFD code base from multiple teams inside Red Hat that aim to expand its capabilities in a multi-arch world.
      NFD is currently one of the key building blocks for projects that aim to bring new types of workloads to Kubernetes, among them topology-aware scheduling and the upcoming secondary scheduler operator.
      In this talk, we will provide an update on the Node Feature Discovery project and insights into the roadmap for future releases.
      NFD works alongside a variety of device plugins, such as the popular NVIDIA device plugin for enabling GPUs in OpenShift, and it is also possible to write your own device plugin that works with NFD. For example, IBM is developing an AI chip that it plans to use with OpenShift; because the chip does not exist in hardware yet, it is currently only emulated through QEMU. Using tools such as libvirt, we can emulate these in-development chips and have NFD detect them, even though OpenShift has never worked with such devices before.
      Audience level
      Introductory
      Format
      Presentation (30 mins)
      Initiative
      Managed Services
      Edge
      Multi-Arch
      Hybrid Cloud
      Topic
      System performance and tuning
      Operators
      Develop/Infrastructure

              zkosic Zvonko Kaiser (Inactive)
              carangog Eduardo Arango (Inactive)
              Votes: 0
              Watchers: 1
