Story
Resolution: Done
PSAP Sprint 210, PSAP Sprint 211
Slides for the talk "The NFD operator is boring, and everyone loves it!"
Author
Carlos Eduardo Arango [PSAP]
Abstract
>>> Intro
Node Feature Discovery (NFD for short) and its Operator have come a long way since becoming an upstream project a year ago. NFD has gone from being one of the key components of the GPU operator to enabling all sorts of workflows that require an understanding of the underlying hardware to orchestrate pod allocation.
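As a rough illustration of how hardware-aware placement can work, the sketch below filters nodes by their labels, the way a node selector would. NFD publishes labels under the `feature.node.kubernetes.io/` prefix. The node names, the exact label keys, and the `matching_nodes` helper are assumptions made for illustration, and the PCI label format in particular depends on how NFD is configured.

```python
from typing import Dict, List

# Hypothetical node inventory: node name -> labels.
# The label keys are illustrative; the exact keys NFD emits depend on its configuration.
NODES: Dict[str, Dict[str, str]] = {
    "worker-0": {
        "kubernetes.io/arch": "amd64",
        "feature.node.kubernetes.io/pci-10de.present": "true",
    },
    "worker-1": {"kubernetes.io/arch": "arm64"},
    "worker-2": {"kubernetes.io/arch": "amd64"},
}


def matching_nodes(nodes: Dict[str, Dict[str, str]], selector: Dict[str, str]) -> List[str]:
    """Return the sorted names of nodes whose labels satisfy every key/value in selector."""
    return sorted(
        name
        for name, labels in nodes.items()
        if all(labels.get(key) == value for key, value in selector.items())
    )


# Nodes assumed to carry a label marking an NVIDIA PCI device as present.
gpu_nodes = matching_nodes(NODES, {"feature.node.kubernetes.io/pci-10de.present": "true"})

# Nodes with an ARM architecture label, as in a multi-arch cluster.
arm_nodes = matching_nodes(NODES, {"kubernetes.io/arch": "arm64"})
```

In a real cluster this matching is done by the scheduler from a pod's node selector or affinity rules; the sketch only mirrors the equality-matching logic.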
With the advent of Red Hat OpenShift Data Science (RHODS), NFD has undergone a major transformation to become an operator that can be managed by the Red Hat Site Reliability Engineering (SRE) team. It now exposes alerts and metrics that make it a more stable and reliable component, as is expected of a managed service.
We can’t talk about “Node Feature Discovery” in 2021 without mentioning the buzzword of the moment: ARM. NFD is becoming a key element for orchestrating multi-arch deployments, as can be seen in the contributions that multiple teams inside Red Hat are making to the NFD code base to expand its capabilities in a multi-arch world.
NFD is currently one of the key building blocks for projects that aim to bring new types of workloads to Kubernetes, among them topology-aware scheduling and the upcoming secondary scheduler operator.
In this talk we will provide an update on the Node Feature Discovery project and insights into the roadmap for future releases.
NFD works with a variety of plugins, such as the popular NVIDIA device plugin for enabling GPUs in OpenShift, and it is also possible to create your own device plugin to work with NFD. For example, IBM is developing an AI chip that it plans to use with OpenShift; because the chip does not yet exist, it is currently only emulated through QEMU. With tools and packages such as libvirt, we can emulate such in-development chips and have NFD interpret them, even though OpenShift has never worked with these devices before.
Audience level
Introductory
Format
Presentation (30 mins)
Initiative
Managed Services
Edge
Multi-Arch
Hybrid Cloud
Topic
System performance and tuning
Operators
Develop/Infrastructure