  1. OpenShift Container Platform (OCP) Strategy
  2. OCPSTRAT-2686

[DP] Extend Metal3 Firmware Updates (Disk, RAID controllers, CPLD, and TPM)

      Feature Overview

      This feature extends the firmware update capabilities within the OpenShift Bare Metal deployment stack (leveraging Metal3) to encompass a complete set of server components. Currently, Metal3 supports updates for BMC, BIOS, and NIC firmware. This enhancement adds support for upgrading Disk, RAID controllers, CPLD, and TPM to provide a comprehensive, end-to-end Life-Cycle Management (LCM) solution for server hardware (Day-1 and Day-2).

      Goals

      Users gain the ability to perform firmware upgrades on all critical server components using the existing Metal3 framework.

      • Primary User/Persona: Cluster Administrator.
      • Extended Functionality: Expand existing Metal3 firmware update functionality to include:
        • Disk
        • RAID controllers
        • Complex Programmable Logic Device (CPLD)
        • Trusted Platform Module (TPM)
      • Business Requirement: Address the critical need for an end-to-end LCM solution for bare metal server hardware within OpenShift Container Platform (OCP).

      Requirements

      Functional Requirements

      1. The Metal3 framework must be able to detect, validate, and execute firmware updates for Disk, RAID controllers, CPLD, and TPM components on supported server hardware that exposes a Redfish interface.
      2. Support levels will depend on the capabilities offered by each server vendor.
      3. The feature must support the following specific Hardware Bill of Materials (BoM) models with defined priority:
        • Priority 1 (Prio 1): Dell XR8620t, Dell XR8720t
        • Priority 2 (Prio 2): HPE DL110 Gen11, HPE DL110 Gen12
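
The "validate" step in requirement 1, combined with the vendor-dependent support in requirement 2, could look roughly like the sketch below. This is only an illustration under stated assumptions: the lowercase component identifiers, the `FirmwareUpdate` type, the `validate_update` helper, and the HTTPS-only rule are all hypothetical and are not taken from the Metal3 API.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Components Metal3 already handles, per the feature overview.
EXISTING_COMPONENTS = {"bmc", "bios", "nic"}
# Components this feature proposes to add. These identifiers are
# hypothetical; the real API may name them differently.
PROPOSED_COMPONENTS = {"disk", "raid", "cpld", "tpm"}


@dataclass(frozen=True)
class FirmwareUpdate:
    """One requested update: which component, and where the image lives."""
    component: str
    url: str


def validate_update(update, vendor_supported):
    """Return a list of problems; an empty list means the request is acceptable.

    vendor_supported is the set of components the target server's vendor
    is able to update (requirement 2: support depends on the vendor).
    """
    problems = []
    known = EXISTING_COMPONENTS | PROPOSED_COMPONENTS
    if update.component not in known:
        problems.append(f"unknown component: {update.component}")
    elif update.component not in vendor_supported:
        problems.append(
            f"component not supported by this vendor: {update.component}"
        )
    # Assumption: only HTTPS image sources are accepted.
    scheme = urlparse(update.url).scheme
    if scheme != "https":
        problems.append(f"unsupported URL scheme: {scheme or '(none)'}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue with a request at once, which tends to be friendlier for an administrator preparing a maintenance window.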

      Non-Functional Requirements

      • Reliability/Performance (KPIs): Upgrade time must be measured. 
        • Key Performance Indicators (KPIs) for the upgrade time per hardware vendor must be defined and validated before the General Availability (GA) version of this feature.
      • Maintainability: The implementation must integrate seamlessly with the existing Metal3 update workflows (which currently support BMC, BIOS, and NIC) to minimize maintenance overhead for the operator.
      • Security: Ensure the firmware update process for the new components adheres to all OCP security standards, including image validation and secure communication paths.
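
As one possible reading of "image validation" in the security requirement, a firmware image could be checked against a published SHA-256 digest before it is handed to the update workflow. The sketch below assumes that approach; the helper names `sha256_of` and `image_matches` are hypothetical, and the actual OCP validation mechanism may differ (for example, signature verification).

```python
import hashlib
import hmac


def sha256_of(data: bytes) -> str:
    """Return the lowercase hex SHA-256 digest of the image bytes."""
    return hashlib.sha256(data).hexdigest()


def image_matches(data: bytes, expected_hex: str) -> bool:
    """Check downloaded image bytes against an expected SHA-256 digest.

    The expected digest is normalized (whitespace stripped, lowercased)
    and compared with hmac.compare_digest, which runs in constant time.
    """
    expected = expected_hex.strip().lower()
    return hmac.compare_digest(sha256_of(data), expected)
```

A caller would refuse to start the update when `image_matches` returns False.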

      Use Case

      As a Cluster Administrator, I want to upgrade disk, RAID controllers, CPLD, and TPM firmware on my bare metal infrastructure during the maintenance window so that I can ensure the long-term stability and security of my hardware components using the built-in Life-Cycle Management solution (Day-1 and Day-2).

      Questions to Answer (Optional)

      These questions should be addressed by the Bare Metal Team architect during the design phase.

      • How will vendor-specific implementation details and potential limitations for updating Disk, RAID controllers, CPLD, and TPM be abstracted within the Metal3 framework?
      • What changes, if any, are required for the Bare Metal Host (BMH) Custom Resource Definition (CRD) to accommodate these new components and track their firmware status?
      • What are the specific failure and rollback mechanisms for updates to these critical components (Disk, RAID controllers, CPLD, and TPM)?
      • What are the defined KPIs (metrics and targets) for "performant firmware upgrades" for the Dell (Prio 1) and HPE (Prio 2) BoM models?

      These questions should be addressed by the Telco BU and Engineering team:

      • What are the target KPIs (upgrade time per component per hardware vendor, upgrade success rate per component per hardware vendor)?
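
The two KPIs named above (upgrade time and success rate, each per component per hardware vendor) could be computed from recorded upgrade attempts along the lines of the following sketch. The `UpgradeRecord` type and `summarize` function are hypothetical, and using the median of successful runs as the "upgrade time" figure is an assumption; the source does not say which statistic should be used.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class UpgradeRecord:
    """One firmware upgrade attempt (hypothetical record shape)."""
    vendor: str
    component: str
    duration_s: float
    succeeded: bool


def summarize(records):
    """Group attempts by (vendor, component) and compute both KPIs."""
    groups = defaultdict(list)
    for record in records:
        groups[(record.vendor, record.component)].append(record)

    summary = {}
    for key, attempts in groups.items():
        # Only successful runs contribute to the upgrade-time figure.
        ok_durations = [r.duration_s for r in attempts if r.succeeded]
        summary[key] = {
            "attempts": len(attempts),
            "success_rate": len(ok_durations) / len(attempts),
            "median_duration_s": median(ok_durations) if ok_durations else None,
        }
    return summary
```

Keying the summary by (vendor, component) mirrors the "per component per hardware vendor" wording, so targets for the Prio 1 and Prio 2 models can be compared separately.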

      Out of Scope

      The following items are explicitly out of scope for the initial implementation:

      • Support for server hardware without a Redfish interface.
      • Support for components other than BMC, BIOS, NIC, Disk, RAID controllers, CPLD, and TPM.

      Links

      • Related JIRA RFE: RFE-8429 - Extend Metal3 F/W updates
      • Existing Functionality/Base Feature: OCPSTRAT-1794 (Reference for existing firmware update functionality/persona).

              mzasepa Michal Zasepa
              racedoro@redhat.com Ramon Acedo
              None
              Dmitry Tantsur, Iury Gregory Melo Ferreira, Jacob Anders, Zane Bitter
              Jacob Anders Jacob Anders
              Jacob Anders Jacob Anders
              None
              None
              Derrick Ornelas Derrick Ornelas