Feature
Resolution: Unresolved
Minor
Feature Overview
This feature introduces a Proof of Concept (PoC) that enables the OpenShift Container Platform (OCP) Bare Metal Operator, through Ironic, to automatically configure Top of Rack (TOR) switches during bare metal node provisioning. Automating network port configuration is intended to reduce manual setup time, eliminate common sources of network misconfiguration, and streamline the deployment of bare metal clusters. The capability is designed to integrate with centralized management solutions such as Red Hat Advanced Cluster Management for Kubernetes (ACM), extending the existing Redfish-based server management to cover network infrastructure.
Goals
The primary goal of this PoC is to validate the technical feasibility of Ironic-driven TOR switch configuration and inform the design of a fully-featured solution.
Upon completion, an administrator will be able to:
- Automatically configure basic network settings on a TOR switch port when the corresponding bare metal node is deployed.
- Define the desired switch port configuration declaratively as part of the bare metal deployment process.
This feature extends the existing capabilities of the Bare Metal Operator. The primary user personas are the bare metal cluster operator and the network administrator.
Requirements
The following needs and objectives must be delivered for this feature to be considered complete.
Functional Requirements:
- The solution must support basic configuration of switch access ports, including setting the default VLAN.
- The solution must support basic configuration of switch trunk ports, including setting a default VLAN and a list of allowed VLANs.
- The system must include error handling: if a switch configuration fails, the process should log a clear, actionable error message. The behavior of the overall provisioning process on such a failure (e.g., halt or retry) must be defined.
- The configuration process must be idempotent, meaning repeated applications of the same configuration will not result in errors or changes after the initial success.
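The access-port, trunk-port, and idempotency requirements can be illustrated with a minimal Python sketch. `PortConfig`, `FakeSwitch`, and `apply_port_config` are hypothetical names and are not part of Ironic or the Bare Metal Operator. The in-memory switch stands in for whatever protocol the PoC eventually selects. Idempotency comes from comparing the desired state with the current state and writing only when they differ.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class PortConfig:
    """Desired state of one switch port (hypothetical model, not an Ironic API)."""
    mode: str                                    # "access" or "trunk"
    native_vlan: int                             # default (untagged) VLAN
    allowed_vlans: FrozenSet[int] = frozenset()  # tagged VLANs, trunk ports only

    def __post_init__(self) -> None:
        if self.mode not in ("access", "trunk"):
            raise ValueError(f"unsupported port mode: {self.mode!r}")
        for vlan in {self.native_vlan, *self.allowed_vlans}:
            if not 1 <= vlan <= 4094:
                raise ValueError(f"VLAN {vlan} outside 1-4094")
        if self.mode == "access" and self.allowed_vlans:
            raise ValueError("an access port carries only its default VLAN")


class FakeSwitch:
    """In-memory stand-in for a TOR switch, for illustration only."""

    def __init__(self) -> None:
        self.ports: Dict[str, PortConfig] = {}
        self.writes = 0  # number of configuration changes actually pushed

    def get(self, port_id: str) -> Optional[PortConfig]:
        return self.ports.get(port_id)

    def set(self, port_id: str, cfg: PortConfig) -> None:
        self.ports[port_id] = cfg
        self.writes += 1


def apply_port_config(switch: FakeSwitch, port_id: str, desired: PortConfig) -> bool:
    """Push `desired` only if it differs from the port's current state.

    Returns True if a change was made and False if the port already matched,
    so repeating the call with the same input changes nothing.
    """
    if switch.get(port_id) == desired:
        return False
    switch.set(port_id, desired)
    return True
```

If `apply_port_config` is called twice with the same trunk `PortConfig`, it returns `True` and then `False`, and the switch records a single write.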
Stretch Functional Requirements:
- Demonstrate the ability to create and manage port channels (link aggregations) on a single TOR switch.
- Demonstrate the ability to create and manage port channels across a split (multi-chassis) TOR switch pair.
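For the stretch goals, a PoC must determine whether a port channel's member links terminate on a single switch or on a multi-chassis pair, because a multi-chassis pair has to be configured on both switches. The following is a hypothetical data model: `MemberLink` and `PortChannel` are illustrative names. The two-switch limit is an assumption about typical split TOR pairs and is not a stated requirement.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class MemberLink:
    switch_id: str  # identifier of the TOR switch (e.g. its management address)
    port_id: str    # physical interface on that switch


@dataclass(frozen=True)
class PortChannel:
    """Hypothetical model of one host's link aggregation."""
    name: str
    members: Tuple[MemberLink, ...]

    def per_switch(self) -> Dict[str, List[str]]:
        """Group member ports by switch; each group is configured on its own switch."""
        grouped: Dict[str, List[str]] = {}
        for link in self.members:
            grouped.setdefault(link.switch_id, []).append(link.port_id)
        return grouped

    def kind(self) -> str:
        """Classify the aggregation as single-switch or multi-chassis."""
        if len(self.members) < 2:
            raise ValueError(f"{self.name}: a port channel needs at least two links")
        if len(set(self.members)) != len(self.members):
            raise ValueError(f"{self.name}: duplicate member link")
        n_switches = len(self.per_switch())
        if n_switches == 1:
            return "single-switch"
        if n_switches == 2:
            return "multi-chassis"
        raise ValueError(f"{self.name}: links span {n_switches} switches (max 2 assumed)")
```

With this model, a multi-chassis aggregation produces one group of ports per switch. The configuration step can then be applied to each switch while still being treated as a single operation for the host.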
Non-Functional Requirements:
- Documentation: The outcome of the PoC must be documented in a report that details the findings, architectural recommendations, and identified challenges. This report will serve as the primary input for a follow-up implementation feature.
- Security: The Ironic service must be granted user credentials to access the switches. The design must assume that these credentials are scoped according to the principle of least privilege.
- Usability: It is assumed that some related configurations (e.g., switch management IP, base user accounts) may have been manually pre-configured on the switches.
User Scenarios
- As a bare metal cluster operator, I want the Bare Metal Operator to automatically configure Top of Rack (TOR) switch ports when provisioning a new server, so that I can reduce manual provisioning time and eliminate network misconfigurations across my fleet of clusters.
Main Success Scenario:
- The operator defines a BareMetalHost manifest that includes a new section for the desired TOR switch port configuration (e.g., switch IP, port ID, VLAN settings).
- The operator applies the manifest to the cluster.
- The Bare Metal Operator instructs Ironic to begin provisioning the node.
- As part of the provisioning workflow, Ironic authenticates to the specified TOR switch.
- Ironic applies the declared configuration (e.g., setting VLANs) to the correct switch port.
- The switch port is successfully configured, and the bare metal node provisioning process continues to completion.
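The following sketch outlines the main success scenario and assumes the new manifest section arrives as an already-parsed mapping. The field names (`switchPort`, `switchAddress`, `portID`, `mode`, `nativeVLAN`, `allowedVLANs`, `credentialsName`) are hypothetical placeholders because the actual BareMetalHost API extension has not yet been designed. `configure_switch` and `deploy` stand in for the Ironic steps.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class SwitchPortRequest:
    switch_address: str
    port_id: str
    mode: str
    native_vlan: int
    allowed_vlans: Tuple[int, ...]
    credentials_name: str  # name of a Secret holding switch credentials (assumed)


def parse_switch_port(spec: Dict[str, Any]) -> SwitchPortRequest:
    """Read the hypothetical `switchPort` section of a BareMetalHost spec."""
    section = spec["switchPort"]
    return SwitchPortRequest(
        switch_address=section["switchAddress"],
        port_id=section["portID"],
        mode=section.get("mode", "access"),
        native_vlan=int(section["nativeVLAN"]),
        allowed_vlans=tuple(section.get("allowedVLANs", [])),
        credentials_name=section["credentialsName"],
    )


def provision(spec: Dict[str, Any],
              configure_switch: Callable[[SwitchPortRequest], None],
              deploy: Callable[[], None]) -> List[str]:
    """Configure the switch port first, then continue with node deployment."""
    request = parse_switch_port(spec)
    events = [f"configuring {request.switch_address} port {request.port_id}"]
    configure_switch(request)  # authenticate to the switch and apply the config
    events.append("switch port configured")
    deploy()                   # the remainder of the provisioning workflow
    events.append("provisioning complete")
    return events
```

Running the switch step before deployment matches the scenario's ordering: the node continues provisioning only after its port has been configured.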
Alternative Flow (Configuration Fails):
- During the provisioning workflow, Ironic attempts to apply the configuration to the TOR switch but fails (e.g., invalid credentials, network timeout, incompatible command).
- Ironic logs a detailed error message specifying the host, switch, and reason for failure.
- The BareMetalHost object status is updated to reflect the provisioning failure, making the issue visible to the operator.
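The alternative flow requires a concrete failure policy. The following minimal sketch halts provisioning on the first switch error without retrying, which is only one of the options the requirements leave open. `SwitchConfigError`, `provision_host`, and the status keys are hypothetical. Mapping invalid credentials, timeouts, and rejected commands to `PermissionError`, `TimeoutError`, and `ValueError` is also an illustrative assumption.

```python
import logging
from typing import Callable, Dict


class SwitchConfigError(Exception):
    """Hypothetical error carrying the host, switch, and failure reason."""

    def __init__(self, host: str, switch: str, reason: str) -> None:
        super().__init__(f"host {host}: switch {switch} configuration failed: {reason}")
        self.host = host
        self.switch = switch
        self.reason = reason


# Assumed mapping of failure causes to exception types (illustration only):
# invalid credentials -> PermissionError, network timeout -> TimeoutError,
# command rejected by the switch OS -> ValueError.
SWITCH_FAILURES = (PermissionError, TimeoutError, ValueError)


def provision_host(host: str, switch: str,
                   configure: Callable[[], None],
                   deploy: Callable[[], None]) -> Dict[str, str]:
    """Configure the switch port, then deploy; halt (no retry) on switch failure.

    Returns a simplified, hypothetical status mapping shown to the operator.
    """
    try:
        configure()
    except SWITCH_FAILURES as exc:
        err = SwitchConfigError(host, switch, str(exc) or type(exc).__name__)
        logging.error("%s", err)  # actionable message naming host, switch, reason
        return {"state": "error", "errorMessage": str(err)}
    deploy()
    return {"state": "provisioned", "errorMessage": ""}
```

A retry policy could wrap `configure()` in a bounded loop before raising. Whichever policy is chosen, the status should carry the same host, switch, and reason fields so the failure remains visible on the BareMetalHost object.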
Questions to Answer
This PoC should help answer the following architectural and design questions:
- Vendor Priority: What is the priority list of hardware vendors and network operating systems that this feature should support, based on feedback from the Telco Product Manager and other key stakeholders?
- Communication Protocol: What is the most suitable protocol for switch communication (e.g., NETCONF, RESTCONF, gNMI, vendor-specific SSH/CLI scraping)? The choice will impact scalability and maintainability.
- Security Model: What is the recommended, secure method for storing and managing switch credentials within the OpenShift cluster (e.g., Kubernetes Secrets with specific Role-Based Access Control (RBAC))?
- Performance Impact: What is the acceptable performance overhead for the TOR configuration step during the bare metal node provisioning workflow?
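One way to keep the Communication Protocol question open during the PoC is to code against a small protocol-agnostic interface and register one backend per protocol. This sketch reflects an assumed structure, not an existing Ironic API. `SwitchDriver`, `register`, `driver_for`, and the `dry-run` backend are hypothetical, and NETCONF, RESTCONF, gNMI, or SSH backends would be implemented as additional subclasses.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Sequence, Type


class SwitchDriver(ABC):
    """Hypothetical protocol-agnostic interface that a PoC could code against."""

    @abstractmethod
    def set_access_port(self, port_id: str, vlan: int) -> None: ...

    @abstractmethod
    def set_trunk_port(self, port_id: str, native_vlan: int,
                       allowed_vlans: Sequence[int]) -> None: ...


_DRIVERS: Dict[str, Type[SwitchDriver]] = {}


def register(protocol: str) -> Callable[[Type[SwitchDriver]], Type[SwitchDriver]]:
    """Class decorator that registers a backend under a protocol name."""
    def wrap(cls: Type[SwitchDriver]) -> Type[SwitchDriver]:
        _DRIVERS[protocol] = cls
        return cls
    return wrap


def driver_for(protocol: str) -> SwitchDriver:
    """Instantiate the backend registered for `protocol`."""
    cls = _DRIVERS.get(protocol)
    if cls is None:
        raise ValueError(f"no driver registered for protocol {protocol!r}")
    return cls()


@register("dry-run")
class DryRunDriver(SwitchDriver):
    """Records intended operations instead of contacting a switch."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def set_access_port(self, port_id: str, vlan: int) -> None:
        self.log.append(f"{port_id}: access vlan {vlan}")

    def set_trunk_port(self, port_id: str, native_vlan: int,
                       allowed_vlans: Sequence[int]) -> None:
        vlans = ",".join(str(v) for v in sorted(allowed_vlans))
        self.log.append(f"{port_id}: trunk native {native_vlan} allowed {vlans}")
```

Because every backend implements the same calls, candidate protocols could be compared by timing the same operations through each one. That comparison could also provide data for the Performance Impact question.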
Out of Scope
The following items are explicitly out of scope for this Proof of Concept:
- A feature-complete implementation intended for General Availability (GA).
- Advanced networking configurations (e.g., BGP, OSPF, LLDP, Quality of Service (QoS)).
- Configuration of the network fabric beyond the directly connected TOR switch ports for a given host.
- Universal support for all switch hardware vendors and network operating systems.
Links
- Upstream Ironic Blueprint: [Launchpad Bug 2113769|https://bugs.launchpad.net/ironic/+bug/2113769]
- Target Documentation Area: [OpenShift Container Platform - Installing on bare metal|https://docs.redhat.com/en/documentation/openshift_container_platform/4.19/html-single/installing_on_bare_metal/index]
- Metal3 Requirements/Asks