RHOS Request for Features / RHOSRFE-181

Enhanced reboot strategies for EDPM nodes

    • Type: Feature Request
    • Resolution: Unresolved
    • Priority: Normal
    • 0% To Do, 0% In Progress, 100% Done

      Feature Overview (mandatory - Complete while in New status)
      The proposed feature extends edpm_reboot_strategy to give users more control over when their OpenStack data plane nodes reboot after an update. Currently, the options are rigid: force a reboot, reboot only for greenfield deployments, or never reboot. The new "enabled" strategy bases the decision on the needs-restarting -r command, so a node is rebooted only when the system actually requires it. This matters for environments such as banks and other businesses that have dedicated maintenance windows, do not require 24/7 uptime, and want updates fully applied without unnecessary reboots. The result is a more efficient and less disruptive update process.

      Goals (mandatory - Complete while in New status)

      This feature aims to provide OpenStack users with a more intelligent and flexible way to manage node reboots after applying updates.

      Who benefits: Businesses and organizations, such as financial institutions, that operate with scheduled maintenance windows and can tolerate downtime to apply system updates. System administrators and SREs will benefit from a more predictable and controlled update process.

      Today vs. With this Feature: Today, a user must either force a reboot on every update (force) or turn automatic reboots off (never) and handle them manually, which can leave updates only partially applied if a required reboot is missed. The new enabled strategy automates the check for required reboots, ensuring updates are fully applied while avoiding unnecessary downtime, all within a single configuration setting.

      Requirements (mandatory - Complete while in Refinement status):

      Requirement | Notes | isMVP?
      Implement a new edpm_reboot_strategy value: enabled | The edpm_reboot_strategy will accept a new string value enabled. | Yes
      The enabled strategy must check the output of needs-restarting -r | If the output indicates a reboot is required, the node should be rebooted. | Yes
      The enabled strategy must not reboot if needs-restarting -r indicates no reboot is necessary | This prevents unnecessary downtime. | Yes
      The enabled strategy must ignore the existence of Nova | Unlike the auto strategy, the enabled strategy must not be influenced by whether Nova is deployed. | Yes
      The disabled or unset strategy will become the new default behavior | This will clarify that a user must explicitly opt into any form of automated reboot. | Yes
      The auto strategy will be retained for backward compatibility | The existing auto functionality (rebooting on greenfield deployments but not with Nova) must remain unchanged. | No
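
      The requirements above can be read as a small decision function. The following is a minimal Python sketch for illustration only, not the edpm-ansible implementation. The function name should_reboot and its parameters are hypothetical. The auto branch is modelled literally from this request's description ("rebooting on greenfield deployments but not with Nova"); whether auto also consults needs-restarting -r is not stated here and is left out.

```python
from typing import Optional

# Strategy names mentioned in this request (hypothetical constant name).
VALID_STRATEGIES = {"force", "enabled", "disabled", "auto"}


def should_reboot(strategy: Optional[str], reboot_required: bool, nova_deployed: bool) -> bool:
    """Decide whether a node reboots after an update, per the requirements table.

    reboot_required: what `needs-restarting -r` reported for the node.
    nova_deployed:   whether Nova is already deployed on the node.
    """
    # An unset value behaves like "disabled" (the proposed new default).
    effective = strategy if strategy is not None else "disabled"
    if effective not in VALID_STRATEGIES:
        raise ValueError(f"unknown edpm_reboot_strategy: {effective!r}")

    if effective == "force":
        return True
    if effective == "enabled":
        # Only the reboot hint matters; the presence of Nova is ignored.
        return reboot_required
    if effective == "auto":
        # Assumption: modelled only from this request's wording
        # (reboot on greenfield deployments, never once Nova is present).
        return not nova_deployed
    # "disabled": never reboot automatically.
    return False
```

      For example, should_reboot("enabled", True, True) returns True even though Nova is deployed, while should_reboot(None, True, False) returns False because an unset strategy never reboots.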

       

      Done - Acceptance Criteria (mandatory - Complete while in Refinement status):

      The feature is considered complete when the following criteria are met:

      • The edpm_reboot_strategy parameter now accepts the value enabled.
      • A user can set edpm_reboot_strategy: enabled in their openstackdataplane deployment configuration.
      • When an openstackdataplane update runs with edpm_reboot_strategy: enabled, the system executes the needs-restarting -r command on the target nodes.
      • If the needs-restarting -r command returns a value indicating a reboot is required, the node is automatically rebooted.
      • If the needs-restarting -r command returns a value indicating no reboot is required, the node is not rebooted.
      • The enabled strategy functions correctly regardless of whether Nova is deployed on the cluster.
      • The auto strategy maintains its existing behavior.
      • A new default behavior is established for edpm_reboot_strategy, where a disabled or unset value explicitly prevents reboots.
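
      The criteria above speak of needs-restarting -r "returning a value". A minimal Python sketch of such a check follows, assuming the exit-status convention documented for the dnf-utils/yum-utils tool (1 when a reboot is required, 0 when it is not); this request does not state the convention itself. The function name reboot_required and the injectable run parameter are hypothetical and exist only to make the sketch testable without the real command.

```python
import subprocess
from typing import Callable


def reboot_required(run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> bool:
    """Return True when `needs-restarting -r` reports that a reboot is needed."""
    # check=False: a non-zero exit status is an expected answer, not an error.
    result = run(["needs-restarting", "-r"], capture_output=True, text=True, check=False)
    if result.returncode == 0:
        return False  # no reboot needed
    if result.returncode == 1:
        return True  # reboot needed (assumed convention, see lead-in)
    # Any other status is treated as a failure of the check itself.
    raise RuntimeError(
        f"needs-restarting -r failed with exit status {result.returncode}: {result.stderr}"
    )
```

      Treating unexpected exit statuses as errors, rather than as "no reboot", avoids silently skipping a reboot when the check itself could not run.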

      Use Cases - i.e. User Experience & Workflow: (Initial completion while in Refinement status):

      Main Success Scenario: Applying Updates During a Maintenance Window

      • The user schedules a maintenance window for their OpenStack data plane nodes.
      • The user sets edpm_reboot_strategy: enabled in their deployment manifest.
      • During the maintenance window, the user applies an update to the openstackdataplane deployment.
      • The update process runs, and the needs-restarting -r check determines a reboot is necessary to apply a new kernel or system library update.
      • The system automatically reboots the nodes.
      • The nodes come back online with the updates fully applied, and no manual intervention is needed to initiate the reboot.

      Alternative Flow: No Reboot Required

      • The user schedules a maintenance window and sets edpm_reboot_strategy: enabled.
      • The user applies an update that consists of minor configuration changes and does not require a system reboot.
      • The needs-restarting -r check returns a value indicating no reboot is required.
      • The system completes the update without rebooting the nodes, preventing unnecessary downtime.

      Out of Scope (Initial completion while in Refinement status):
      High-level list of items or personas that are out of scope.
      <your text here>

      Documentation Considerations (Initial completion while in Refinement status):

      • Update the Red Hat OpenStack on OpenShift documentation for the edpm_reboot_strategy parameter to include the new enabled value and its behavior.
      • Clearly explain the difference between enabled and auto to prevent user confusion, especially regarding how Nova's presence affects each strategy.
      • Provide a clear table outlining all four strategies (force, enabled, disabled, auto) and their specific behaviors under different conditions (e.g., reboot required, Nova deployed).
      • Add a warning that the enabled strategy does not perform workload evacuation and should only be used when it is safe to do so.
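
      As a starting point for the strategy table mentioned above, the following Python sketch enumerates every combination of strategy, reboot hint, and Nova presence. It is illustrative only. The names RULES, strategy_rows, and render_table are hypothetical, and the auto rule is modelled solely from this request's wording (reboot on greenfield deployments, not with Nova), so it should be checked against the actual auto behavior before being used in documentation.

```python
from itertools import product

# One rule per strategy: (reboot_required, nova_deployed) -> reboots?
RULES = {
    "force": lambda required, nova: True,
    "enabled": lambda required, nova: required,
    "disabled": lambda required, nova: False,
    # Assumption: taken from this request's description of auto only.
    "auto": lambda required, nova: not nova,
}


def strategy_rows():
    """Return (strategy, reboot_required, nova_deployed, reboots) for all cases."""
    return [
        (name, required, nova, RULES[name](required, nova))
        for name, required, nova in product(RULES, (True, False), (True, False))
    ]


def render_table() -> str:
    """Render the rows as a fixed-width plain-text table."""
    def yes_no(flag: bool) -> str:
        return "yes" if flag else "no"

    lines = [f"{'strategy':<10} {'reboot required':<16} {'Nova deployed':<14} reboots"]
    for name, required, nova, reboots in strategy_rows():
        lines.append(
            f"{name:<10} {yes_no(required):<16} {yes_no(nova):<14} {yes_no(reboots)}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_table())
```

      Keeping the rules in one dictionary means the documentation table and any tests can be generated from the same source, which makes a mismatch between enabled and auto easier to spot.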

       

      Questions to Answer (Initial completion while in Refinement status):
      Include a list of refinement / architectural questions that may need to be answered before coding can begin.
      <your text here>

      Background and Strategic Fit (Initial completion while in Refinement status):
      Provide any additional context needed to frame the feature.
      <your text here>

      Customer Considerations (Initial completion while in Refinement status):
      Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.
      <your text here>

      Team Sign Off (Completion while in Planning status)

      • All required Epics (known at the time) are linked to this Feature
      • All required Stories, Tasks (known at the time) for the most immediate Epics have been created and estimated
      • Add - Reviewers name, Team Name
      • Acceptance == Feature as “Ready” - well understood and scope is clear - Acceptance Criteria (scope) is elaborated, well defined, and understood
      • Note: Only set FixVersion/s: on a Feature if the delivery team agrees they have the capacity and have committed that capability for that milestone
      Reviewed By | Team Name | Accepted | Notes

       

              Assignee: Unassigned
              Reporter: Pedro Navarro Perez (pnavarro@redhat.com)
              rhos-dfg-upgrades
