OCPBUGS-32978

[release-4.15] Extra reboot with performance profile on 4.14 when the worker MCP is resumed during upgrade


    • CNF Compute Sprint 253
      *Cause*: The bug appears when a performance-tuned cluster is updated while a MachineConfigPool is paused. For example, an EUS-to-EUS update of a performance-tuned cluster triggers it.
      *Consequence*: All nodes in the paused MachineConfigPool reboot one extra time after the pool is unpaused. The PerformanceProfile controller was reconciling against the rendered MachineConfig listed in the MachineConfigPool status. That MachineConfig reflects the pool's current state, but while the pool is paused it does not reflect the latest planned state.
      Unpausing the pool therefore causes one reboot when the target MachineConfig is applied, and a second reboot after the performance profile reconciles against it.
      *Fix*: The PerformanceProfile controller reconciles against the staged MachineConfig for the pool.
      *Result*: No extra reboot is observed.
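
The difference between the two reconcile targets can be illustrated with a small sketch. This is a toy model, not the operator's code: the `MachineConfigPool` class, the `config_to_reconcile` helper, and the rendered config names are hypothetical, and mapping the "staged" MachineConfig to `spec.configuration.name` (versus `status.configuration.name` for the applied one) is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class MachineConfigPool:
    """Toy stand-in for a MachineConfigPool; not the real API object."""
    paused: bool
    spec_config: str    # assumed to model spec.configuration.name (staged target)
    status_config: str  # assumed to model status.configuration.name (applied)


def config_to_reconcile(pool: MachineConfigPool, use_staged: bool) -> str:
    """Return the rendered MachineConfig name a controller would reconcile against."""
    return pool.spec_config if use_staged else pool.status_config


# A worker pool paused during an upgrade: the target config is staged,
# while the status still reports the pre-upgrade rendered config.
pool = MachineConfigPool(
    paused=True,
    spec_config="rendered-worker-new",    # hypothetical name
    status_config="rendered-worker-old",  # hypothetical name
)

stale = config_to_reconcile(pool, use_staged=False)   # behavior described under Cause
staged = config_to_reconcile(pool, use_staged=True)   # behavior described under Fix
print(stale)
print(staged)
```

In this model, reconciling against the status value while the pool is paused produces output based on the old rendered config, which then has to be regenerated once the pool catches up. That regeneration corresponds to the second reboot described in the release note.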
    • Bug Fix
    • In Progress

      When a PerformanceProfile is applied before a minor-version upgrade, and the worker MCP is paused and then resumed at the target version, the worker nodes go through two reboots and multiple rendered worker MachineConfigs are generated. With a default upgrade (no PerformanceProfile), only the expected single reboot is observed.

      Version-Release number of selected component (if applicable): 

        

      How reproducible:

          100%

      Steps to Reproduce:

          1. Create a PerformanceProfile on the pre-upgrade 4.14 release.
          2. Pause the worker MCP.
          3. Upgrade to the target version.
          4. Resume the worker MCP.
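
Steps 2 and 4 can be sketched as a JSON merge patch on the pool's `spec.paused` field. The helper names below are hypothetical, and the script only builds the `oc patch` argument lists without running them. Steps 1 and 3 are not covered.

```python
import json
from typing import List


def pause_patch(paused: bool) -> str:
    """Build a JSON merge patch that sets spec.paused on a MachineConfigPool."""
    return json.dumps({"spec": {"paused": paused}})


def oc_patch_command(pool: str, paused: bool) -> List[str]:
    """Return the argv for patching the named pool; nothing is executed here."""
    return [
        "oc", "patch", "machineconfigpool", pool,
        "--type", "merge",
        "--patch", pause_patch(paused),
    ]


pause_cmd = oc_patch_command("worker", paused=True)    # step 2: pause the worker MCP
resume_cmd = oc_patch_command("worker", paused=False)  # step 4: resume the worker MCP
print(" ".join(pause_cmd))
print(" ".join(resume_cmd))
```

To actually apply either patch, the argument list could be passed to `subprocess.run` on a host that is logged in to the cluster.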
          

      Actual results:

          Workers reboot twice.

      Expected results:

          One reboot 

      Additional info:

      apiVersion: performance.openshift.io/v2
      kind: PerformanceProfile
      metadata:
        name: perf-profile-2m-worker
      spec:
        cpu:
          reserved: 0-3
          isolated: 4-63
        workloadHints:
          realTime: false
        hugepages:
          defaultHugepagesSize: "2M"
          pages:
          - size: "2M"
            count: 24000
            node: 0
          - size: "2M"
            count: 24000
            node: 1
        realTimeKernel:
          enabled: false
        numa:
          topologyPolicy: "best-effort"
        net:
          userLevelNetworking: false
        nodeSelector:
          node-role.kubernetes.io/worker: ""

            vgrinber@redhat.com Vitaly Grinberg
            wilsondav Dave Wilson
            Mallapadi Niranjan Mallapadi Niranjan