  1. Performance and Scale for AI Platforms
  2. PSAP-993

NTO: No race to update MC when nodes with different # of CPUs are in the same MCP


    • Type: Story
    • Resolution: Done
    • Priority: Major
    • July Release for PSAP
    • None
    • NTO

      As an OpenShift admin, I want to be automatically protected from potentially continuous reboots when machines with different CPU counts are in the same MCP and TuneD calculates different kernel parameters for them.  See: 

      Acceptance criteria:

      No continuous reboots in a cluster with the configuration above.  A Prometheus alerting rule for this case, or at least a warning in the operator logs.
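
      The check implied by these criteria could be sketched roughly as follows. This is only an illustration, not NTO's actual implementation: the function names, the (node, pool, kernel parameters) tuple layout, and the isolcpus values are all hypothetical. The idea is to group the nodes of each MCP by the kernel parameters TuneD calculated for them, and to warn and skip the MachineConfig update for any pool whose nodes disagree, instead of letting them overwrite each other's result.

```python
from collections import defaultdict


def find_conflicting_pools(nodes):
    """Return {pool: {kernel_args: [node names]}} for pools whose nodes disagree.

    `nodes` is an iterable of (node_name, pool_name, kernel_args) tuples.
    This layout is an assumption made for the sketch.
    """
    by_pool = defaultdict(lambda: defaultdict(list))
    for name, pool, kargs in nodes:
        by_pool[pool][kargs].append(name)
    # A pool conflicts when its nodes produced more than one distinct value.
    return {pool: dict(groups) for pool, groups in by_pool.items() if len(groups) > 1}


def pools_safe_to_update(nodes, log=print):
    """Log a warning per conflicting pool; return the pools that are safe to update."""
    nodes = list(nodes)  # allow generators, since we iterate twice
    conflicts = find_conflicting_pools(nodes)
    for pool, groups in sorted(conflicts.items()):
        log(
            f"WARNING: pool {pool!r}: nodes calculated different kernel "
            f"parameters; skipping MachineConfig update: {groups}"
        )
    all_pools = {pool for _, pool, _ in nodes}
    return sorted(all_pools - set(conflicts))


if __name__ == "__main__":
    # Hypothetical data: two workers with different CPU counts in one pool.
    example = [
        ("worker-0", "worker", "isolcpus=2-7"),
        ("worker-1", "worker", "isolcpus=2-15"),
        ("master-0", "master", "isolcpus=2-7"),
    ]
    print(pools_safe_to_update(example))
```

      In a real operator the warning would go to the operator log, and the same condition could also drive a metric for a Prometheus alerting rule; both are left out of this sketch.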

            jmencak Jiri Mencak
            jmencak Jiri Mencak
