  1. Cloud Infrastructure Security & Compliance
  2. CMP-864

Improve long-term maintainability and debuggability for the operators

    • Maintainability and Debuggability
    • Improvement
    • False
    • False
    • Done
    • 0% To Do, 0% In Progress, 100% Done
    • Undefined
    • ISC

      Epic Goal

      tldr: three basic claims, the rest is explanation and one example

      1. We cannot improve long-term maintainability solely by fixing bugs.
      2. Teams should be asked to produce designs for improving maintainability/debuggability.
      3. Specific maintenance items (or investigations of maintenance items) should be placed into planning as peers to PM requests and explicitly prioritized against them.

      While bugs are an important metric, fixing bugs is different from investing in maintainability and debuggability. Fixing bugs helps alleviate immediate problems, but it doesn't improve our ability to address future ones. You (may) get a code base with fewer bugs, but when you add a new feature, it will still be hard to debug problems and interactions. This pushes a code base towards stagnation, where it gets harder and harder to add features.

      One alternative is to ask teams to produce ideas for how they would improve future maintainability and debuggability instead of focusing on immediate bugs. This would produce designs that make problem determination, bug resolution, and future feature additions faster over time.

      I have a concrete example of what focusing on bugs rather than quality produces. We have resolved many bugs about communication failures with ingress by finding problems with point-to-point network communication. We fixed the individual bugs but did not improve the code for future debugging, so we keep chasing many hard-to-diagnose problems across the stack. The alternative is to create a point-to-point network connectivity capability. This would immediately improve bug resolution and stability (through detection) for kuryr, ovs, legacy sdn, network-edge, kube-apiserver, openshift-apiserver, authentication, and console. Bug fixing does not produce the same impact.
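
      To make the idea of a point-to-point connectivity capability a bit more tangible, here is a minimal sketch. It is not a design for this epic: it assumes a plain TCP connect is a good enough probe, and the names `CheckResult`, `check_tcp`, and `run_checks`, as well as the source/target labels, are hypothetical. The point is only that every check records who tried to reach whom and what happened, so a failure can be attributed to a specific hop instead of being rediscovered bug by bug.

```python
import socket
import time
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class CheckResult:
    """Outcome of one point-to-point check (hypothetical record shape)."""
    source: str                  # component running the check
    target: str                  # logical name of the endpoint being reached
    host: str
    port: int
    ok: bool
    latency_ms: Optional[float]  # set only when the connection succeeded
    error: Optional[str]         # set only when the connection failed


def check_tcp(source: str, target: str, host: str, port: int,
              timeout: float = 2.0) -> CheckResult:
    """Attempt a single TCP connection and record what happened."""
    start = time.monotonic()
    try:
        # Assumption: a successful TCP handshake counts as "connected".
        with socket.create_connection((host, port), timeout=timeout):
            latency = (time.monotonic() - start) * 1000.0
            return CheckResult(source, target, host, port, True, latency, None)
    except OSError as exc:
        # Refused connections, timeouts, and DNS failures are all OSError.
        detail = f"{type(exc).__name__}: {exc}"
        return CheckResult(source, target, host, port, False, None, detail)


def run_checks(source: str, targets: Dict[str, Tuple[str, int]],
               timeout: float = 2.0) -> List[CheckResult]:
    """Check every named target once and return one result per target."""
    return [check_tcp(source, name, host, port, timeout)
            for name, (host, port) in targets.items()]
```

      A component would call something like `run_checks("console", {"some-endpoint": (host, port)})` periodically and publish the results, so that a failing pair shows up directly rather than as a downstream symptom.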

      We need more investment in our future selves. Saying "teams should reserve this" doesn't seem to be universally effective. Perhaps an approach that asks directly for designs and their impacts, then follows up by placing the items in planning and prioritizing them against PM feature requests, would give teams the confidence to invest in these areas and give systemic problems broad exposure.

      Previous Work (Optional):

      1. https://issues.redhat.com/browse/CMP-701 
      2. https://issues.redhat.com/browse/CMP-817 

      Documentation needs:

      • Please see the individual issues linked to this Epic

      Quality Assurance Needs

      • Please see the individual issues linked to this Epic

              rhn-support-mrogers Matt Rogers (Inactive)
              rhn-engineering-nkinder Nathan Kinder
              Votes: 0
              Watchers: 3
