Uploaded image for project: 'Network Edge'
  1. Network Edge
  2. NE-1924

Add functionality to the openshift-router to remove stale status entries

XMLWordPrintable

    • Icon: Epic Epic
    • Resolution: Unresolved
    • Icon: Normal Normal
    • None
    • None
    • None
    • Add functionality to the openshift-router to remove stale status entries
    • False
    • None
    • False
    • Not Selected
    • To Do
    • OCPPLAN-7878 - NetEdge - Maintainability and Debugability & Tech Backlog
    • OCPPLAN-7878NetEdge - Maintainability and Debugability & Tech Backlog
    • 0
    • 0

      Template:

      Networking Definition of Planned

      Epic Template descriptions and documentation

      Epic Goal

      The openshift-router should be updated to clear stale route status. Route status becomes stale when:

      The goal is to have all 3 scenarios handled in the openshift-router, and remove the route status clearing from the Ingress Operator. The router is already responsible for writing the route status, so it should be responsible for clearing it.

      As alluded to in this comment, goal should be to have the router do only 1 status update per route, as opposed to multiple updates which causes unneeded complexity and buggy behavior in the leasewriter and contention tracking logic.

      Additionally, this effort will most likely overlap with the UpgradeValidation logic introduced in this commit. An stretch goal is to also resolve the existing concerns mentioned above in the UpgradeValidation plugin, since this fix will be very closely related. See previous work for more details.

      Why is this important?

      Route status needs to be accurate. Customers as well as downstream products rely on the status to understand which IngressControllers a route is admitted to. 

      Security teams may rely on route status to ensure that an internal route is not admitted by a public IngressController.

      Currently, there is a workaround to manually clear route status, but it's inconvenient for customers:

      oc patch route <route-name> -n <namespace> --subresource=status --type=merge -p '{"status":null}'

      Planning Done Checklist

      The following items must be completed on the Epic prior to moving the Epic from Planning to the ToDo status

      • Priority+ is set by engineering
      • Epic must be Linked to a +Parent Feature
      • Target version+ must be set
      • Assignee+ must be set
      • (Enhancement Proposal is Implementable
      • (No outstanding questions about major work breakdown
      • (Are all Stakeholders known? Have they all been notified about this item?
      • Does this epic affect SD? {}Have they been notified{+}? (View plan definition for current suggested assignee)
        1. Please use the “Discussion Needed: Service Delivery Architecture Overview” checkbox to facilitate the conversation with SD Architects. The SD architecture team monitors this checkbox which should then spur the conversation between SD and epic stakeholders. Once the conversation has occurred, uncheck the “Discussion Needed: Service Delivery Architecture Overview” checkbox and record the outcome of the discussion in the epic description here.
        2. The guidance here is that unless it is very clear that your epic doesn’t have any managed services impact, default to use the Discussion Needed checkbox to facilitate that conversation.

      Additional information on each of the above items can be found here: Networking Definition of Planned

      Acceptance Criteria

      • CI - MUST be running successfully with tests automated
      • Release Technical Enablement - Provide necessary release enablement
        details and documents.

      ...

      Dependencies (internal and external)

      1. 

      ...

      Previous Work (Optional):

      1. https://github.com/openshift/cluster-ingress-operator/pull/849: Unsuccessful attempt to add logic to the Ingress Operator to clear route status if route or namespace labels change.
        • Closed over inefficiency concern: Ingress Operator now needs watch all namespaces and all routes in a new controller which the only purpose of which is to clear stale route status when route or ns labels change.
        • The router already receives reconciles routes that are unadmitted via route or ns label changes.
        • Not recommended to pursue this path.

             2. https://github.com/openshift/router/pull/514: Unsuccessful attempt to make the router clear route status if route or namespace labels change.

        • On the right track, but hacky way of clearing route status.
        • Needs to be reevaluated after new Upgrade Validation plugin: see OCPBUGS-26498 and https://github.com/openshift/router/pull/555 
          • UpgradeValidation introduced a new Condition type and mechanics for adding AND clearing this new Condition type. It is only enabled in 4.15 for SHA1 effort, but disabled in 4.16+.
          • Unfortunately, while it works in it's limited scope, it also brought undesired complexities and inefficiencies (see comment)
        • Because of this, you may need to touch the mechanics of UpgradeValidation, or it might be overall beneficial to resolve the issues in the UpgradeValidation plugin.

      Open questions::

      1. How should the openshift-router make a single status update per route reconcilation?
        • Right now, status updates happen in ExtendedValidator (reject), UpgradeValidation (UnserverableInFutureVersions), and the StatusAdmitter (admitted).
        • Previously, this was OK because there was only 1 condition (either it's rejected or admitted), and could be okay with adding status clearing here (either it's rejected, admitted, or not-admitted/cleared), but the UpgradeValidation plugin would be still in an undesirable place.
      2. Do we care about routers getting hard-kill signals and missing route status cleanups?
      3. Do we want to remove the existing logic from the Ingress Operator for clearing status once the router handles all stale status?
        • Added maintenance burden, but zero cost way to ensure route status is definitely cleaned up with IngressController is deleted.
      4. Do we need to backport this solution?

      Done Checklist

      • CI - CI is running, tests are automated and merged.
      • Release Enablement <link to Feature Enablement Presentation>
      • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
      • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
      • DEV - Downstream build attached to advisory: <link to errata>
      • QE - Test plans in Polarion: <link or reference to Polarion>
      • QE - Automated tests merged: <link or reference to automated tests>
      • DOC - Downstream documentation merged: <link to meaningful PR>

              Unassigned Unassigned
              gspence@redhat.com Grant Spence
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

                Created:
                Updated: