OpenShift Container Platform (OCP) Strategy / OCPSTRAT-2250

Configure AWS User Tags on Day 2 (HCP only) - phase 2


      Feature Overview (aka. Goal Summary)  

      The goal is to enable in-place Day 2 tagging for cloud resources created by OpenShift, specifically EC2 instances and the default worker security group, within Hypershift/ROSA Hosted Control Plane (HCP) environments. Currently, changing tags on Day 2 for these resources, particularly through updates to the HostedCluster or NodePool specifications, triggers a rolling upgrade of EC2 instances, which customers do not want. This was the finding in OCPSTRAT-787.

      This feature is phase 2 of OCPSTRAT-787. It aims to let cluster administrators add, update, and remove tags on existing resources without causing node recreation, addressing a critical gap in Hypershift/ROSA capabilities for managed clusters.

      Goals (aka. expected user outcomes)

      As a result of this feature, cluster administrators should be able to:

      • Add, update, or remove one or more tags on existing EC2 instances without triggering a rolling upgrade or recreation of the nodes.
      • Add, update, or remove one or more tags on the default worker security group created by Hypershift.
      • Manage tags using ROSA client interfaces (ROSA CLI, API, Terraform provider, UI).
      • Ensure that tag changes made via the HostedCluster or NodePool spec are reconciled in place on the underlying AWS resources.
      • Receive alerts or clear status indicators when tag reconciliation fails, for example, due to conflicts or permission issues.
      • Maintain a single source of truth for tag information, ideally at the HostedCluster or NodePool level, to simplify management.
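
The in-place reconciliation described in the goals above can be sketched as a diff between the desired user tags (from the HostedCluster or NodePool spec) and the tags currently on the AWS resource. This is a minimal illustration, not the Hypershift or CAPA implementation. The names `diff_tags`, `TagChanges`, and `last_applied_keys`, and the list of reserved prefixes, are assumptions made for the example.

```python
from dataclasses import dataclass, field

# Assumption for illustration: tag keys with these prefixes belong to AWS or
# to the platform and are never touched by user tag reconciliation.
RESERVED_PREFIXES = ("aws:", "kubernetes.io/", "red-hat-")


@dataclass
class TagChanges:
    """Changes to apply in place to a single AWS resource."""
    upsert: dict = field(default_factory=dict)  # tags to create or overwrite
    remove: list = field(default_factory=list)  # tag keys to delete


def is_reserved(key: str) -> bool:
    return key.startswith(RESERVED_PREFIXES)


def diff_tags(desired: dict, current: dict, last_applied_keys: set) -> TagChanges:
    """Compute the in-place tag changes for one resource.

    desired:           user tags from the spec (the single source of truth)
    current:           tags read from the resource
    last_applied_keys: keys this reconciler applied previously, so that tags
                       added by other tools are not deleted
    """
    changes = TagChanges()
    for key, value in desired.items():
        if is_reserved(key):
            continue  # default (Day 1) tags are not affected
        if current.get(key) != value:
            changes.upsert[key] = value
    for key in sorted(last_applied_keys):
        if key not in desired and key in current and not is_reserved(key):
            changes.remove.append(key)
    return changes


if __name__ == "__main__":
    current = {"kubernetes.io/cluster/abc": "owned", "team": "a", "env": "dev"}
    desired = {"team": "b", "cost-center": "42"}
    print(diff_tags(desired, current, last_applied_keys={"team", "env"}))
```

An empty `TagChanges` means the resource already matches the spec and no AWS call is needed. Tracking previously applied keys is one possible design choice for removal; the ticket does not specify how removals are detected.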

      Requirements (aka. Acceptance Criteria):


      • The following capabilities are available for AWS on standalone and HCP clusters.
      • OCP automatically tags the cloud resources with the Cluster's External ID. 
      • Tags added by default on Day 1 are not affected.
      • All existing active AWS resources in the OCP clusters have the tagging changes propagated.
      • All new AWS resources created by OCP reflect the changes to tagging.
      • Hive must support an additional list of key=value strings on MachinePools.
        • These are AWS user-defined / custom tags, not to be confused with node labels.
        • ROSA CLI can accept a list of key=value strings with additional tag values.
          • It can currently do this during cluster installation.
        • The default tag(s) is/are still applied.
        • NOTE: AWS allows at most 50 tags per object. OCP uses 2 automatically, with a third to be added soon, and 10 are reserved for Red Hat overall, as Managed Services uses at least 2-3. Customers can therefore specify at most 40 tags.
        • Must be able to modify tags after creation 
      • Support for OpenShift 4.21 onwards.
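
The key=value input and the 40-tag ceiling from the requirements above can be sketched as a small validation step. This is an illustration under stated assumptions, not the ROSA CLI's actual parser: the function name `parse_user_tags` and the constant names are hypothetical. The 50-tag limit and the reservation of 10 come from the note above; the 128-character key and 256-character value limits and the reserved `aws:` prefix are standard AWS tagging restrictions.

```python
AWS_MAX_TAGS_PER_RESOURCE = 50   # AWS limit per object (from the note above)
RESERVED_FOR_RED_HAT = 10        # reserved overall (from the note above)
MAX_USER_TAGS = AWS_MAX_TAGS_PER_RESOURCE - RESERVED_FOR_RED_HAT  # 40

MAX_KEY_LENGTH = 128    # AWS tag key limit
MAX_VALUE_LENGTH = 256  # AWS tag value limit


def parse_user_tags(pairs: list) -> dict:
    """Parse a list of 'key=value' strings into a dict of user tags.

    Raises ValueError on malformed input, duplicate keys, keys using the
    AWS-reserved 'aws:' prefix, oversized keys or values, or too many tags.
    """
    tags = {}
    for pair in pairs:
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        if key.lower().startswith("aws:"):
            raise ValueError(f"tag key {key!r} uses the reserved 'aws:' prefix")
        if len(key) > MAX_KEY_LENGTH or len(value) > MAX_VALUE_LENGTH:
            raise ValueError(f"tag {key!r} exceeds AWS length limits")
        if key in tags:
            raise ValueError(f"duplicate tag key {key!r}")
        tags[key] = value
    if len(tags) > MAX_USER_TAGS:
        raise ValueError(
            f"{len(tags)} tags given, at most {MAX_USER_TAGS} are allowed"
        )
    return tags


if __name__ == "__main__":
    print(parse_user_tags(["team=platform", "cost-center=42"]))
```

Validating the count before any AWS call gives the user an immediate error instead of a failed reconciliation later.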

      Out-of-scope

      This feature applies only to ROSA with Hosted Control Planes; ROSA Classic / standalone is excluded.
      This feature specifically targets Hosted Control Plane deployments, as the underlying issue with rolling upgrades stems from Hypershift's reconciliation behavior in HCP.
      Tagging of VPCs is also out of scope: SD likely operates in a BYO-VPC model, which places VPCs generally outside the direct scope of OpenShift's tag management.

      Background

      The AWS Cluster API (CAPI) provider is expected to reconcile tags. In Hypershift, HostedCluster and NodePool specs propagate tags to CAPI, but any change to these specs results in a rolling upgrade of EC2 instances, which customers want to avoid. This is why Day 2 tag changes for NodePools are not exposed in ROSA (OCM). The default worker security group also lacks tag updates due to a potential limitation in managed policies.
      The current implementation has operators reconcile tags by reading from the Infrastructure resource. However, in HCP, Hypershift itself reconciles the Infrastructure resource, setting tags from the HostedCluster spec. This creates a loop where OCM would need to update the HostedCluster spec, triggering the undesirable rolling upgrade. Upstream changes in CAPA are needed for a comprehensive solution. While there's an upstream design for in-place updates, its implementation is challenging due to CAPI's immutable design.
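
The rollout problem described above can be modeled, in simplified form, as a change-detection hash over the machine spec. The sketch below is an assumed model for illustration only and is not the actual Hypershift or CAPA logic: `template_hash`, `needs_rollout`, and the spec field names are hypothetical. It shows that when tags are part of the hashed spec, a tag edit looks like a new machine template, whereas excluding tags from the hash leaves them free to be reconciled in place.

```python
import hashlib
import json


def template_hash(spec: dict, ignore_fields: tuple = ()) -> str:
    """Return a short, stable hash of a machine spec.

    Fields listed in ignore_fields are left out of the hash, so changes to
    them do not produce a new template identity.
    """
    filtered = {k: v for k, v in spec.items() if k not in ignore_fields}
    payload = json.dumps(filtered, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:8]


def needs_rollout(old_spec: dict, new_spec: dict, ignore_fields: tuple = ()) -> bool:
    """A rollout is needed when the hashed portion of the spec changed."""
    return template_hash(old_spec, ignore_fields) != template_hash(new_spec, ignore_fields)


if __name__ == "__main__":
    old = {"instanceType": "m5.xlarge", "tags": {"team": "a"}}
    new = {"instanceType": "m5.xlarge", "tags": {"team": "b"}}

    # Tags included in the hash: a tag edit is treated as a new template.
    print(needs_rollout(old, new))

    # Tags excluded from the hash: the same edit can be applied in place.
    print(needs_rollout(old, new, ignore_fields=("tags",)))
```

In this model, changes to other fields such as the instance type still trigger a rollout, which matches the intent that only tag edits should be handled in place.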

      Why is this important?

      • Customers want to use custom tagging for:
        • access controls
        • chargeback/showback
        • cloud IAM conditional permissions

      Scenarios

      1. ...

      Acceptance Criteria

      • CI - MUST be running successfully with tests automated
      • Release Technical Enablement - Provide necessary release enablement details and documents.
      • ...

      Dependencies (internal and external)

      1. ...

      Previous Work (Optional):

      1. …

      Open questions:

      1. …

      Done Checklist

      • CI - CI is running, tests are automated and merged.
      • Release Enablement <link to Feature Enablement Presentation>
      • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
      • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
      • DEV - Downstream build attached to advisory: <link to errata>
      • QE - Test plans in Polarion: <link or reference to Polarion>
      • QE - Automated tests merged: <link or reference to automated tests>
      • DOC - Downstream documentation merged: <link to meaningful PR>

              rh-ee-smodeel Subin M
              julim Ju Lim
              None
              Aaren de Jong, Anirudh Agnihotri, Antoni Segura Puimedon, Balachandran Chandrasekaran, Eric Fried, Mike Worthington, Mohamed ElSerngawy, Mulham Raee, Nelson Jean, Nick Png, Subin M, Trilok Geer
              Scott Dodson
              Yu Li
              Matthew Werner
              Kyle Walker