  1. Container / Cluster Management (XCM) Strategy
  2. XCMSTRAT-320

ROSA HCP: Additional Security Group(s) on Machine Pools

    • Green
    • XCMSTRAT-277(P1) ROSA parity - HCP to offer all Classic features
    • 100
    • 100% 100%

      April 24

      • OCM:
        • Feature is released. Release pending for QE tasks.

      April 25th:

      • QE:
        • Testing finished. Automation finished. CI integration finished. All cards closed. This feature can be closed.

       

      Apr 2nd

      • OCM:
        • Terraform merged. Waiting on QE confirmation to release. 

       

      March 18th 

      • OCM:
        • Feature is available in production for all organizations, changes in the backend are completed to support CAPI.
        • Due date is updated to March 31st to accommodate other clients such as TF. It will be moved further based on UI and automation test completion.
        • Feature will officially be available in signed ROSA 1.2.37, targeted for release on April 3rd.

      March 7th:

      • [OCM] Development in final phase, bug fixes. Will enable the feature toggle in production for all organizations next week - once all tickets are closed and docs are in place. Next sync meeting will be go/no-go meeting for the feature.
      • [QE] Testing ongoing.
        • The lowest supported version needs a response from the PM. More details are in the comment.
        • OCM-6473 is blocking testing of security groups on 4.14.x node pools and of upgrade scenarios.
        • OCM-6510 is reported as a critical issue. It appears to be an OCPBUG that affects 4.15.x only; I cannot reproduce it on 4.14.x.
        • Automation ongoing.
      • [UI] Development has restarted on this effort. UI is not a requirement for the Adobe MVP on 3/15, but it will be worked on and delivered soon after.
           

      March 1st:

      • [OCM] Development in progress. Changes will be in staging in the week of March 3rd. OCM SDK is merged to unblock CAPI.
      • [QE] Testing plan ongoing.

       

      Feb 21

      The feature will be available in the OCM API and ROSA CLI first, in order to unblock the downstream CAPI Provider. Support in other clients such as OCM-UI and the TF Provider will follow (likely in Q2).

    • CY24Q1
    • 0

      Feature Overview (aka. Goal Summary)  

      Support additional AWS Security Group IDs when creating a machine pool in an existing ROSA HCP cluster.
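      As a rough illustration of what such a create request might carry, the sketch below builds a JSON body for a machine pool with additional security groups. The field names (`aws_node_pool`, `additional_security_group_ids`, `subnet`, `replicas`) and all sample values are assumptions made for illustration; this issue does not specify the request schema.

```python
import json


def build_machine_pool_request(name, replicas, subnet_id, instance_type, security_group_ids):
    """Build a JSON-serializable body for a hypothetical machine pool create call.

    All field names below are assumptions, not a confirmed API schema.
    """
    return {
        "id": name,
        "replicas": replicas,
        "subnet": subnet_id,
        "aws_node_pool": {
            "instance_type": instance_type,
            # Assumed field: the additional SG IDs attached on top of the default worker SG.
            "additional_security_group_ids": list(security_group_ids),
        },
    }


# Hypothetical sample values.
body = build_machine_pool_request(
    name="workers-extra",
    replicas=2,
    subnet_id="subnet-0abc",
    instance_type="m5.xlarge",
    security_group_ids=["sg-0123456789abcdef0"],
)
print(json.dumps(body, indent=2))
```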

      Goals (aka. expected user outcomes)

      1. Allow customers to add up to 10 optional Security Group IDs to an OCM machine pool in the Hosted Control Plane (HCP) topology.

      Requirements (aka. Acceptance Criteria):

      See the above Goal entry.

      1. Support adding additional AWS Security Group IDs when creating a machine pool on an existing ROSA HCP cluster, i.e., CREATE MACHINEPOOL.
      2. Support viewing additional AWS Security Group IDs as part of the machine pool, i.e., the describe machine pool command.
      3. Validate against a soft limit of 5 SG-IDs. If quota permits, allow up to 10 SG-IDs.
      4. Validate that the SG-IDs are present in the VPC.
      5. Retain the default worker SG with rules for communication within cluster components. 
      6. ROSA CLI, OCM UI and Terraform support
      7. Support for up to 10 security groups.
      8. Ability to attach to all nodes (existing and new) that are part of a node pool (OCM machine pool in HCP topology)
      9. Support for day-1: creation of the cluster or the day-one machine pool (called 'worker')
      10. Support for day-2: creation of day-2 machine pools
      11. Support for day-2 changes: add and remove SG IDs on all machine pools (both day-one and day-two)
      12. Change SG IDs attached to node pools without a machine or node restart
      13. OCM API that's similar in UX between Classic and HCP topology
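      The validation requirements above (soft limit of 5, up to 10 when quota permits, and presence in the VPC) could be sketched roughly as follows. This is a minimal illustration, not the OCM implementation: the function name, the `quota_allows_more` flag, and the idea of passing in the set of SG IDs known to exist in the VPC are all assumptions. The ID pattern assumes the usual AWS form of `sg-` followed by 8 or 17 hexadecimal characters.

```python
import re

SOFT_LIMIT = 5   # default maximum number of additional SG IDs
HARD_LIMIT = 10  # maximum when the account quota permits

# Assumed ID shape: "sg-" plus 8 or 17 lowercase hex characters.
SG_ID_PATTERN = re.compile(r"^sg-[0-9a-f]{8}(?:[0-9a-f]{9})?$")


def validate_security_group_ids(sg_ids, vpc_sg_ids, quota_allows_more=False):
    """Return a list of error strings; an empty list means the input is valid.

    vpc_sg_ids is the set of SG IDs that exist in the cluster's VPC
    (hypothetically obtained from an EC2 lookup before calling this).
    """
    errors = []
    limit = HARD_LIMIT if quota_allows_more else SOFT_LIMIT
    if len(sg_ids) > limit:
        errors.append(f"too many security groups: {len(sg_ids)} > {limit}")

    seen = set()
    for sg_id in sg_ids:
        if not SG_ID_PATTERN.match(sg_id):
            errors.append(f"malformed ID: {sg_id}")
        elif sg_id in seen:
            errors.append(f"duplicate ID: {sg_id}")
        elif sg_id not in vpc_sg_ids:
            errors.append(f"not found in VPC: {sg_id}")
        seen.add(sg_id)
    return errors


# Hypothetical data: six well-formed IDs that all exist in the VPC.
six_ids = [f"sg-{i:08x}" for i in range(6)]
print(validate_security_group_ids(six_ids, set(six_ids)))
print(validate_security_group_ids(six_ids, set(six_ids), quota_allows_more=True))
```

      Collecting every error instead of stopping at the first one lets a client report all problems with a request in a single response.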

      Use Cases (Optional):

      Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

       

      Questions to Answer (Optional):

      Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

      1. In HCP clusters, the default SG is shared between the VPCE ENI and the worker node ENIs. This means ingress/egress rules are cumulative, which is not ideal. When can we separate the SGs of the ENIs? Answer: The HyperShift project will separate the Security Groups for the ENIs associated with the VPC Endpoint and the Worker Nodes, respectively.
      2. In HCP clusters, the default SG is used on all the workers of all the machine pools. This means ingress rules referencing several ports required by HCP components such as Ingress are applied on all worker nodes, and there is no API or other way to remove these rules if a customer were to ask the ROSA HCP service. How can we address this? Answer: The HyperShift project will limit the default Security Group attached to the Worker Nodes to only the ingress/egress rules necessary for the cluster to function. Additional ports such as SSH/22 or 30000-32767 will be removed from the default worker SG, allowing customers to add additional SGs (a use case for this feature) when they need them.
      3. Will it be possible to make changes to the Security Group(s) attached to the Machine Pool, i.e., remove an attached SG or add a new SG to the machine pool? Answer: The HyperShift project does not allow changing the SGs, at least not without restarting the existing nodes as defined by MaxUnavailable.

      The above requirements will be addressed by the linked HOSTEDCP EPIC so that the OCM API, and by extension clients of the OCM API, can use these capabilities.
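      The "cumulative rules" concern in the first two questions follows from AWS security group rules being allow-only and additive: an ENI's effective ingress is the union of the rules of every SG attached to it. The sketch below illustrates this with made-up group names, ports, and CIDRs; none of these values describe the actual default SG of a ROSA HCP cluster.

```python
from typing import NamedTuple


class IngressRule(NamedTuple):
    protocol: str
    from_port: int
    to_port: int
    source: str


def effective_ingress(attached_group_ids, rules_by_group):
    """Union of the ingress rules of all security groups attached to one ENI."""
    effective = set()
    for group_id in attached_group_ids:
        effective.update(rules_by_group.get(group_id, ()))
    return effective


# Hypothetical rule sets, for illustration only.
rules_by_group = {
    "sg-default": {
        IngressRule("tcp", 443, 443, "10.0.0.0/16"),
        IngressRule("tcp", 30000, 32767, "10.0.0.0/16"),
    },
    "sg-extra": {
        IngressRule("tcp", 22, 22, "10.0.1.0/24"),
    },
}

# While the default SG is shared, the VPCE ENI inherits worker-oriented rules too.
vpce_eni = effective_ingress(["sg-default"], rules_by_group)
worker_eni = effective_ingress(["sg-default", "sg-extra"], rules_by_group)

for rule in sorted(worker_eni):
    print(rule)
```

      In this model, any port opened on the shared default SG is opened on both the VPC Endpoint and the workers, which is why separating the SGs and trimming the default worker SG are called out above.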

      Out of Scope

      High-level list of items that are out of scope.  Initial completion during Refinement status.

      1. Adding additional Security Groups at the time of cluster creation
      2. Adding additional SGs to the cluster's VPCE
      3. Updating the list of additional SG-IDs on an existing machine pool
      4. Making changes/patches to the default security group created by the cluster

      Background

      Provide any additional context needed to frame the feature.  Initial completion during Refinement status.

       

      Customer Considerations

      Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

      1. Early customers will need this by 03/15, especially the API to attach 10 additional SG-IDs when creating a machine pool.

      Documentation Considerations

      Provide information that needs to be considered and planned so that documentation will meet customer needs.  Initial completion during Refinement status.

      1. The Prerequisites section where the Security Group is referenced must be updated to call out the default ports that will be enabled for an HCP cluster. Please note this will be different from OCP or ROSA Classic.
      2. The Managing Nodes through Machinepool section must be updated to call out the differences between adding an additional SG to ROSA Classic and to ROSA HCP.
      3. There are no control plane or infrastructure nodes in ROSA HCP, but there will be a default SG for the VPC Endpoint that allows clients to access the cluster's API Server from within the VPC.

      Interoperability Considerations

      Which other projects and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

      1. Scale to Zero (SD-ADR-030) will impact this feature, especially when creating a node pool during HCP cluster creation.

      References:

       

            rh-ee-bchandra Balachandran Chandrasekaran
            rh-ee-adejong Aaren de Jong
            Aaren de Jong, Balachandran Chandrasekaran
            Ori Adler
            Xue Li
            Eric Ponvelle
            Not Needed (Inactive)
            Ori Adler
            Balachandran Chandrasekaran
            Haoran Wang