    Description

      Feature Overview (aka. Goal Summary)  

      ROSA will ship AMIs with the AWS SSM Agent installed so that SRE can ensure Red Hat meets its SLOs for HyperShift worker nodes and can respond to incidents impacting the Kubernetes API Server for ROSA.

      Goals (aka. expected user outcomes)

      • Red Hat SRE is able to access EC2 instances for all flavors of ROSA clusters at any point in the node's lifecycle without a dependency on the Kubernetes API Server or other working nodes.
      • Customers are able to access EC2 instances for all flavors of ROSA clusters without a dependency on the Kubernetes API Server or other working nodes so that they can debug node issues as they might today with oc debug node.
      • Ensure direct SRE access to EC2 instances does not require the customer’s network to allow public SSH connections over port 22.
      • Ensure that no long-lived SSH keys, credentials, or secrets are needed in the process.
      • Ensure that there are guardrails in place so that customers and Red Hat can audit any direct SRE access to EC2 instances associated with ROSA clusters.
      • Ensure the solution is acceptable for FedRAMP High environments. As much as possible, we want to avoid maintaining two different solutions to this problem for FedRAMP and commercial ROSA.

      Requirements (aka. Acceptance Criteria):

      • The AWS Systems Manager Agent (SSM Agent) is installed and running on the underlying EC2 host for all ROSA cluster nodes.
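
      One way this acceptance criterion could be checked is sketched below, assuming the cluster's EC2 instance IDs are already known and that the caller has permission to call the SSM DescribeInstanceInformation API through boto3. The function names and the region argument are hypothetical; the `InstanceId` and `PingStatus` fields come from the shape of that API's response. This is a minimal illustration, not an agreed verification procedure.

```python
from typing import Iterable, Mapping, Set


def instances_missing_ssm(
    expected_instance_ids: Iterable[str],
    instance_information: Iterable[Mapping[str, str]],
) -> Set[str]:
    """Return the expected instance IDs that SSM does not report as Online.

    `instance_information` is the `InstanceInformationList` from the SSM
    DescribeInstanceInformation API: each entry carries `InstanceId` and
    `PingStatus` (for example "Online" or "ConnectionLost").
    """
    online = {
        info["InstanceId"]
        for info in instance_information
        if info.get("PingStatus") == "Online"
    }
    return set(expected_instance_ids) - online


def fetch_instance_information(region_name: str) -> list:
    """Collect every page of DescribeInstanceInformation results.

    Hypothetical helper: requires boto3 and valid AWS credentials.
    """
    import boto3  # imported lazily so the pure check above has no AWS dependency

    client = boto3.client("ssm", region_name=region_name)
    paginator = client.get_paginator("describe_instance_information")
    results = []
    for page in paginator.paginate():
        results.extend(page["InstanceInformationList"])
    return results
```

      An empty result from `instances_missing_ssm` would indicate that every expected node has a registered agent that SSM can currently reach. Any returned IDs are nodes where the agent is missing, stopped, or unable to contact the SSM endpoints.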

      Out of Scope

      • Support for non-AWS platforms (e.g., GCP, ARO).

      Background

      Red Hat would like to support worker nodes in ROSA to the extent of making sure they join the cluster and become nodes, for both non-HyperShift and HyperShift clusters, while enabling increasing levels of customization for Machines and Nodes. To do this, Red Hat SRE needs a method of accessing the underlying EC2 instances that does not rely on a functional Kubernetes API Server or kubelet.

      SRE-P has an existing jump host process, but it has multiple deficiencies:

      • It only works for non-STS clusters. Non-STS clusters are still supported (31% of the production fleet in August 2022), but STS clusters are the recommended type that we encourage customers to install and, in the case of FedRAMP or HyperShift, the only type that can be installed. For STS clusters, SRE-P lacks the necessary AWS IAM permissions to follow the SOP.
      • It is also not compatible with the minimum requirements of the PrivateLink architecture, which does not require the cluster’s VPC to allow public network ingress. Some customers prefer to keep public ingress closed.

      Customer Considerations

      • Customers do not want to enable SSH to their EC2 instances over the public internet and would prefer to block port 22.

      Documentation Considerations

      Interoperability Considerations

      This would only affect OSD (AWS), ROSA Classic, ROSA HCP, and ROSA FedRAMP.

          People

            Unassigned Unassigned
            drow.openshift.srep Dustin Row