  1. OpenShift Top Level Product Strategy
  2. OCPPLAN-9619

Document Federated IDP between ACM, RH SSO, and OCP clusters


    • Feature
    • Resolution: Unresolved
    • Undefined
    • openshift-4.16

      OCP/Telco Definition of Done
      Feature Template descriptions and documentation.

      Feature Overview

      • This Section: High-level description of the feature, i.e., an executive summary.
      • Note: A Feature is a capability or a well-defined set of functionality that delivers business value. Features can include additions or changes to existing functionality. Features can easily span multiple teams and multiple releases.

      Goals

      • This Section: Provide a high-level goal statement, giving user context and the expected user outcome(s) for this feature.

      Requirements

      • This Section: A list of specific needs or objectives that a Feature must deliver to be considered complete. Some requirements will be flagged as MVP. If an MVP requirement slips, the feature shifts. If a non-MVP requirement slips, it does not shift the feature.
      Requirement | Notes | isMvp?
      CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES
      Release Technical Enablement | Provide necessary release enablement details and documents. | YES

      (Optional) Use Cases

      This Section:

      • Main success scenarios - high-level user stories
      • Alternate flow/scenarios - high-level user stories
      • ...

      Questions to answer…

      • ...

      Out of Scope

      Background, and strategic fit

      Background:

      -----------------
      It's always good when we try something that our customers would try and then learn from it. I often visit customers and occasionally hear them say it has taken them a month to install a cluster. I leave the room thinking that is crazy; it should only take 32 minutes to install a cluster. Now I know what they mean. What is interesting is how things can compound under your feet. As we worked on the IdM solution (as you will see below), other unfortunate things like TLS cert expiration happened that we pushed off until IdM was solved, just like our customers do. Fix the big thing before thinking about the small things. But then you go into a day-to-day slide on the fix for the big thing, and those other issues grow without your attention. The fact that it happened inside our own house, with PMs we pay to know these things, means we unearthed something that needs to be addressed in the product.
       
      Starting back on Aug 5, we wanted to get our 5 OpenShift clusters plugged into the RHT SSO system for both authentication (login challenge) and authorization (group membership in RHT LDAP). For about 3 years we have carried this painfully awful RHT Rover LDAP group (not just a Google group) called openshift-pm. We add people to that group as they are hired into the group. We wanted to role-bind that openshift-pm LDAP group to cluster-admin in all the clusters we build and destroy.
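
As a rough illustration of the role binding described above, the following Python sketch builds a standard Kubernetes ClusterRoleBinding that grants the built-in cluster-admin ClusterRole to a group and prints it as JSON. It is only a sketch under assumptions: the binding name is made up for illustration, and it assumes the cluster actually sees a group named openshift-pm (for example via group claims or LDAP group sync), which is exactly the part the thread reports as unsolved.

```python
import json


def cluster_admin_binding(group: str) -> dict:
    """Build a ClusterRoleBinding granting cluster-admin to one group.

    The binding name is a hypothetical naming choice; the group name must
    match whatever the cluster's identity provider or group sync reports.
    """
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": f"{group}-cluster-admin"},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": "cluster-admin",
        },
        "subjects": [
            {
                "apiGroup": "rbac.authorization.k8s.io",
                "kind": "Group",
                "name": group,
            }
        ],
    }


# Emit the manifest as JSON, which `oc apply -f -` accepts on stdin.
manifest = cluster_admin_binding("openshift-pm")
print(json.dumps(manifest, indent=2))
```

Saved under a hypothetical name such as binding.py, the output could be piped to `oc apply -f -` on each cluster. The binding itself is the easy part; getting the group membership to the cluster is the harder problem discussed in the rest of this thread.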
       
      The first thing we ran into was picking a technical solution. As you can see below in the email from Tushar, we picked the dev preview (never approved for productization) ACM IdM Dex-based solution. We picked this because it was easy to configure via declarative GitOps policy, it did not require us to run a new database, and it allowed all the random clusters we created to call back to the hub cluster for authn/authz. Turns out we picked the wrong horse.
       
      The first sinkhole we ran into was something all our customers would run into. The people running LDAP at RHT are in a different group than us, and their work is governed by ServiceNow tickets. It turns out that, as a governance policy here at RHT, every cluster that needs authz against our corporate LDAP service with a group membership query requires a ServiceNow ticket giving them the IP address of the connecting cluster. That was never going to scale with the coming and going of clusters, so we needed a hub for the approved LDAP caller to live on so that it could feed the other, shorter-lived clusters. From this experience, I would say that troubleshooting technical issues with another team via ServiceNow (something many of our customers experience whenever they integrate OpenShift into their environment) takes about 1.5-2 weeks.
       
      Turns out the recent engineering re-org moved the people who were keeping the ACM IdM solution alive off to other projects. Although we got it to work for authn, we never got authz against an LDAP group to work, and now there was no one to ask for fixes. So we reached out to the Keycloak team to figure out how far along they were in the Keycloak.X project. They let us know they had successfully moved from WildFly to Quarkus, but they had not been able to move away from needing a database and had not been able to move to a custom resource definition/Kubernetes configuration method. There are development drops of this that we could use now, which will move to tech preview in Sept/Oct and GA next year around May 2023. Even today's Keycloak (which is GA now) will solve the other issue of being a hub that the LDAP calls come from. So we have decided to move back to Keycloak. The fact that our only path forward is not as attractive as the open source Dex-based ACM IdM solution, and has no path in the next 18 months to improving on Kubernetes configuration and database requirements, is not great to hear, but we need to move forward with something the larger company's skillsets can help us with. That is another choice our customers face that we experienced: customers will go with what they have the in-house skillset to maintain.
       
      Currently we are manually testing the authn/authz from the central hub's OpenShift OAuth server to the corporate LDAP, and once that passes we want to move to using one of these Keycloak dev preview builds. Once we know that configuration, we will use ACM policy to give the random clusters that configuration. But we do not currently have the ability to do so: nothing is documented in this regard by Red Hat. My ask is that someone from the Keycloak team and someone from the auth/serverAPI team help us. If the work is done with us watching over your shoulders, the PM involved (Anand) can document the solution and get it into our official product documentation. Stanislav Laznicka from the auth/serverAPI team has been great to work with. We have not reached out to anyone from the Keycloak team yet. All in all, we have been spinning our wheels on this for a month, and it is now time for us to ask for more help.
       
      Thanks,
      Mike     
       
       
       
       
      On Wed, Aug 31, 2022 at 10:36 AM Tushar Katarki <tkatarki@redhat.com> wrote:

      Mike,  
      Summary: 
       
      The Hybrid Platforms PM team runs a hybrid cloud OpenShift cluster setup in a hub-and-spoke model. Our objective is to control authentication and RBAC for members of openshift-pm (corporate LDAP) on all those clusters with the Red Hat SSO service.
       
      We have been at it and have explored several different options (see below), but we have run into roadblocks and have stalled.
       
      We are asking for help from engineering to unblock us and make forward progress. 
       
      Also it is important to note that this is a common setup that our customers and Red Hat cloud services are likely to struggle with going forward. And therefore we need a working solution(s) for this. 
       
      High Level Requirements: 
       

      • The desired workflow at the end state is for members of the openshift-pm group to have ClusterAdmin privileges on the hub and spoke clusters, and for authenticated non-members to have the default read access in these clusters.
      • We want the above to be automated so that newly added clusters dynamically reach this desired end state.
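
One way the automation requirement might be approached is with ACM's governance policy framework, which the thread mentions elsewhere as the intended distribution mechanism. The Python sketch below wraps a cluster-admin binding for the openshift-pm group in an ACM Policy containing a ConfigurationPolicy and prints it as JSON. The policy name, the "policies" namespace, and the binding name are hypothetical, and a Placement plus PlacementBinding (not shown) would still be needed to select the spoke clusters. This is a sketch under those assumptions, not a tested or documented solution, and it does not address how group membership reaches each cluster.

```python
import json


def cluster_admin_binding(group: str) -> dict:
    """ClusterRoleBinding granting cluster-admin to a group (name is hypothetical)."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": f"{group}-cluster-admin"},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": "cluster-admin",
        },
        "subjects": [
            {"apiGroup": "rbac.authorization.k8s.io", "kind": "Group", "name": group}
        ],
    }


def acm_policy(name: str, namespace: str, obj: dict) -> dict:
    """Wrap one Kubernetes object in an enforcing ACM Policy.

    The object is marked "musthave", so the policy controller creates it on
    every managed cluster the policy is placed on if it is missing.
    """
    return {
        "apiVersion": "policy.open-cluster-management.io/v1",
        "kind": "Policy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "remediationAction": "enforce",
            "disabled": False,
            "policy-templates": [
                {
                    "objectDefinition": {
                        "apiVersion": "policy.open-cluster-management.io/v1",
                        "kind": "ConfigurationPolicy",
                        "metadata": {"name": f"{name}-config"},
                        "spec": {
                            "remediationAction": "enforce",
                            "severity": "high",
                            "object-templates": [
                                {"complianceType": "musthave", "objectDefinition": obj}
                            ],
                        },
                    }
                }
            ],
        },
    }


# Hypothetical policy name and namespace, chosen only for illustration.
policy = acm_policy("pm-cluster-admin", "policies", cluster_admin_binding("openshift-pm"))
print(json.dumps(policy, indent=2))
```

With a placement that matches all managed clusters, a newly imported cluster would receive the binding without a manual step, which is the end state the second requirement asks for.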
         
      Constraints:
       
      • Red Hat IT requires that each client using the RH-SSO service be registered with IT (CMDB etc.) so that they have a record. Therefore any "automation" has to account for this and should not result in dozens of spoke clusters (that can come and go) being registered with Red Hat IT for RH-SSO access.
       
      Work so far:
       
      • The ACM team's Identity Management (IdM) tech preview solution, which is based on Dex, offers a solution. It has been configured on the hub cluster, and the IdM service endpoint has been registered as a client with RH-SSO. Furthermore, IdM distributes the identity configuration to the spoke clusters.
       
      • We have two challenges with the above: A) we have run into issues with group claims and how they are handled, and more debugging is needed; B) the IdM solution seems to have been suspended and its resources moved because of the recent reorg.
       
      • We have spoken with the Keycloak engineering experts, and it appears that Keycloak may offer the same functionality as above. However, it is not clear whether they have the bandwidth and/or priority to help with this.
       
      The Ask:
       
      What we are asking for is really a solution to our requirements above.
       
      We should be trying to address our requirements with Keycloak since that is the direction. So we need engagement and help from the Keycloak team with the setup on the Hybrid Platforms hub/spoke clusters. We think this should be one Keycloak engineer plus one OpenShift Auth engineer working together with a PM from our team for up to two to three weeks, until success or until we have a set of recommendations on gaps identified, fixes needed, and next steps.
       
      Thanks
      Tushar
      Tushar Katarki
      Director, OpenShift Product Management
      Red Hat
      +1-978-618-6690 (M)
      US Eastern Time

      Assumptions

      • ...

      Customer Considerations

      • ...

      Documentation Considerations

      Questions to be addressed:

      • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
      • Does this feature have doc impact?
      • New Content, Updates to existing content, Release Note, or No Doc Impact
      • If unsure and no Technical Writer is available, please contact Content Strategy.
      • What concepts do customers need to understand to be successful in [action]?
      • How do we expect customers will use the feature? For what purpose(s)?
      • What reference material might a customer want/need to complete [action]?
      • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
      • What is the doc impact (New Content, Updates to existing content, or Release Note)?

            rhn-support-cstark Christian Stark
            scuppett@redhat.com Stephen Cuppett
            Christian Stark
            Votes: 0
            Watchers: 15

            Original Estimate: 3 weeks
            Remaining Estimate: 3 weeks
            Time Spent: Not Specified