
Support the standardized STS configuration flow via OLM and CCO for Cluster Logging

    • Standard STS config via CCO for OCP 4.14
    • False
    • None
    • False
    • Red
    • NEW
    • To Do
    • OCPSTRAT-127 - Continued STS enablement for selected OLM-managed operators
    • NEW
    • 0% To Do, 0% In Progress, 100% Done
    • Log Collection - Sprint 241, Log Collection - Sprint 242

      Goals

      Establish a common and simplified configuration experience for Cluster Logging on STS-enabled clusters using the new, standardized configuration flow described in OCPBU-559. Users have a repeatable process to configure Cluster Logging for STS with well-known inputs and behavior and can reuse the knowledge about that process with other operators.

      Non-Goals

      Support for any OCP version older than 4.14.

      Motivation

      Today, the support for AWS STS authentication is well established in our core platform but fragmented at best among our layered products and OLM-managed operators. The configuration experience is also different between individual OLM-managed operators that support STS. OCPBU-4 aims to solve this for all cloud providers using the CloudCredentialOperator (CCO) and its CredentialRequest API.

      Based on this, customers get a repeatable and simple experience of installing and configuring Cluster Logging, or any OLM-managed operator that supports it, for tokenized authentication with their cloud provider.

      Cluster Logging has been identified as one of the first critical operators to support that flow to act on customer feedback from ROSA and OSD customers.

      Alternatives

      None.

      Acceptance Criteria

      • Cluster Logging implements the standardized configuration flow for STS-enabled clusters using CCO and CredentialRequests described here: https://docs.google.com/document/d/1iFNpyycby_rOY1wUew-yl3uPWlE00krTgr9XHDZOTNo/edit#
      • Cluster Logging gracefully falls back to regular operations when no role ARN is provided
      • Cluster Logging degrades when the role ARN is provided but CCO does not reconcile the CredentialRequest (either due to a bug or due to running on a release older than OCP 4.14)
      • Cluster Logging documents which specific IAM permissions are needed when integrating with AWS using STS and provides easy-to-consume instructions to create them
      • Cluster Logging supports this workflow and provides the documentation from 4.14 onwards
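
      The fallback and degrade criteria above can be read as a small decision table. The sketch below is illustrative only: the names `StsState` and `evaluate_sts_state` are hypothetical and do not come from the Cluster Logging Operator, and it assumes the only two inputs are whether a role ARN was provided and whether the CCO-managed secret exists.

```python
from enum import Enum
from typing import Optional


class StsState(Enum):
    """Hypothetical states derived from the acceptance criteria."""
    REGULAR = "regular"      # no role ARN provided: fall back to regular operation
    READY = "sts-ready"      # role ARN provided and the secret was reconciled
    DEGRADED = "degraded"    # role ARN provided but no secret was reconciled


def evaluate_sts_state(role_arn: Optional[str], secret_exists: bool) -> StsState:
    """Map the two inputs named in the acceptance criteria to a state."""
    if not role_arn:
        # Criterion: gracefully fall back when no role ARN is provided.
        return StsState.REGULAR
    if secret_exists:
        return StsState.READY
    # Criterion: degrade when the CredentialRequest was not reconciled
    # (for example a CCO bug, or a cluster older than OCP 4.14).
    return StsState.DEGRADED
```

      Keeping the decision in one pure function would make the degrade case easy to cover with a unit test, independent of any cluster.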

      Risk and Assumptions

      • Assumption: you don't currently have an existing way to integrate with STS
      • Risk: if the above assumption is wrong, you need to deprecate this configuration flow in favor of the flow defined in OCPBU-559

      Documentation Considerations

      • Cluster Logging should rely on the OLM portion of the OCP docs for how to carry out the configuration flow using either the OCP console or the CLI
      • Cluster Logging in its own documentation section shall supply the required IAM credential instructions

      Open Questions

      Additional Notes

            [LOG-4154] Support the standardized STS configuration flow via OLM and CCO for Cluster Logging

            GitLab CEE Bot added a comment - CPaaS Service Account mentioned this issue in a merge request of openshift-logging / Log Collection Midstream on branch openshift-logging-5.9-rhel-9_ upstream _a29e46b818ae13409d88219d30852212 : Updated 2 upstream sources

            jamparke@redhat.com   Capturing our brief discussion along with one I had with cahartma@redhat.com here:

             

            Short term: Recommend removing STS annotation from operator bundle so UI role ARN textbox no longer appears during deployment

            Risk:  We no longer appear as an "STS enabled operator" even though we do support and document STS; the support is not in the manner designed by the ROSA team.  The designed workflow is insufficient to support the possible uses of CLF.  You will need to weigh the value of us declaring STS support via the annotation against the fact that customers are currently confused by the textbox that is available to them.

             

            Long Term:  We have a meeting scheduled with managed services to identify their intended workflow.  cahartma@redhat.com and I discussed the idea that we could:

            • Store the ARN and default its usage to outputs that do not define a secret or apply it when they define a specially named secret

            Risk:  Multi CLF is supported now so how does a "default" ARN play into it here?  Is it OK that all forwarders could potentially use the same ARN?  Do we limit its use to a specific NS or even only the legacy instance of CLF?

             

            Lastly, I would propose we drop this epic from 5.9.0 and potentially deliver it in a 5.8 and/or 5.9 z-stream once we have a better understanding of the desired outcome.

            Jeffrey Cantrill added the comment above.

            DanielMesser  this issue was moved back to "Planning" status.    It was agreed in our last meeting that the CLO would not be able to deploy default instances of our operands, based upon a ROLE_ARN entered during operator install.  We are currently awaiting a revised design and AC, in order to accommodate the complexity of our logging stack. 

            We are happy to work together on a revised solution that considers the complex configuration of our logging stack.    For now we will work to implement the first bullet above, in order to accommodate the new CCO credRequest functionality.

            (slack thread)
            cc: jamparke@redhat.com 

            Casey Hartman added the comment above.

            Looks like this missed 4.14. Please make an effort to get this in asap. Moving this to OCPSTRAT-127 as OCPSTRAT-235 is closed now.

            Daniel Messer added the comment above.

            lgallett btofelrh   Based on our slack conversation, we are still awaiting details for changes to this epic, since it doesn't apply to our operator.   

            Casey Hartman added the comment above.

            dageoffr  Now that we've had the separate discussions, including with jaharrin , we've agreed that this epic only applies to operators and thus we cannot complete the acceptance criteria exactly as stated.   A few we've already completed with our initial STS feature.   How best do we modify this, to make logging STS (forwarding to cloudwatch) align with the intent of the epic?    I've listed possible updates above, as well as:  

            • We will modify our docs and flow to accommodate the new CCO reconciling CredRequests.  As opposed to us currently asking users to create a CredRequest and generate a secret YAML file using ccoctl, with the new flow the CCO will create the secret for the user.   Assumption: We will then take ownership of that secret, in order to reconcile it as part of the forwarding config.
            • It is agreed (at this time) that our ClusterLoggingOperator install is not capable of deploying default instances of our ClusterLogging CR and ClusterLogForwarder CR using a default ROLE_ARN that is entered upon Operator install (this was suggested as a possible solution).  That flow is not included in this epic and was never scheduled for design or dev.    What are the next steps?
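
            As a rough illustration of the new flow in the first bullet, the sketch below builds the kind of CredentialsRequest that the CCO would reconcile into a secret. It makes several assumptions that should be checked against the CCO API reference before use: the field names `stsIAMRoleARN` and `cloudTokenPath`, the token path, the `openshift-logging` namespace, the secret name, and the list of CloudWatch Logs actions. The function name `build_credentials_request` is hypothetical.

```python
import json

# Assumption: the CloudWatch Logs actions a log forwarder typically needs.
CLOUDWATCH_ACTIONS = [
    "logs:CreateLogGroup",
    "logs:CreateLogStream",
    "logs:DescribeLogGroups",
    "logs:DescribeLogStreams",
    "logs:PutLogEvents",
    "logs:PutRetentionPolicy",
]


def build_credentials_request(role_arn: str,
                              namespace: str = "openshift-logging",
                              secret_name: str = "cloudwatch-credentials",
                              service_account: str = "logcollector") -> dict:
    """Return a CredentialsRequest manifest as a plain dict (sketch only)."""
    return {
        "apiVersion": "cloudcredential.openshift.io/v1",
        "kind": "CredentialsRequest",
        "metadata": {"name": secret_name, "namespace": namespace},
        "spec": {
            "providerSpec": {
                "apiVersion": "cloudcredential.openshift.io/v1",
                "kind": "AWSProviderSpec",
                "statementEntries": [{
                    "effect": "Allow",
                    "action": CLOUDWATCH_ACTIONS,
                    "resource": "arn:aws:logs:*:*:*",
                }],
                # Assumed STS field: the role the workload should assume.
                "stsIAMRoleARN": role_arn,
            },
            # Where the CCO is expected to write the resulting secret.
            "secretRef": {"name": secret_name, "namespace": namespace},
            "serviceAccountNames": [service_account],
            # Assumed path of the projected service account token.
            "cloudTokenPath": "/var/run/secrets/openshift/serviceaccount/token",
        },
    }


if __name__ == "__main__":
    manifest = build_credentials_request("arn:aws:iam::123456789012:role/example")
    print(json.dumps(manifest, indent=2))
```

            The role ARN in the usage line is a placeholder, not a real role.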

            Casey Hartman added the comment above.

            dageoffr  Revising the todo list:

            "We aim to automate and reduce those steps as much as possible to make installing operators (via OLM) simpler on clusters where TAT authentication is supported"

            Our cluster logging operator does not interact with or authenticate with AWS during install.   Our only TAT/STS interaction is currently when the user configures forwarding to Cloudwatch.  

            "we are introducing a new method for semi-automating by adding to CCO’s capabilities which expands the use of CredentialsRequest to request the creation of Secrets that contain the information needed for STS flows.
            The recommended way is to provide a CredentialsRequest with correctly filled STS-related fields and let the CCO create the Secret for you."

            We have no action items on this, since our operator is not part of the auth.  The customer needs a role at AWS, and the permissions and trust policy to send logs to Cloudwatch as that role.   The logcollector service account will assume that role.

            • If they configure the role at AWS, the user only needs to create a secret in OCP.  The CLO does the rest.
            • If they want to use the cco utility to set it all up at once, we provide simple instructions to follow.
            • There is no value in creating a credential request for them, nor in providing a template, since it still requires them to submit that to AWS using the cco utility.
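
            For reference, the trust policy mentioned above (the one that lets the logcollector service account assume the role) generally follows the standard AWS web-identity pattern. The sketch below is an assumption-laden illustration: the `openshift-logging` namespace, the account ID, and the OIDC provider host in the usage line are placeholders, and `build_trust_policy` is a hypothetical helper name.

```python
import json


def build_trust_policy(account_id: str,
                       oidc_provider: str,
                       namespace: str = "openshift-logging",
                       service_account: str = "logcollector") -> dict:
    """Return an IAM trust policy allowing one service account to assume a role.

    oidc_provider is the cluster's OIDC issuer without the https:// prefix.
    """
    subject = f"system:serviceaccount:{namespace}:{service_account}"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            # Restrict the role to tokens issued for this service account only.
            "Condition": {"StringEquals": {f"{oidc_provider}:sub": subject}},
        }],
    }


if __name__ == "__main__":
    policy = build_trust_policy("123456789012", "oidc.example.com/cluster-id")
    print(json.dumps(policy, indent=2))
```

            Scoping the condition to a single service account keeps other workloads on the cluster from assuming the same role.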

             
            Possible things we can do to improve user flow: 

            • Update docs to help with role/permissions creation at AWS?
            • Ensure our bundle is annotated with token-based auth support (even though it's not our operator, we don't want to be excluded)
            • Provide more robust error and status handling for auth issues, including possible remedy steps.
            • Add a template or other console/CLI docs to indicate the required permissions for the AWS credential request (still requires using cco utility)
            • Once a role is created, we could have the CLO create the required trust and permissions policy, based on the secret the user creates.

             

             

            Casey Hartman added the comment above.

            Casey Hartman added a comment - - edited

            jamparke@redhat.com   I should be able to get this working for our 5.8 release.  Here's what I understand after some investigating....

            Current

            Our cloudwatch forwarding documentation currently references the CCO utility and takes the user through the following:

            1. Create a CredentialRequest yaml file (we tell them what permissions are necessary)
            2. Use the CCO utility to create the role, specifying the identity-provider-arn of the cluster. This outputs the credentials (secret) yaml file.
            3. Apply the secret
            4. Create or update the forwarder CR

            Within the operator, we are "identifying" an sts cluster, based solely on the format of the secret they provide. Currently if they include a role_arn (or credentials.role_arn) key in the secret, then we project the collector's service account token. This will be the flow that needs to change somewhat with the new CCO feature.
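
            The detection described above can be sketched roughly as follows. This is not the operator's actual code: `find_role_arn` and `needs_projected_token` are hypothetical names, and the sketch assumes that "credentials.role_arn" means a `role_arn = ...` line inside an AWS-profile-style `credentials` value, with secret values already decoded to strings.

```python
import re
from typing import Mapping, Optional

# Matches a "role_arn = <value>" line inside an AWS-profile-style credentials blob.
_ROLE_ARN_LINE = re.compile(r"^\s*role_arn\s*=\s*(\S+)\s*$", re.MULTILINE)


def find_role_arn(secret_data: Mapping[str, str]) -> Optional[str]:
    """Return the role ARN carried by the secret, or None for static keys."""
    direct = secret_data.get("role_arn", "").strip()
    if direct:
        return direct
    match = _ROLE_ARN_LINE.search(secret_data.get("credentials", ""))
    return match.group(1) if match else None


def needs_projected_token(secret_data: Mapping[str, str]) -> bool:
    """Treat the secret as STS-style when any role ARN is present."""
    return find_role_arn(secret_data) is not None
```

            A secret holding only static access keys yields None, so no service account token would be projected in that case.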

            TODO:

            1. We need to create a Console flow when a forwarder is created that requires a role to be configured (cloudwatch)
            2. Modify the CLI flow to accommodate the new CCO feature (creating a secret directly from a credential request)
            3. Check for missing secret and provide follow-up instructions for creating the secret.

            TESTS:

            1. Ensure we degrade status when the role ARN is provided but CCO does not reconcile the CredentialRequest

            Casey Hartman's edited comment above also included the following findings:

            • Our operator has no direct interactions with the cloud API.
            • Forwarding to the default ES requires no changes, since we have no authentication with AWS.
            • Forwarding to the default Loki currently only functions using static keys, when using the aws type of secret (see https://docs.openshift.com/container-platform/4.13/logging/cluster-logging-loki.html#logging-loki-deploy_cluster-logging-loki ). This will have no flow change until we implement STS auth as part of the LokiStack storage component (not sure where that lands or how it will work yet).
            • Forwarding to Cloudwatch will be the only feature that is impacted currently.

            Any update on if this is on track for the OCP 4.14 release?

            Daniel Geoffroy (Inactive) added the comment above.

            Apologies, Jeff.  I don't mean to intrude; this is just being tracked way higher up and we are looking for a signal that the Mobb list for AWS managed identity is on track to meet the deadlines around the OCP 4.14 release.  This, along with similar adoption by a few other layered product teams, is on the short list of critical operators that need to be present and is being closely reviewed by RH leadership.  The fact that this is unassigned and not yet broken down was mentioned as a concern, and I was simply trying to help clean up some of the Jira hygiene where I thought it was possible.

            Please confirm if this is on track to be in a release that coincides with the OCP 4.14 release and deployment into dedicated. 

            Daniel Geoffroy (Inactive) added the comment above.

              cahartma@redhat.com Casey Hartman
              DanielMesser Daniel Messer
              Anping Li Anping Li