• Epic
    • Resolution: Obsolete
    • Critical
    • None
    • None
    • Installer Core
    • Installer Telemetry
    • 5
    • installer
    • Done
    • OCPSTRAT-97 - Installer Telemetry Collection
    • 23% To Do, 0% In Progress, 77% Done

      Goal:

      As a product owner for the installer, I want insight into how users run the installer and their rate of success.

      Problem:

      Clusters don't provide any telemetry until they are fully installed. If a user encounters an error during this process, we won't know about it unless they notify us. This makes it very difficult to understand the global success rate for installation.

      This is specifically not about the resultant cluster, so none of these metrics should be tied to a cluster-id. Instead, we'll have to create an identifier that tracks a specific installation workflow.
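      As a rough illustration of such an identifier, the sketch below generates a random ID once per installer run and attaches it to every event from that run. The names `InstallAttempt`, `invocation_id`, and `record` are hypothetical and not taken from the installer; the only point is that the ID exists before any cluster does and is unrelated to a cluster-id.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class InstallAttempt:
    """Hypothetical record for one run of the installer (illustration only)."""

    # Random per-run identifier; deliberately unrelated to any cluster-id.
    invocation_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    events: list = field(default_factory=list)

    def record(self, stage: str, succeeded: bool) -> dict:
        """Store and return an event tagged with this run's identifier."""
        event = {
            "invocation_id": self.invocation_id,
            "stage": stage,
            "succeeded": succeeded,
        }
        self.events.append(event)
        return event


if __name__ == "__main__":
    attempt = InstallAttempt()
    # The stage name "bootstrap" is an assumed example value.
    print(attempt.record("bootstrap", succeeded=False))
```

      Because the ID is created at start-up, a run that fails before a cluster exists can still be counted.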

      Why is this important:

      In order to improve the product and improve the rate of success, we need to be able to measure that rate of success and understand what errors are encountered.

      Dependencies (internal and external):

      • Service Delivery needs to create and run the telemetry-gathering service

      Previous Work:

      • Telemeter

      Customers:

      Specifics unknown

       

      Open Questions

      • Compile examples of the data we'd like to send back
      • Confirm whether Telemetry can receive reports or only time series data

            [CORS-1287] Collect telemetry from installer

            Radek Vokal added a comment - This might be helpful. The CCX team recently worked with the Assisted Installer team to allow them to upload data and process it through the CCX pipeline. Work is tracked here - https://issues.redhat.com/browse/CCX-220

            Marcos Entenza Garcia added a comment - Moving the target release now to 4.12.

            Nicole Wilker added a comment -

            Changes for 4.11 planning.
            The OpenShift 4.11 release number has been removed from Fix Version and has been set as the Target Version.
            PM will set the Target Version to indicate the request/ask.
            Engineering will set the Fix Version once they are ready to commit to the epic.

            Nomination Bot added a comment - Jan Holecek mentioned this issue in a merge request of ccx / ccx-ocp-core on branch install-gather:

            Add support for install gathers for troubleshooting failed installations.

            Yang Yang added a comment -

            If I understand correctly, the installer only pushes the metrics to a Pushgateway and does not care how, or by which Prometheus, they are scraped. Could anyone help me understand how users get the metrics?

            1. If the metrics are pushed to a user's local Pushgateway, would the in-cluster Prometheus get the metrics? If so, how do users configure the in-cluster Prometheus to scrape from that Pushgateway?
            2. If the metrics are pushed to the default public Pushgateway, would any public Prometheus get the metrics?
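            For context on the push side of this question, here is a minimal sketch of what a push to a Pushgateway looks like over its HTTP API: a `PUT` of text-format metrics to `/metrics/job/<job>/<label>/<value>`. The metric name, the `invocation_id` grouping label, and the address `http://localhost:9091` are assumptions for illustration and do not describe the installer's actual behavior. A Prometheus server only sees these metrics if it is configured to scrape that Pushgateway.

```python
import urllib.request
from urllib.parse import quote


def group_url(base_url: str, job: str, grouping: dict) -> str:
    """Build the Pushgateway grouping-key URL.

    Assumes label values contain no "/" (those need the gateway's
    base64 variant of the path, which this sketch does not implement).
    """
    parts = [base_url.rstrip("/"), "metrics", "job", quote(job, safe="")]
    for name, value in grouping.items():
        parts += [quote(name, safe=""), quote(value, safe="")]
    return "/".join(parts)


def render_gauge(name: str, value: float, help_text: str) -> bytes:
    """Render a single gauge in the Prometheus text exposition format."""
    lines = [
        f"# HELP {name} {help_text}",
        f"# TYPE {name} gauge",
        f"{name} {value}",
    ]
    # The body must end with a newline.
    return ("\n".join(lines) + "\n").encode()


def push(base_url: str, job: str, grouping: dict, body: bytes) -> int:
    """PUT the metrics, replacing everything in this grouping key."""
    request = urllib.request.Request(
        group_url(base_url, job, grouping),
        data=body,
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical metric name and grouping label, for illustration only.
    payload = render_gauge(
        "installer_duration_seconds", 1234.5, "Duration of one installer run."
    )
    print(group_url("http://localhost:9091", "installer", {"invocation_id": "abc"}))
    print(payload.decode(), end="")
    # push("http://localhost:9091", "installer", {"invocation_id": "abc"}, payload)
```

            The `push` call is left commented out because it needs a running Pushgateway; the URL and payload helpers can be exercised without one.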

            Stephanie Stout added a comment - Adding doc-ack label since there is no doc impact for this epic. 

            Rom Freiman added a comment -

            What's the plan? According to the enhancement?

            Brenton Leanhardt added a comment - During the readout call today we discussed that staffing challenges are the primary risk for this epic. From CEE's perspective this is equally as important as the bootstrap debugging epic: https://issues.redhat.com/browse/CORS-1510

            Brenton Leanhardt added a comment - rhn-support-edrich, do you have a document that describes the metrics you mentioned on the readout?

            Brenton Leanhardt added a comment - This feels ted to me since Patrick will be on leave most of the 4.7 release.  Happy to discuss.

              Unassigned
              rhn-coreos-acrawfor Alex Crawford (Inactive)
              Jianli Wei
              Votes: 1
              Watchers: 27