Feature
Resolution: Done
Feature Overview
In order to collect enough data for debugging and to improve scalability going forward, we need the code to produce both metrics and profiling data.
Metrics should be detailed, covering, for example, timing for specific operations, network connections, and authentication. However, they should not expose any sensitive data.
Goals
To provide metrics and insight into the runtime of both principal and agent components to aid with operations and troubleshooting.
Requirements
| Requirements | Notes | Is MVP |
| Both agent and principal expose a metrics server on a configurable TCP port | | |
| The metrics server can be enabled or disabled, with enabled being the default | | |
| Metrics are exported in a widely understood format such as the Prometheus metrics format | | |
| Installation manifests expose the metrics servers using Services | | |
| The initial set of described metrics is implemented in the code | | |
| The Go profiler (pprof) can be enabled or disabled, with disabled being the default | | |
| Documentation exists on which metrics are available and how to interpret them | | |
| Documentation exists on how to turn the Go profiler on and off and how to access it | | |
Use Cases
- Metrics will help customers run and tune both the agent and principal components
- Metrics and profiling data will help engineering and customer support to troubleshoot production environments
Out of scope
- Enhanced metrics, introspection, or telemetry such as OpenTelemetry are out of scope for this feature (to be handled in a separate feature)
Dependencies
<Link or at least explain any known dependencies.>
Background, and strategic fit
<What does the person writing code, testing, documenting need to know?>
Assumptions
<Are there assumptions being made regarding prerequisites and dependencies?>
<Are there assumptions about hardware, software or people resources?>
Customer Considerations
<Are there specific customer environments that need to be considered (such as working with existing h/w and software)?>
Documentation/QE Considerations
<What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?>
<Does this feature have a doc impact? Possible values are: New Content, Updates to existing content, Release Note, or No Doc Impact?>
Impact
<If the feature is ordered with other work, state the impact of this feature on the other work>
Related Architecture/Technical Documents
<links>
Definition of Ready
- The objectives of the feature are clearly defined and aligned with the business strategy.
- All feature requirements have been clearly defined by Product Owners.
- The feature has been broken down into epics.
- The feature has been stack ranked.
- Definition of the business outcome is in the Outcome Jira (which must have a parent Jira).
- …