https://issues.redhat.com/browse/NETOBSERV-122

Liveness probe endpoint:
$ oc exec -it goflow-kube-668c4d6d6d-v4v5h -- curl 127.0.0.1:8080/health/live | python -m json.tool
{
    "status": "UP",
    "checks": [
        {
            "name": "flows",
            "status": "UP",
            "data": {
                "host": "goflow-kube-668c4d6d6d-v4v5h"
            }
        }
    ]
}

Goflow-kube ready status:
$ oc exec -it goflow-kube-668c4d6d6d-v4v5h -- curl 127.0.0.1:8080/health/ready | python -m json.tool
{
    "status": "UP",
    "checks": [
        {
            "name": "flows",
            "status": "UP",
            "data": {
                "host": "goflow-kube-668c4d6d6d-v4v5h"
            }
        }
    ]
}

$ oc exec -it goflow-kube-668c4d6d6d-v4v5h -- curl 127.0.0.1:8080/metrics | head
# HELP flow_process_nf_count NetFlows processed.
# TYPE flow_process_nf_count counter
flow_process_nf_count{router="100.64.0.2",version="10"} 2930
flow_process_nf_count{router="100.64.0.3",version="10"} 2315
flow_process_nf_count{router="100.64.0.4",version="10"} 2715
flow_process_nf_count{router="100.64.0.5",version="10"} 1583
flow_process_nf_count{router="100.64.0.6",version="10"} 2200
flow_process_nf_count{router="100.64.0.7",version="10"} 2649
# HELP flow_process_nf_delay_summary_seconds NetFlows time difference between time of flow and processing.
# TYPE flow_process_nf_delay_summary_seconds summary

$ oc exec -it goflow-kube-668c4d6d6d-v4v5h -- curl 127.0.0.1:8080/metrics | tail
flow_process_nf_templates_count{obs_domain_id="0",router="100.64.0.7",template_id="370",type="template",version="10"} 3
flow_process_nf_templates_count{obs_domain_id="0",router="100.64.0.7",template_id="371",type="template",version="10"} 3
flow_process_nf_templates_count{obs_domain_id="0",router="100.64.0.7",template_id="372",type="template",version="10"} 3
flow_process_nf_templates_count{obs_domain_id="0",router="100.64.0.7",template_id="373",type="template",version="10"} 3
flow_process_nf_templates_count{obs_domain_id="0",router="100.64.0.7",template_id="374",type="template",version="10"} 3
flow_process_nf_templates_count{obs_domain_id="0",router="100.64.0.7",template_id="375",type="template",version="10"} 3
flow_process_nf_templates_count{obs_domain_id="0",router="100.64.0.7",template_id="462",type="options_template",version="10"} 3
# HELP reader_record_enriched Number of records that have been successfully received and enriched.
# TYPE reader_record_enriched counter
reader_record_enriched 14458

Test when no flows are received:

Setup: deny-all ingress policy in the network-observability namespace for pods labeled app=goflow-kube

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: block-flows
  namespace: network-observability
  uid: 0035b8fd-8155-439e-922a-aec19eba5483
  resourceVersion: '137384'
  generation: 1
  creationTimestamp: '2022-02-09T19:47:40Z'
  managedFields:
    - manager: Mozilla
      operation: Update
      apiVersion: networking.k8s.io/v1
      time: '2022-02-09T19:47:40Z'
      fieldsType: FieldsV1
      fieldsV1:
        'f:spec':
          'f:podSelector': {}
          'f:policyTypes': {}
spec:
  podSelector:
    matchLabels:
      app: goflow-kube
  policyTypes:
    - Ingress

Even when no flows are received, the status is still reported as UP:

$ oc exec -it $goflow -- curl 127.0.0.1:8080/health/live | python -m json.tool
{
    "status": "UP",
    "checks": [
        {
            "name": "flows",
            "status": "UP",
            "data": {
                "host": "goflow-kube-6ffb87cb7c-dj4hw"
            }
        }
    ]
}

$ oc exec -it $goflow -- curl 127.0.0.1:8080/health/ready | python -m json.tool
{
    "status": "UP",
    "checks": [
        {
            "name": "flows",
            "status": "UP",
            "data": {
                "host": "goflow-kube-6ffb87cb7c-dj4hw"
            }
        }
    ]
}

$ oc exec -it $goflow -- curl 127.0.0.1:8080/metrics
# HELP reader_record_enriched Number of records that have been successfully received and enriched.
# TYPE reader_record_enriched counter
reader_record_enriched 0

reader_record_enriched stayed at 0 for as long as the policy was in effect. As soon as the deny-all ingress policy was deleted, metrics started to show up:

$ oc exec -it $goflow -- curl 127.0.0.1:8080/metrics
# HELP flow_process_nf_errors_count NetFlows processed errors.
# TYPE flow_process_nf_errors_count counter
flow_process_nf_errors_count{error="template_not_found",router="100.64.0.2"} 16
flow_process_nf_errors_count{error="template_not_found",router="100.64.0.3"} 17
flow_process_nf_errors_count{error="template_not_found",router="100.64.0.4"} 21
flow_process_nf_errors_count{error="template_not_found",router="100.64.0.5"} 15
flow_process_nf_errors_count{error="template_not_found",router="100.64.0.6"} 10
flow_process_nf_errors_count{error="template_not_found",router="100.64.0.7"} 14
# HELP reader_record_enriched Number of records that have been successfully received and enriched.
# TYPE reader_record_enriched counter
reader_record_enriched 0

$ oc exec -it $goflow -- curl 127.0.0.1:8080/metrics | tail
flow_process_nf_templates_count{obs_domain_id="0",router="100.64.0.7",template_id="370",type="template",version="10"} 1
flow_process_nf_templates_count{obs_domain_id="0",router="100.64.0.7",template_id="371",type="template",version="10"} 1
flow_process_nf_templates_count{obs_domain_id="0",router="100.64.0.7",template_id="372",type="template",version="10"} 1
flow_process_nf_templates_count{obs_domain_id="0",router="100.64.0.7",template_id="373",type="template",version="10"} 1
flow_process_nf_templates_count{obs_domain_id="0",router="100.64.0.7",template_id="374",type="template",version="10"} 1
flow_process_nf_templates_count{obs_domain_id="0",router="100.64.0.7",template_id="375",type="template",version="10"} 1
flow_process_nf_templates_count{obs_domain_id="0",router="100.64.0.7",template_id="462",type="options_template",version="10"} 1
# HELP reader_record_enriched Number of records that have been successfully received and enriched.
# TYPE reader_record_enriched counter
reader_record_enriched 1707

- What is the definition of status DOWN? Is it possible to have that invoked for testing?
A: At the moment we don't report a DOWN status. If goflow-kube panics as a result of some bug, a request to /health/live will get connection refused, and k8s will interpret that as down.

- Are these reported to Prometheus? If so, how can we grab them from there?
A: No, currently they're not reported to Prometheus, and this will most likely change in the future.

- How to use HELP and TYPE expressions?
A: They're used by Prometheus to parse these metrics. Details on how they're used: https://github.com/prometheus/docs/blob/main/content/docs/instrumenting/exposition_formats.md#text-based-format
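
To illustrate how the HELP and TYPE lines relate to the sample lines, here is a rough Python sketch that parses the /metrics text output. It is not how Prometheus or goflow-kube does it; parse_metrics and SAMPLE are names made up for this example, and the parser only covers the simple cases seen in the output above (no timestamps, no special escaping rules beyond quoted label values).

```python
import re

# Sample line: metric_name{label="value",...} value
SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair inside the braces: name="value"
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Hypothetical helper: split exposition text into metadata and samples.

    Returns (meta, samples) where meta maps a metric name to
    {"help": ..., "type": ...} and samples is a list of
    (name, labels_dict, float_value) tuples.
    """
    meta = {}
    samples = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # "# HELP <name> <docstring>" or "# TYPE <name> <type>"
            parts = line.split(None, 3)
            if len(parts) >= 3 and parts[1] in ("HELP", "TYPE"):
                entry = meta.setdefault(parts[2], {})
                entry[parts[1].lower()] = parts[3] if len(parts) > 3 else ""
            continue  # any other comment line is ignored
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # skip lines this simple sketch does not understand
        name, label_str, value = match.groups()
        labels = dict(LABEL_RE.findall(label_str or ""))
        samples.append((name, labels, float(value)))
    return meta, samples


# Excerpt of the /metrics output captured above.
SAMPLE = """\
# HELP flow_process_nf_count NetFlows processed.
# TYPE flow_process_nf_count counter
flow_process_nf_count{router="100.64.0.2",version="10"} 2930
flow_process_nf_count{router="100.64.0.3",version="10"} 2315
# HELP reader_record_enriched Number of records that have been successfully received and enriched.
# TYPE reader_record_enriched counter
reader_record_enriched 14458
"""

meta, samples = parse_metrics(SAMPLE)
print(meta["reader_record_enriched"]["type"])
print([value for name, _, value in samples if name == "reader_record_enriched"])
```

With something like this, the "no flows" case could be checked by looking for reader_record_enriched equal to 0 while /health/live still says UP.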