Red Hat 3scale API Management / THREESCALE-1905

Update 3scale labels used on AMP templates to be Prometheus-compliant


    • Type: Enhancement
    • Resolution: Done
    • Priority: Minor
    • Fix Version/s: 2.4 GA

      At the moment we are using the following labels on the OpenShift objects created by the 3scale AMP templates (example at https://github.com/3scale/3scale-amp-openshift-templates/blob/master/amp/amp-eval-tech-preview.yml#L3200):

          labels:
            3scale.component: apicast
            3scale.component-element: staging
      

      These labels are valid from the Kubernetes point of view: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#syntax-and-character-set

      Labels are key/value pairs. Valid label keys have two segments: an optional prefix and name, separated by a slash (/). The name segment is required and must be 63 characters or less, beginning and ending with an alphanumeric character ([a-z0-9A-Z]) with dashes (-), underscores (_), dots (.), and alphanumerics between. The prefix is optional. If specified, the prefix must be a DNS subdomain: a series of DNS labels separated by dots (.), not longer than 253 characters in total, followed by a slash (/).
      

      But they are not valid from the Prometheus point of view: https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels

      The metric name specifies the general feature of a system that is measured (e.g. http_requests_total - the total number of HTTP requests received). It may contain ASCII letters and digits, as well as underscores and colons. It must match the regex [a-zA-Z_:][a-zA-Z0-9_:]*.
      
      Label names may contain ASCII letters, numbers, as well as underscores. They must match the regex [a-zA-Z_][a-zA-Z0-9_]*. Label names beginning with __ are reserved for internal use.
      

      So it is advisable to use Kubernetes labels that comply with both the Kubernetes and the Prometheus syntax.
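
      To make the difference concrete, here is a quick check of the current label keys (and the proposed replacement) against the two regexes quoted above. This is just an illustrative sketch in Python; the Kubernetes regex covers only the name segment, without the optional prefix or the length limits:

      import re

      # Kubernetes label key, name segment only (no optional prefix, length limits not checked),
      # per the syntax quoted above.
      K8S_LABEL_NAME = re.compile(r'^[A-Za-z0-9]([-A-Za-z0-9_.]*[A-Za-z0-9])?$')
      # Prometheus label name, per the data model documentation quoted above.
      PROM_LABEL_NAME = re.compile(r'^[a-zA-Z_][a-zA-Z0-9_]*$')

      for key in ('3scale.component', '3scale.component-element', 'threescale_component'):
          print(key,
                'kubernetes:', bool(K8S_LABEL_NAME.match(key)),
                'prometheus:', bool(PROM_LABEL_NAME.match(key)))

      # 3scale.component          kubernetes: True  prometheus: False
      # 3scale.component-element  kubernetes: True  prometheus: False
      # threescale_component      kubernetes: True  prometheus: True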

      The test I did to validate this

      I added a service with the label "3scale.component: system" (a key starting with the digit "3" and containing a dot "."):

      apiVersion: v1
      kind: Service
      metadata:
        name: system-sphinx-test-regex-exporter
        namespace: prometheus-exporters
        labels:
          3scale.component: system
          app: sphinx-exporter
          environment: test
          name: system-sphinx-test-regex-exporter
          role: service
          template: sphinx-exporter
          tier: sphinx
        annotations:
          openshift.io/generated-by: OpenShiftNewApp
          prometheus.io/path: /metrics
          prometheus.io/port: '9247'
          prometheus.io/scrape: 'true'
      spec:
        ports:
          - protocol: TCP
            port: 9247
            targetPort: 9247
        selector:
          app: system-sphinx-exporter
        clusterIP: 172.30.205.220
        type: ClusterIP
        sessionAffinity: None
      

      This service points to a Sphinx Prometheus exporter that monitors the system Sphinx instance, and the service is scraped by the internal cluster Prometheus server using the official job to auto-discover services exposing Prometheus metrics: https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/prometheus/prometheus-configmap.yaml#L60

      - job_name: 'kubernetes-service-endpoints'
        kubernetes_sd_configs:
        - role: endpoints
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_name
      

      This job relabels the scraped metrics with all the OpenShift service labels (3scale.component among others). If I check the metric with PromQL on the internal Prometheus:

      sphinx_up{3scale_component="system",app="sphinx-exporter",environment="test",instance="10.1.7.110:9247",job="kubernetes-service-endpoints",kubernetes_name="system-sphinx-test-regex-exporter",kubernetes_namespace="prometheus-exporters",name="system-sphinx-test-regex-exporter",role="service",template="sphinx-exporter",tier="sphinx"}
      

      Among other labels, the Sphinx metrics end up relabelled with 3scale_component="system" (the "." has been replaced by "_").
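
      That replacement happens because kubernetes_sd exposes each service label as __meta_kubernetes_service_label_<key>, converting unsupported characters to underscores, and the labelmap rule above then strips that prefix. A minimal sketch of that path (Python; it mirrors the behaviour described in the Prometheus docs, not the actual implementation) shows that the dot gets fixed but the leading digit does not:

      import re

      def meta_label(k8s_label_key):
          # kubernetes_sd exposes each service label as
          # __meta_kubernetes_service_label_<key>, with unsupported characters
          # converted to underscores.
          return ('__meta_kubernetes_service_label_'
                  + re.sub(r'[^a-zA-Z0-9_]', '_', k8s_label_key))

      def after_labelmap(meta_name):
          # The labelmap rule above keeps whatever __meta_kubernetes_service_label_(.+) captures.
          return re.sub(r'^__meta_kubernetes_service_label_(.+)$', r'\1', meta_name)

      name = after_labelmap(meta_label('3scale.component'))
      print(name)                                               # 3scale_component
      print(bool(re.match(r'^[a-zA-Z_][a-zA-Z0-9_]*$', name)))  # False: the dot is gone, but the leading digit remains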

      The problem comes with Prometheus federation. In this specific case, there is a main Prometheus server which federates the internal production Prometheus service running on the production cluster (the one scraping the OpenShift services), with the following job:

      - job_name: k8s_prod
        honor_labels: true
        params:
          match[]:
          - '{job="kubernetes-nodes"}'
          - '{job="kubernetes-apiservers"}'
          - '{job="kubernetes-service-endpoints"}'
          - '{job="kubernetes-pods"}'
        scrape_interval: 2m
        scrape_timeout: 1m
        metrics_path: /federate
        scheme: http
        static_configs:
        - targets:
          - production-cluster-internal-prometheus-service.3scale.net:9090
        relabel_configs:
        - separator: ;
          regex: (.*)
          target_label: cluster
          replacement: prod
      

      This job receives all the metrics from the internal production Prometheus service "production-cluster-internal-prometheus-service.3scale.net:9090" and adds a new label "cluster=prod" to all of them (to know which cluster they come from).

      Once the metrics carrying the "3scale_component" label arrive at the main Prometheus server, it marks the whole job as NodeDown, so none of the metrics coming from that internal Prometheus service are received on the main Prometheus server:

      NodeDown production-cluster-internal-prometheus-service.3scale.net:9090 (prod k8s_prod http://main-prometheus-service.3scale.net:9093 critical)
      

      And we see the following errors in the Prometheus logs:

      level=warn ts=2019-02-11T16:15:15.684940964Z caller=scrape.go:686 component="scrape manager" scrape_pool=k8s_prod target="http://production-cluster-internal-prometheus-service.3scale.net:9090/federate?match%5B%5D=%7Bjob%3D%22kubernetes-nodes%22%7D&match%5B%5D=%7Bjob%3D%22kubernetes-apiservers%22%7D&match%5B%5D=%7Bjob%3D%22kubernetes-service-endpoints%22%7D&match%5B%5D=%7Bjob%3D%22kubernetes-pods%22%7D" msg="append failed" err="no token found"
      

      If we investigate that error (msg="append failed" err="no token found", for example in this Google group thread: https://groups.google.com/forum/#!msg/prometheus-users/5aGq7STP8TA/jn8QiCtmBwAJ), we will see that the problem is having metrics with label names that are not valid for Prometheus, which is confirmed by the official documentation: https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels
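
      As a possible interim mitigation for clusters that cannot change the template labels yet (just a sketch, not something tested here; it assumes the rules are appended after the generic labelmap shown above), the offending labels could be renamed and dropped at scrape time on the internal Prometheus:

          # Hypothetical extra rules at the end of the kubernetes-service-endpoints
          # relabel_configs: copy 3scale_* labels to threescale_* and drop the originals.
          - action: labelmap
            regex: 3scale_(.+)
            replacement: threescale_$1
          - action: labeldrop
            regex: 3scale_.+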

      My advice would be to use, on OpenShift, labels that are both Kubernetes- and Prometheus-compliant, so instead of using the labels:

      • 3scale.component
      • 3scale.component-element

      Use:

      • threescale_component
      • threescale_component_element
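
      Applied to the template snippet quoted at the beginning of this description, the apicast staging example would then look like this (same values, only the label keys renamed):

          labels:
            threescale_component: apicast
            threescale_component_element: staging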

            People: Martin Kudlej (mkudlej@redhat.com), Sergio Lopez (slopezma@redhat.com), Miguel Soriano