- Enhancement
- Resolution: Done
- Minor
- None
- 2.4 GA
- None
- Not Started
- Not Started
- Not Started
- Not Started
- Not Started
- Not Started
At the moment we are using the following labels on OpenShift objects created by the 3scale AMP templates (example at https://github.com/3scale/3scale-amp-openshift-templates/blob/master/amp/amp-eval-tech-preview.yml#L3200):
labels:
  3scale.component: apicast
  3scale.component-element: staging
These are valid from the Kubernetes point of view: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#syntax-and-character-set
Labels are key/value pairs. Valid label keys have two segments: an optional prefix and name, separated by a slash (/). The name segment is required and must be 63 characters or less, beginning and ending with an alphanumeric character ([a-z0-9A-Z]) with dashes (-), underscores (_), dots (.), and alphanumerics between. The prefix is optional. If specified, the prefix must be a DNS subdomain: a series of DNS labels separated by dots (.), not longer than 253 characters in total, followed by a slash (/).
But they are not valid from the Prometheus point of view: https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels
The metric name specifies the general feature of a system that is measured (e.g. http_requests_total - the total number of HTTP requests received). It may contain ASCII letters and digits, as well as underscores and colons. It must match the regex [a-zA-Z_:][a-zA-Z0-9_:]*.
Label names may contain ASCII letters, numbers, as well as underscores. They must match the regex [a-zA-Z_][a-zA-Z0-9_]*. Label names beginning with __ are reserved for internal use.
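The clash between the two rule sets can be checked directly with the regexes quoted above (a minimal sketch; the patterns are taken from the Kubernetes and Prometheus documentation):

```python
import re

# Kubernetes label key name segment: alphanumeric at both ends,
# dashes/underscores/dots allowed in between (prefix omitted here).
K8S_LABEL_NAME = re.compile(r'^[a-z0-9A-Z]([a-z0-9A-Z_.-]*[a-z0-9A-Z])?$')
# Prometheus label name: must NOT start with a digit, no dots allowed.
PROM_LABEL_NAME = re.compile(r'^[a-zA-Z_][a-zA-Z0-9_]*$')

def valid_for(pattern, name):
    return bool(pattern.match(name))

# "3scale.component" passes the Kubernetes check but fails the Prometheus one:
print(valid_for(K8S_LABEL_NAME, '3scale.component'))   # True
print(valid_for(PROM_LABEL_NAME, '3scale.component'))  # False
```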
So it is advisable to use Kubernetes labels that comply with both the Kubernetes and the Prometheus syntax.
The test I did to validate this:
I added a service with the label "3scale.component: system" (starting with the digit "3" and containing a dot "."):
apiVersion: v1
kind: Service
metadata:
  name: system-sphinx-test-regex-exporter
  namespace: prometheus-exporters
  labels:
    3scale.component: system
    app: sphinx-exporter
    environment: test
    name: system-sphinx-test-regex-exporter
    role: service
    template: sphinx-exporter
    tier: sphinx
  annotations:
    openshift.io/generated-by: OpenShiftNewApp
    prometheus.io/path: /metrics
    prometheus.io/port: '9247'
    prometheus.io/scrape: 'true'
spec:
  ports:
    - protocol: TCP
      port: 9247
      targetPort: 9247
  selector:
    app: system-sphinx-exporter
  clusterIP: 172.30.205.220
  type: ClusterIP
  sessionAffinity: None
This service points to a Sphinx Prometheus exporter monitoring the system Sphinx instance, and it is scraped by the cluster's internal Prometheus server using the official job that auto-discovers services exposing Prometheus metrics: https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/prometheus/prometheus-configmap.yaml#L60
- job_name: 'kubernetes-service-endpoints'
  kubernetes_sd_configs:
    - role: endpoints
  relabel_configs:
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
      action: replace
      target_label: __scheme__
      regex: (https?)
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
      action: replace
      target_label: __address__
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_service_name]
      action: replace
      target_label: kubernetes_name
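As a side note, the __address__ relabel rule in that job can be simulated to see how the prometheus.io/port annotation overrides the discovered port. This is a sketch with made-up sample values; Prometheus joins source_labels with ";" by default and anchors the regex, and Python's re.sub uses \1/\2 instead of $1/$2:

```python
import re

address = '10.1.7.110:443'   # hypothetical __address__ from service discovery
port_annotation = '9247'     # the prometheus.io/port annotation value

# Join the source labels with the default ";" separator, then apply
# the rule's regex ([^:]+)(?::\d+)?;(\d+) with replacement $1:$2.
joined = ';'.join([address, port_annotation])
new_address = re.sub(r'^([^:]+)(?::\d+)?;(\d+)$', r'\1:\2', joined)
print(new_address)  # 10.1.7.110:9247
```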
This job relabels the scraped metrics with all the OpenShift service labels (3scale.component among others). If I check the metric on the internal Prometheus with PromQL:
sphinx_up{3scale_component="system",app="sphinx-exporter",environment="test",instance="10.1.7.110:9247",job="kubernetes-service-endpoints",kubernetes_name="system-sphinx-test-regex-exporter",kubernetes_namespace="prometheus-exporters",name="system-sphinx-test-regex-exporter",role="service",template="sphinx-exporter",tier="sphinx"}
Among other labels, the Sphinx metrics are relabelled with 3scale_component="system" (the "." is replaced by "_").
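That sanitization can be sketched as follows: service discovery replaces every character outside [a-zA-Z0-9_] with "_" before the labelmap rule runs, but (at least in the behaviour observed here, which is an assumption about the version in use) it does not fix a leading digit, so the result is still invalid for Prometheus:

```python
import re

def sanitize(name):
    # Sketch of meta-label sanitization: replace invalid characters with "_".
    # Note: a leading digit is left in place (assumed behaviour, matching
    # the 3scale_component label seen in the relabelled metric above).
    return re.sub(r'[^a-zA-Z0-9_]', '_', name)

sanitized = sanitize('3scale.component')
print(sanitized)  # 3scale_component
# Still not a valid Prometheus label name (starts with a digit):
print(bool(re.match(r'^[a-zA-Z_][a-zA-Z0-9_]*$', sanitized)))  # False
```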
The problem comes with Prometheus federation. In this specific case, there is a main Prometheus server federated with the internal production Prometheus service running on the production cluster (the one scraping the OpenShift services), using the following job:
- job_name: k8s_prod
  honor_labels: true
  params:
    match[]:
      - '{job="kubernetes-nodes"}'
      - '{job="kubernetes-apiservers"}'
      - '{job="kubernetes-service-endpoints"}'
      - '{job="kubernetes-pods"}'
  scrape_interval: 2m
  scrape_timeout: 1m
  metrics_path: /federate
  scheme: http
  static_configs:
    - targets:
        - production-cluster-internal-prometheus-service.3scale.net:9090
  relabel_configs:
    - separator: ;
      regex: (.*)
      target_label: cluster
      replacement: prod
This job receives all the metrics from the internal production Prometheus service "production-cluster-internal-prometheus-service.3scale.net:9090" and adds a new label cluster=prod to every metric (to know which cluster it comes from).
Once these metrics labelled with 3scale_component arrive at the main Prometheus server, the whole job is marked as NodeDown (so none of the metrics coming from that internal Prometheus service are received by the main Prometheus server).
NodeDown production-cluster-internal-prometheus-service.3scale.net:9090 (prod k8s_prod http://main-prometheus-service.3scale.net:9093 critical)
And we see the following errors in the Prometheus logs:
level=warn ts=2019-02-11T16:15:15.684940964Z caller=scrape.go:686 component="scrape manager" scrape_pool=k8s_prod target="http://production-cluster-internal-prometheus-service.3scale.net:9090/federate?match%5B%5D=%7Bjob%3D%22kubernetes-nodes%22%7D&match%5B%5D=%7Bjob%3D%22kubernetes-apiservers%22%7D&match%5B%5D=%7Bjob%3D%22kubernetes-service-endpoints%22%7D&match%5B%5D=%7Bjob%3D%22kubernetes-pods%22%7D" msg="append failed" err="no token found"
If we investigate the error msg="append failed" err="no token found" (for example in this Google group thread: https://groups.google.com/forum/#!msg/prometheus-users/5aGq7STP8TA/jn8QiCtmBwAJ), we will see that the problem is metrics with invalid Prometheus label names, which is confirmed by the official documentation: https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels
My advice would be to use, on OpenShift, labels that are both Kubernetes and Prometheus compliant. So instead of using:
- 3scale.component
- 3scale.component-element
Use:
- threescale_component
- threescale_component_element
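The proposed names can be verified against both rule sets (same regexes as quoted earlier from the Kubernetes and Prometheus docs):

```python
import re

K8S_LABEL_NAME = re.compile(r'^[a-z0-9A-Z]([a-z0-9A-Z_.-]*[a-z0-9A-Z])?$')
PROM_LABEL_NAME = re.compile(r'^[a-zA-Z_][a-zA-Z0-9_]*$')

for name in ('threescale_component', 'threescale_component_element'):
    # Both names are valid Kubernetes label keys AND valid Prometheus
    # label names, so they survive relabelling and federation intact.
    print(name,
          bool(K8S_LABEL_NAME.match(name)),
          bool(PROM_LABEL_NAME.match(name)))
```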