- Bug
- Resolution: Not a Bug
- Normal
- None
- Logging 5.0
- False
- False
- NEW
- NEW
- Undefined
Description of problem:
On ROSA, the OpenShift Logging (EFK) stack runs as a user workload, but the FluentdNodeDown critical alert is always firing because the required fluentd metrics are not collected by the user workload Prometheus.
// The message from the Prometheus operator in the "openshift-user-workload-monitoring" project:
level=warn ts=2021-07-02T02:39:00.701373761Z caller=operator.go:1675 component=prometheusoperator msg="skipping servicemonitor" error="it accesses file system via tls config which Prometheus specification prohibits" servicemonitor=openshift-logging/fluentd namespace=openshift-user-workload-monitoring prometheus=user-workload
This is because the fluentd ServiceMonitor's TLS config is invalid for the user workload Prometheus, as shown below.
// Why the above message is shown: the fluentd ServiceMonitor tlsConfig trips the following check.
https://github.com/openshift/prometheus-operator/blob/ce7d979635b9d1210db48d54485bc924aed37cdb/pkg/prometheus/operator.go#L1964-L1966
if tlsConf.CAFile != "" || tlsConf.CertFile != "" || tlsConf.KeyFile != "" {
    return errors.New("it accesses file system via tls config which Prometheus specification prohibits")
}
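One quick way to see the offending configuration is to dump the ServiceMonitor's tlsConfig directly. A minimal sketch, assuming the ServiceMonitor is named "fluentd" in "openshift-logging" (as in the warning above) and has a single endpoint:
# Sketch: print the endpoint tlsConfig of the fluentd ServiceMonitor.
# If it contains caFile/certFile/keyFile entries, the check above rejects it.
$ oc get servicemonitor fluentd -n openshift-logging \
    -o jsonpath='{.spec.endpoints[0].tlsConfig}{"\n"}'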
Version-Release number of selected component (if applicable):
On ROSA (4.7.z), OpenShift Logging 5.0 (EFK)
How reproducible:
You can reproduce this issue by installing OpenShift Logging on ROSA.
Alternatively, you can reproduce it on OCP 4.7.z by installing OpenShift Logging with the "openshift.io/cluster-monitoring" label missing from the "openshift-logging" namespace.
The "FluentdNodeDown" critical alert starts firing within about 10 minutes.
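Two quick checks while reproducing, as a sketch (the prometheus-operator deployment name in the user-workload project is an assumption):
# Confirm whether the openshift-logging namespace carries the cluster-monitoring label.
$ oc get namespace openshift-logging --show-labels | grep cluster-monitoring
# Watch the user-workload Prometheus operator skip the fluentd ServiceMonitor.
$ oc logs -n openshift-user-workload-monitoring deploy/prometheus-operator | grep 'skipping servicemonitor'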
Actual results:
The "FluentdNodeDown" critical alert is always firing, even though all fluentd pods are up and running without issues, because the required metrics are not collected due to the invalid TLS config in the fluentd ServiceMonitor.
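A simple way to confirm that the collectors themselves are healthy while the alert fires (a sketch; the fluentd-* pod naming is taken from the query output in the additional info below):
# The fluentd pods are all Running/Ready even while FluentdNodeDown is firing.
$ oc get pods -n openshift-logging | grep '^fluentd-'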
Expected results:
The OpenShift Logging (EFK) stack should provide a valid TLS config for the fluentd ServiceMonitor so that the metrics can be collected by the user workload Prometheus. That would also suppress the incorrect "FluentdNodeDown" alert.
Additional info:
I've verified that the fluentd ServiceMonitor works well once it has a valid TLS config (even one that simply skips TLS verification), as follows (a command-level sketch of steps 1 and 2 is included after step 3).
1. For testing, first stop the cluster-logging-operator.
2. Modify the fluentd ServiceMonitor TLS config (the tlsConfig: section) as follows.
:
spec:
  endpoints:
  - bearerTokenSecret:
      key: ""
    path: /metrics
    port: metrics
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
      serverName: fluentd.openshift-logging.svc
  jobLabel: monitor-fluentd
  namespaceSelector:
    matchNames:
    - openshift-logging
  selector:
    matchLabels:
      logging-infra: support
3. Check that the fluentd metrics are now collected by the user workload Prometheus.
$ oc rsh -n openshift-user-workload-monitoring -c prometheus prometheus-user-workload-1 \
    curl 'http://localhost:9090/api/v1/query?query=up%7Bjob%3D"fluentd"%7D+%3D%3D+1' | jq .
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "up",
          "container": "fluentd",
          "endpoint": "metrics",
          "instance": "10.128.0.7:24231",
          "job": "fluentd",
          "namespace": "openshift-logging",
          "pod": "fluentd-5rnpl",
          "service": "fluentd"
        },
        "value": [
          1625812558.084,
          "1"
        ]
      },
:
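For reference, steps 1 and 2 can be performed with commands roughly like the following. This is only a sketch: the cluster-logging-operator deployment name and the patch payload are assumptions based on the ServiceMonitor shown above, not an official procedure.
# Step 1 (sketch): pause the operator so it does not revert the manual change.
$ oc scale deployment cluster-logging-operator -n openshift-logging --replicas=0
# Step 2 (sketch): replace the file-based tlsConfig with insecureSkipVerify + serverName.
$ oc patch servicemonitor fluentd -n openshift-logging --type=json -p '[
  {"op": "replace", "path": "/spec/endpoints/0/tlsConfig",
   "value": {"insecureSkipVerify": true, "serverName": "fluentd.openshift-logging.svc"}}
]'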
Related issue:
- LOG-1564 Cluster Logging Operator should be coupled loosely with Cluster Monitoring Prometheus in "openshift-logging" on ROSA (Closed)