Type: Bug
Priority: Major
Resolution: Done
Status: VERIFIED
Affects Version: Logging 5.5.0
Sprints: Log Collection - Sprint 221, Log Collection - Sprint 222
Version of the components:
$ oc get csv
NAME                            DISPLAY                            VERSION   REPLACES   PHASE
clusterlogging.v5.5.0           Red Hat OpenShift Logging          5.5.0                Succeeded
elasticsearch-operator.v5.5.0   OpenShift Elasticsearch Operator   5.5.0                Succeeded
loki-operator.v0.0.1            Loki Operator                      0.0.1                Succeeded

Server Version: 4.10.0-0.nightly-2022-06-08-150219
Kubernetes Version: v1.23.5+3afdacb
Description of the problem:
When forwarding logs to a Lokistack instance, Vector's healthcheck for the loki sink fails with "404 Not Found", yet the logs themselves are delivered to the Lokistack successfully.
$ oc rsh collector-lswpd
Defaulted container "collector" out of: collector, logfilesmetricexporter
sh-4.4# vector validate /etc/vector/vector.toml
Loaded with warnings ["/etc/vector/vector.toml"]
------------------------------------------------
~ Transform "route_container_logs._unmatched" has no consumers
2022-06-24T05:34:32.329364Z INFO vector::sources::kubernetes_logs: Obtained Kubernetes Node name to collect logs for (self). self_node_name="ip-10-0-223-91.us-east-2.compute.internal"
2022-06-24T05:34:32.337961Z INFO vector::sources::kubernetes_logs: Excluding matching files. exclude_paths=["/var/log/pods/openshift-logging_collector-*/*/*.log", "/var/log/pods/openshift-logging_elasticsearch-*/*/*.log", "/var/log/pods/openshift-logging_kibana-*/*/*.log"]
√ Component configuration
2022-06-24T05:34:32.379154Z ERROR vector::topology::builder: msg="Healthcheck: Failed Reason." error=A non-successful status returned: 404 Not Found component_kind="sink" component_type="loki" component_id=loki_infra component_name=loki_infra
x Health check for "loki_infra" failed
2022-06-24T05:34:32.379248Z INFO vector::topology::builder: Healthcheck: Passed.
√ Health check "prometheus_output"
sh-4.4#
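The push path and the healthcheck path evidently diverge. A minimal way to see this from outside the collector (a sketch, not from the original report: it assumes Vector's loki sink healthcheck probes the endpoint's ready path, which the Lokistack gateway does not serve under the tenant prefix; the route and token variables follow step 8 below):

# Assumption: Vector's loki sink healthchecks <endpoint>/ready.
bearer_token=$(oc -n openshift-logging sa get-token logcollector)
lokistack_route=$(oc -n openshift-logging get route lokistack-instance -o jsonpath='{.spec.host}')

# Healthcheck-style probe: expected to return 404 under the tenant prefix.
curl -sk -o /dev/null -w '%{http_code}\n' -H "Authorization: Bearer ${bearer_token}" \
  "https://${lokistack_route}/api/logs/v1/infrastructure/ready"

# Push endpoint: anything but 404 here (e.g. 405 for a GET) shows that
# delivery and the healthcheck hit different paths.
curl -sk -o /dev/null -w '%{http_code}\n' -H "Authorization: Bearer ${bearer_token}" \
  "https://${lokistack_route}/api/logs/v1/infrastructure/loki/api/v1/push"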
Steps to reproduce the issue:
1 Install the Logging, Loki and Elasticsearch operators built from the master branch, and create a Lokistack instance.
2 Create a secret with the token and CA bundle required for connecting to the Lokistack instance. The token is kept empty due to bug https://issues.redhat.com/browse/LOG-2461.
oc -n openshift-logging create secret generic lokistack-gateway-bearer-token \
  --from-literal=token="" \
  --from-literal=ca-bundle.crt="$(oc -n openshift-logging get cm lokistack-instance-ca-bundle -o json | jq -r '.data."service-ca.crt"')"
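To confirm the secret carries the expected keys (a quick check, not part of the original steps):

oc -n openshift-logging get secret lokistack-gateway-bearer-token -o json | jq -r '.data | keys[]'
# Expected: ca-bundle.crt and token (the token value decodes to an empty string)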
3 Create the ClusterRole and ClusterRoleBinding required for the logcollector SA.
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: lokistack-instance-tenant-logs
rules:
- apiGroups:
  - 'loki.grafana.com'
  resources:
  - application
  - infrastructure
  - audit
  resourceNames:
  - logs
  verbs:
  - 'get'
  - 'create'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: lokistack-instance-tenant-logs
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: lokistack-instance-tenant-logs
subjects:
- kind: ServiceAccount
  name: logcollector
  namespace: openshift-logging
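To confirm the binding is in effect, the service account's permissions can be listed and filtered for the loki.grafana.com rules (a quick check, not part of the original steps):

oc auth can-i --list --as=system:serviceaccount:openshift-logging:logcollector | grep loki.grafana.com
# Expected: the application/infrastructure/audit resources with resource name "logs" and verbs get, create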
4 Create CLF instance.
apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  outputs:
  - name: loki-infra
    type: loki
    url: https://lokistack-instance-gateway-http.openshift-logging.svc:8080/api/logs/v1/infrastructure/
    secret:
      name: lokistack-gateway-bearer-token
  pipelines:
  - name: send-infra-logs
    inputRefs:
    - infrastructure
    outputRefs:
    - loki-infra
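Once applied, the forwarder's status conditions can be inspected (a generic check, not from the original report; the operator should mark the forwarder Ready if the output and secret validate):

oc -n openshift-logging get clusterlogforwarder instance -o jsonpath='{.status.conditions}'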
5 Create ClusterLogging instance.
apiVersion: "logging.openshift.io/v1" kind: "ClusterLogging" metadata: name: "instance" namespace: "openshift-logging" spec: managementState: "Managed" collection: logs: type: "vector" vector: {}
6 Add the token to the Vector config (a scripted variant of this edit follows the commands below).
Set the ClusterLogging instance to Unmanaged, extract the collector config, and add the token:

oc edit clusterloggings.logging.openshift.io instance
oc extract secret/collector-config --confirm
oc sa get-token logcollector
vi vector.toml

[sinks.loki_infra]
type = "loki"
inputs = ["send-infra-logs"]
endpoint = "https://lokistack-instance-gateway-http.openshift-logging.svc:8080/api/logs/v1/infrastructure/"

[sinks.loki_infra.encoding]
codec = "json"

[sinks.loki_infra.labels]
kubernetes_container_name = "{{kubernetes.container_name}}"
kubernetes_host = "${VECTOR_SELF_NODE_NAME}"
kubernetes_namespace_name = "{{kubernetes.namespace_name}}"
kubernetes_pod_name = "{{kubernetes.pod_name}}"
log_type = "{{log_type}}"

# TLS Config
[sinks.loki_infra.tls]
enabled = true
ca_file = "/var/run/ocp-collector/secrets/lokistack-gateway-bearer-token/ca-bundle.crt"

[sinks.loki_infra.auth]
strategy = "bearer"
token = "eyJhbGciOiJSUzI1NiIsImtpZCI6Im1ucE9EQXlkSUgwVU9CQ2tTVUhRcWRwdm9LbzYxTlZqalVBMWFWUEVrbU0ifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbG9nZ2luZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJsb2djb2xsZWN0b3ItdG9rZW4tYnRsbDUiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoibG9nY29sbGVjdG9yIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiNjBjZGZmZTktZWZlZC00MzQxLWIzZTctYzIwMDllYThjZmQwIiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1sb2dnaW5nOmxvZ2NvbGxlY3RvciJ9.ZWvWFZVxNkEFGzKyyNCTMm1-0DDDS6edKnHyfW0zyNSMKgg7NeSGlgcxGnLCPvRmuhWw2u6bzxeAyrwaRF5kRJI0J-OyR6-WLQKU9hbeSXSDowmOwiu1wxwMgFnCsr85bGjmyVoP0niBk4sxsrGLDPL2vEDuLplMppasfKmQlX1CrEElbMCW8GYKRjB1DgedUVAQUTqgaMDmBBDzedtm_BRTjKr8-tKhv_CB4K3oHH9cIhbiNtC1Nrvc3uAKYQc37UKqJjVnKL6R0_0Ms1IwksLD4s4FEeTfZiuWXnqTyoLXUvGOpfJvGFmZgTZXUgB4HcW4Zwj8eyDDtANrsQo7_AajizwVC1ty1HYFW7gkMzaSQXkmtNBUFptT4vwoM2sR2RItH1Y4GWn92qh9NEdXKBSu4ZewyXjU2b0ckR6dS-Y7B5Gzlld-ciQdaX_9_aE0LV9wU3ppG8VHY3fY2mV93h0mEf_qrMkf6fSzB_jvLaDaxHeI8IEjV_Six6XzAwR65Uv3HRfP9ghcLNPQsskmyvXObUDy-LIQ9AHTyPxVolOXoHoTY5PY0yoV3rmSHYCz_PbzWs67A0M98wsm0eahaSFeMD8tiDTMeKoC4uZn5SeFk6PXBb0M_7tYGyMgdEH-zVN7cGJASRZqHL-PtaJC-BgyszckSUOxgD1QVqoMzHA"

oc set data secret/collector-config --from-file vector.toml
oc delete pods --selector=component=collector
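The manual vi edit can be scripted for repeated runs (a sketch; it assumes the extracted vector.toml already contains a token = ... line under [sinks.loki_infra.auth], as in the config above):

# Hypothetical scripted variant of step 6; names match the commands above.
token=$(oc sa get-token logcollector)
sed -i "s|^token = .*|token = \"${token}\"|" vector.toml
oc set data secret/collector-config --from-file vector.toml
oc delete pods --selector=component=collector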
7 Check the collector logs and validate the Vector config.
$ oc logs collector-psz7t
Defaulted container "collector" out of: collector, logfilesmetricexporter
2022-06-24T05:32:45.320903Z INFO vector::app: Log level is enabled. level="info"
2022-06-24T05:32:45.321123Z INFO vector::app: Loading configs. paths=["/etc/vector/vector.toml"]
2022-06-24T05:32:45.333953Z WARN vector::config::loading: Transform "route_container_logs._unmatched" has no consumers
2022-06-24T05:32:45.334146Z INFO vector::sources::kubernetes_logs: Obtained Kubernetes Node name to collect logs for (self). self_node_name="ip-10-0-221-130.us-east-2.compute.internal"
2022-06-24T05:32:45.342285Z INFO vector::sources::kubernetes_logs: Excluding matching files. exclude_paths=["/var/log/pods/openshift-logging_collector-*/*/*.log", "/var/log/pods/openshift-logging_elasticsearch-*/*/*.log", "/var/log/pods/openshift-logging_kibana-*/*/*.log"]
2022-06-24T05:32:45.366877Z INFO vector::topology::running: Running healthchecks.
2022-06-24T05:32:45.367042Z INFO vector: Vector has started. debug="false" version="0.21.0" arch="x86_64" build_id="none"
2022-06-24T05:32:45.367432Z INFO vector::topology::builder: Healthcheck: Passed.
2022-06-24T05:32:45.368453Z INFO source{component_kind="source" component_id=raw_journal_logs component_type=journald component_name=raw_journal_logs}: vector::sources::journald: Starting journalctl.
2022-06-24T05:32:45.368892Z INFO source{component_kind="source" component_id=raw_container_logs component_type=kubernetes_logs component_name=raw_container_logs}:file_server: file_source::checkpointer: Loaded checkpoint data.
2022-06-24T05:32:50.442544Z ERROR vector::topology::builder: msg="Healthcheck: Failed Reason." error=A non-successful status returned: 404 Not Found component_kind="sink" component_type="loki" component_id=loki_infra component_name=loki_infra

$ oc rsh collector-lswpd
Defaulted container "collector" out of: collector, logfilesmetricexporter
sh-4.4# vector validate /etc/vector/vector.toml
Loaded with warnings ["/etc/vector/vector.toml"]
------------------------------------------------
~ Transform "route_container_logs._unmatched" has no consumers
2022-06-24T05:34:32.329364Z INFO vector::sources::kubernetes_logs: Obtained Kubernetes Node name to collect logs for (self). self_node_name="ip-10-0-223-91.us-east-2.compute.internal"
2022-06-24T05:34:32.337961Z INFO vector::sources::kubernetes_logs: Excluding matching files. exclude_paths=["/var/log/pods/openshift-logging_collector-*/*/*.log", "/var/log/pods/openshift-logging_elasticsearch-*/*/*.log", "/var/log/pods/openshift-logging_kibana-*/*/*.log"]
√ Component configuration
2022-06-24T05:34:32.379154Z ERROR vector::topology::builder: msg="Healthcheck: Failed Reason." error=A non-successful status returned: 404 Not Found component_kind="sink" component_type="loki" component_id=loki_infra component_name=loki_infra
x Health check for "loki_infra" failed
2022-06-24T05:34:32.379248Z INFO vector::topology::builder: Healthcheck: Passed.
√ Health check "prometheus_output"
sh-4.4#
8 Check the infra logs in Lokistack.
bearer_token=$(oc sa get-token logcollector)
lokistack_route=$(oc get route lokistack-instance -n openshift-logging -o json | jq '.spec.host' -r)

$ logcli -o raw --tls-skip-verify --bearer-token="${bearer_token}" --addr="https://${lokistack_route}/api/logs/v1/infrastructure" query '{log_type="infrastructure"}'
2022-06-24 11:31:58.547070 I | proto: duplicate proto type registered: purgeplan.DeletePlan
2022-06-24 11:31:58.547108 I | proto: duplicate proto type registered: purgeplan.ChunksGroup
2022-06-24 11:31:58.547115 I | proto: duplicate proto type registered: purgeplan.ChunkDetails
2022-06-24 11:31:58.547120 I | proto: duplicate proto type registered: purgeplan.Interval
2022-06-24 11:31:58.554724 I | proto: duplicate proto type registered: grpc.PutChunksRequest
2022-06-24 11:31:58.554739 I | proto: duplicate proto type registered: grpc.GetChunksRequest
2022-06-24 11:31:58.554744 I | proto: duplicate proto type registered: grpc.GetChunksResponse
2022-06-24 11:31:58.554747 I | proto: duplicate proto type registered: grpc.Chunk
2022-06-24 11:31:58.554750 I | proto: duplicate proto type registered: grpc.ChunkID
2022-06-24 11:31:58.554753 I | proto: duplicate proto type registered: grpc.DeleteTableRequest
2022-06-24 11:31:58.554756 I | proto: duplicate proto type registered: grpc.DescribeTableRequest
2022-06-24 11:31:58.554759 I | proto: duplicate proto type registered: grpc.WriteBatch
2022-06-24 11:31:58.554762 I | proto: duplicate proto type registered: grpc.WriteIndexRequest
2022-06-24 11:31:58.554765 I | proto: duplicate proto type registered: grpc.DeleteIndexRequest
2022-06-24 11:31:58.554768 I | proto: duplicate proto type registered: grpc.QueryIndexResponse
2022-06-24 11:31:58.554772 I | proto: duplicate proto type registered: grpc.Row
2022-06-24 11:31:58.554775 I | proto: duplicate proto type registered: grpc.IndexEntry
2022-06-24 11:31:58.554778 I | proto: duplicate proto type registered: grpc.QueryIndexRequest
2022-06-24 11:31:58.554781 I | proto: duplicate proto type registered: grpc.UpdateTableRequest
2022-06-24 11:31:58.554784 I | proto: duplicate proto type registered: grpc.DescribeTableResponse
2022-06-24 11:31:58.554787 I | proto: duplicate proto type registered: grpc.CreateTableRequest
2022-06-24 11:31:58.554790 I | proto: duplicate proto type registered: grpc.TableDesc
2022-06-24 11:31:58.554799 I | proto: duplicate proto type registered: grpc.TableDesc.TagsEntry
2022-06-24 11:31:58.554803 I | proto: duplicate proto type registered: grpc.ListTablesResponse
2022-06-24 11:31:58.554805 I | proto: duplicate proto type registered: grpc.Labels
2022-06-24 11:31:58.554891 I | proto: duplicate proto type registered: storage.Entry
2022-06-24 11:31:58.554897 I | proto: duplicate proto type registered: storage.ReadBatch
https://lokistack-instance-openshift-logging.apps.ikanse-15.qe.devcluster.openshift.com/api/logs/v1/infrastructure/loki/api/v1/query_range?direction=BACKWARD&end=1656050518555614220&limit=30&query=%7Blog_type%3D%22infrastructure%22%7D&start=1656046918555614220
Common labels: {log_type="infrastructure"}
{"@timestamp":"2022-06-24T06:01:58.429478394Z","file":"/var/log/pods/openshift-cluster-version_cluster-version-operator-8586c8d446-fpxtq_78ccbda6-13a8-4bd3-9413-a1d4feda1d5f/cluster-version-operator/0.log","hostname":"ip-10-0-145-69.us-east-2.compute.internal","kubernetes":{"annotations":{"openshift.io/scc":"hostaccess"},"container_id":"cri-o://fd73556c94ea5afda8f4078e4600e57db389a1a3c30857bd033be249a0185c7f","container_image":"registry.ci.openshift.org/ocp/release@sha256:6bb01826e3996b4b792c0eed75316cfd55fd45f87fdd08a54d4953311c6ae985","container_name":"cluster-version-operator","labels":{"k8s-app":"cluster-version-operator","pod-template-hash":"8586c8d446"},"namespace_labels":{"kubernetes.io/metadata.name":"openshift-cluster-version","name":"openshift-cluster-version","olm.operatorgroup.uid/37f23c84-2f70-4a52-8536-b980109adab0":"","openshift.io/cluster-monitoring":"true","openshift.io/run-level":"","pod-security.kubernetes.io/audit":"privileged","pod-security.kubernetes.io/enforce":"privileged","pod-security.kubernetes.io/warn":"privileged"},"namespace_name":"openshift-cluster-version","pod_id":"78ccbda6-13a8-4bd3-9413-a1d4feda1d5f","pod_ip":"10.0.145.69","pod_name":"cluster-version-operator-8586c8d446-fpxtq","pod_owner":"ReplicaSet/cluster-version-operator-8586c8d446"},"level":"info","log_type":"infrastructure","message":"I0624 06:01:58.429468 1 sync_worker.go:832] Running sync for securitycontextconstraints \"anyuid\" (84 of 771)"} {"@timestamp":"2022-06-24T06:01:58.429478394Z","file":"/var/log/pods/openshift-cluster-version_cluster-version-operator-8586c8d446-fpxtq_78ccbda6-13a8-4bd3-9413-a1d4feda1d5f/cluster-version-operator/0.log","hostname":"ip-10-0-145-69.us-east-2.compute.internal","kubernetes":{"annotations":{"openshift.io/scc":"hostaccess"},"container_id":"cri-o://fd73556c94ea5afda8f4078e4600e57db389a1a3c30857bd033be249a0185c7f","container_image":"registry.ci.openshift.org/ocp/release@sha256:6bb01826e3996b4b792c0eed75316cfd55fd45f87fdd08a54d4953311c6ae985","container_name":"cluster-version-operator","labels":{"k8s-app":"cluster-version-operator","pod-template-hash":"8586c8d446"},"namespace_labels":{"kubernetes.io/metadata.name":"openshift-cluster-version","name":"openshift-cluster-version","olm.operatorgroup.uid/37f23c84-2f70-4a52-8536-b980109adab0":"","openshift.io/cluster-monitoring":"true","openshift.io/run-level":"","pod-security.kubernetes.io/audit":"privileged","pod-security.kubernetes.io/enforce":"privileged","pod-security.kubernetes.io/warn":"privileged"},"namespace_name":"openshift-cluster-version","pod_id":"78ccbda6-13a8-4bd3-9413-a1d4feda1d5f","pod_ip":"10.0.145.69","pod_name":"cluster-version-operator-8586c8d446-fpxtq","pod_owner":"ReplicaSet/cluster-version-operator-8586c8d446"},"level":"info","log_type":"infrastructure","message":"I0624 06:01:58.429438 1 sync_worker.go:844] Done syncing for namespace \"openshift-kube-apiserver-operator\" (83 of 771)"}