OpenShift Logging / LOG-2763

[Vector]{Master} Vector's healthcheck fails when forwarding logs to Lokistack.


    • Sprint: Log Collection - Sprint 221, Log Collection - Sprint 222

      Versions of the components:

      $ oc get csv
      NAME                            DISPLAY                            VERSION   REPLACES   PHASE
      clusterlogging.v5.5.0           Red Hat OpenShift Logging          5.5.0                Succeeded
      elasticsearch-operator.v5.5.0   OpenShift Elasticsearch Operator   5.5.0                Succeeded
      loki-operator.v0.0.1            Loki Operator                      0.0.1                Succeeded 
      
      Server Version: 4.10.0-0.nightly-2022-06-08-150219
      Kubernetes Version: v1.23.5+3afdacb

      Description of the problem:

      When forwarding logs to a Lokistack instance, Vector's healthcheck fails even though the logs are successfully delivered to Lokistack.

      $ oc rsh collector-lswpd
      Defaulted container "collector" out of: collector, logfilesmetricexporter
      sh-4.4# vector validate /etc/vector/vector.toml 
      Loaded with warnings ["/etc/vector/vector.toml"]
      ------------------------------------------------
      ~ Transform "route_container_logs._unmatched" has no consumers2022-06-24T05:34:32.329364Z  INFO vector::sources::kubernetes_logs: Obtained Kubernetes Node name to collect logs for (self). self_node_name="ip-10-0-223-91.us-east-2.compute.internal"
      2022-06-24T05:34:32.337961Z  INFO vector::sources::kubernetes_logs: Excluding matching files. exclude_paths=["/var/log/pods/openshift-logging_collector-*/*/*.log", "/var/log/pods/openshift-logging_elasticsearch-*/*/*.log", "/var/log/pods/openshift-logging_kibana-*/*/*.log"]
      √ Component configuration
      2022-06-24T05:34:32.379154Z ERROR vector::topology::builder: msg="Healthcheck: Failed Reason." error=A non-successful status returned: 404 Not Found component_kind="sink" component_type="loki" component_id=loki_infra component_name=loki_infra
      x Health check for "loki_infra" failed
      2022-06-24T05:34:32.379248Z  INFO vector::topology::builder: Healthcheck: Passed.
      √ Health check "prometheus_output"
      sh-4.4# 
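
      The 404 appears to come from the healthcheck probe itself rather than from the log push: Vector's loki sink probes a readiness path relative to the configured endpoint, and the Lokistack gateway does not serve such a path under the tenant prefix /api/logs/v1/infrastructure/, while the push path under the same prefix works. A minimal sketch to check this from inside a collector pod (the /ready probe path and the mounted token path are assumptions, not confirmed in this report):

      TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
      CA=/var/run/ocp-collector/secrets/lokistack-gateway-bearer-token/ca-bundle.crt
      BASE=https://lokistack-instance-gateway-http.openshift-logging.svc:8080/api/logs/v1/infrastructure

      # Healthcheck-style probe: expected to return 404, matching the sink error above.
      curl -s -o /dev/null -w '%{http_code}\n' --cacert "$CA" -H "Authorization: Bearer $TOKEN" "$BASE/ready"

      # Push path under the same prefix: expected to return a non-404 status (e.g. 4xx for an empty body).
      curl -s -o /dev/null -w '%{http_code}\n' --cacert "$CA" -H "Authorization: Bearer $TOKEN" -X POST "$BASE/loki/api/v1/push"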

      Steps to reproduce the issue:

      1 Install the Logging, Loki and Elasticsearch operators built from the master branch and create a Lokistack instance.

      2 Create a secret with the token and CA bundle required for connecting to the Lokistack instance. The token is left empty due to bug https://issues.redhat.com/browse/LOG-2461.

      oc -n openshift-logging create secret generic lokistack-gateway-bearer-token   --from-literal=token=""  --from-literal=ca-bundle.crt="$(oc -n openshift-logging get cm lokistack-instance-ca-bundle -o json | jq -r '.data."service-ca.crt"')" 
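
      To confirm the secret was created with both keys, a quick check (key names follow the create command above):

      oc -n openshift-logging get secret lokistack-gateway-bearer-token -o json | jq '.data | keys'
      # expect: ["ca-bundle.crt", "token"]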

      3 Create the ClusterRole and ClusterRoleBinding required for the logcollector SA.

      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      metadata:
        name: lokistack-instance-tenant-logs
      rules:
      - apiGroups:
        - 'loki.grafana.com'
        resources:
        - application
        - infrastructure
        - audit
        resourceNames:
        - logs
        verbs:
        - 'get'
        - 'create'
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRoleBinding
      metadata:
        name: lokistack-instance-tenant-logs
      roleRef:
        apiGroup: rbac.authorization.k8s.io
        kind: ClusterRole
        name: lokistack-instance-tenant-logs
      subjects:
      - kind: ServiceAccount
        name: logcollector
        namespace: openshift-logging 
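
      To verify the binding took effect before creating the forwarder, a sketch using impersonation (assuming oc auth can-i accepts the TYPE/NAME form for these virtual resources):

      oc auth can-i get infrastructure.loki.grafana.com/logs --as=system:serviceaccount:openshift-logging:logcollector
      oc auth can-i create infrastructure.loki.grafana.com/logs --as=system:serviceaccount:openshift-logging:logcollector
      # both should print "yes"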

      4 Create CLF instance.

      apiVersion: logging.openshift.io/v1
      kind: ClusterLogForwarder
      metadata:
        name: instance
        namespace: openshift-logging
      spec:
        outputs:
         - name: loki-infra
           type: loki
           url: https://lokistack-instance-gateway-http.openshift-logging.svc:8080/api/logs/v1/infrastructure/
           secret:
             name: lokistack-gateway-bearer-token
        pipelines:
         - name: send-infra-logs
           inputRefs:
           - infrastructure
           outputRefs:
           - loki-infra
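
      Once the CLF is reconciled, the operator renders the sink into the collector-config secret; a sketch to inspect the generated TOML (section name taken from the config shown in step 6):

      oc -n openshift-logging extract secret/collector-config --to=- | grep -A 4 '\[sinks.loki_infra\]'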
       

      5 Create ClusterLogging instance.

      apiVersion: "logging.openshift.io/v1"
      kind: "ClusterLogging"
      metadata:
        name: "instance" 
        namespace: "openshift-logging"
      spec:
        managementState: "Managed"  
        collection:
          logs:
            type: "vector"  
            vector: {} 
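
      After the ClusterLogging instance is created, the collector daemonset should roll out; a quick sanity check (label selector taken from the delete command in step 6, see below):

      oc -n openshift-logging get pods -l component=collector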

      6 Add the token to the Vector config.

      Set the ClusterLogging instance to Unmanaged so the operator does not overwrite the edited config:
      oc edit clusterloggings.logging.openshift.io instance
      
      Extract the collector config, fetch the logcollector SA token, and paste it into the loki_infra sink in vector.toml:
      oc extract secret/collector-config --confirm
      oc sa get-token logcollector
      vi vector.toml
      
      [sinks.loki_infra]
      type = "loki"
      inputs = ["send-infra-logs"]
      endpoint = "https://lokistack-instance-gateway-http.openshift-logging.svc:8080/api/logs/v1/infrastructure/"
      
      [sinks.loki_infra.encoding]
      codec = "json"
      
      [sinks.loki_infra.labels]
      kubernetes_container_name = "{{kubernetes.container_name}}"
      kubernetes_host = "${VECTOR_SELF_NODE_NAME}"
      kubernetes_namespace_name = "{{kubernetes.namespace_name}}"
      kubernetes_pod_name = "{{kubernetes.pod_name}}"
      log_type = "{{log_type}}"# TLS Config
      
      [sinks.loki_infra.tls]
      enabled = true
      ca_file = "/var/run/ocp-collector/secrets/lokistack-gateway-bearer-token/ca-bundle.crt"
      
      [sinks.loki_infra.auth]
      strategy = "bearer"
      token = "eyJhbGciOiJSUzI1NiIsImtpZCI6Im1ucE9EQXlkSUgwVU9CQ2tTVUhRcWRwdm9LbzYxTlZqalVBMWFWUEVrbU0ifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbG9nZ2luZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJsb2djb2xsZWN0b3ItdG9rZW4tYnRsbDUiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoibG9nY29sbGVjdG9yIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiNjBjZGZmZTktZWZlZC00MzQxLWIzZTctYzIwMDllYThjZmQwIiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1sb2dnaW5nOmxvZ2NvbGxlY3RvciJ9.ZWvWFZVxNkEFGzKyyNCTMm1-0DDDS6edKnHyfW0zyNSMKgg7NeSGlgcxGnLCPvRmuhWw2u6bzxeAyrwaRF5kRJI0J-OyR6-WLQKU9hbeSXSDowmOwiu1wxwMgFnCsr85bGjmyVoP0niBk4sxsrGLDPL2vEDuLplMppasfKmQlX1CrEElbMCW8GYKRjB1DgedUVAQUTqgaMDmBBDzedtm_BRTjKr8-tKhv_CB4K3oHH9cIhbiNtC1Nrvc3uAKYQc37UKqJjVnKL6R0_0Ms1IwksLD4s4FEeTfZiuWXnqTyoLXUvGOpfJvGFmZgTZXUgB4HcW4Zwj8eyDDtANrsQo7_AajizwVC1ty1HYFW7gkMzaSQXkmtNBUFptT4vwoM2sR2RItH1Y4GWn92qh9NEdXKBSu4ZewyXjU2b0ckR6dS-Y7B5Gzlld-ciQdaX_9_aE0LV9wU3ppG8VHY3fY2mV93h0mEf_qrMkf6fSzB_jvLaDaxHeI8IEjV_Six6XzAwR65Uv3HRfP9ghcLNPQsskmyvXObUDy-LIQ9AHTyPxVolOXoHoTY5PY0yoV3rmSHYCz_PbzWs67A0M98wsm0eahaSFeMD8tiDTMeKoC4uZn5SeFk6PXBb0M_7tYGyMgdEH-zVN7cGJASRZqHL-PtaJC-BgyszckSUOxgD1QVqoMzHA"
      
      oc set data secret/collector-config --from-file vector.toml
      oc delete pods --selector=component=collector
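
      To confirm the edited config is what the restarted collectors mount, the secret can be re-extracted and the auth section checked (a sketch reusing the extract command above):

      oc -n openshift-logging extract secret/collector-config --to=- | grep -A 2 '\[sinks.loki_infra.auth\]'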
      

      7 Check the collector logs and validate the Vector config.

      $ oc logs collector-psz7t
      Defaulted container "collector" out of: collector, logfilesmetricexporter
      2022-06-24T05:32:45.320903Z  INFO vector::app: Log level is enabled. level="info"
      2022-06-24T05:32:45.321123Z  INFO vector::app: Loading configs. paths=["/etc/vector/vector.toml"]
      2022-06-24T05:32:45.333953Z  WARN vector::config::loading: Transform "route_container_logs._unmatched" has no consumers
      2022-06-24T05:32:45.334146Z  INFO vector::sources::kubernetes_logs: Obtained Kubernetes Node name to collect logs for (self). self_node_name="ip-10-0-221-130.us-east-2.compute.internal"
      2022-06-24T05:32:45.342285Z  INFO vector::sources::kubernetes_logs: Excluding matching files. exclude_paths=["/var/log/pods/openshift-logging_collector-*/*/*.log", "/var/log/pods/openshift-logging_elasticsearch-*/*/*.log", "/var/log/pods/openshift-logging_kibana-*/*/*.log"]
      2022-06-24T05:32:45.366877Z  INFO vector::topology::running: Running healthchecks.
      2022-06-24T05:32:45.367042Z  INFO vector: Vector has started. debug="false" version="0.21.0" arch="x86_64" build_id="none"
      2022-06-24T05:32:45.367432Z  INFO vector::topology::builder: Healthcheck: Passed.
      2022-06-24T05:32:45.368453Z  INFO source{component_kind="source" component_id=raw_journal_logs component_type=journald component_name=raw_journal_logs}: vector::sources::journald: Starting journalctl.
      2022-06-24T05:32:45.368892Z  INFO source{component_kind="source" component_id=raw_container_logs component_type=kubernetes_logs component_name=raw_container_logs}:file_server: file_source::checkpointer: Loaded checkpoint data.
      2022-06-24T05:32:50.442544Z ERROR vector::topology::builder: msg="Healthcheck: Failed Reason." error=A non-successful status returned: 404 Not Found component_kind="sink" component_type="loki" component_id=loki_infra component_name=loki_infra
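
      Every collector pod logs the same failure; a sketch to scan all of them (label selector as in step 6):

      for p in $(oc -n openshift-logging get pods -l component=collector -o name); do
          oc -n openshift-logging logs "$p" -c collector | grep -i healthcheck
      done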
      
      
      Running vector validate /etc/vector/vector.toml inside a collector pod reports the same 404 healthcheck failure for the "loki_infra" sink (see the Description above).

      8 Check the infra logs in Lokistack.

      bearer_token=$(oc sa get-token logcollector)
      
      lokistack_route=$(oc get route lokistack-instance -n openshift-logging -o json |jq '.spec.host' -r)
      
      $ logcli -o raw --tls-skip-verify --bearer-token="${bearer_token}" --addr="https://${lokistack_route}/api/logs/v1/infrastructure" query '{log_type="infrastructure"}'
      2022-06-24 11:31:58.547070 I | proto: duplicate proto type registered: purgeplan.DeletePlan
      2022-06-24 11:31:58.547108 I | proto: duplicate proto type registered: purgeplan.ChunksGroup
      2022-06-24 11:31:58.547115 I | proto: duplicate proto type registered: purgeplan.ChunkDetails
      2022-06-24 11:31:58.547120 I | proto: duplicate proto type registered: purgeplan.Interval
      2022-06-24 11:31:58.554724 I | proto: duplicate proto type registered: grpc.PutChunksRequest
      2022-06-24 11:31:58.554739 I | proto: duplicate proto type registered: grpc.GetChunksRequest
      2022-06-24 11:31:58.554744 I | proto: duplicate proto type registered: grpc.GetChunksResponse
      2022-06-24 11:31:58.554747 I | proto: duplicate proto type registered: grpc.Chunk
      2022-06-24 11:31:58.554750 I | proto: duplicate proto type registered: grpc.ChunkID
      2022-06-24 11:31:58.554753 I | proto: duplicate proto type registered: grpc.DeleteTableRequest
      2022-06-24 11:31:58.554756 I | proto: duplicate proto type registered: grpc.DescribeTableRequest
      2022-06-24 11:31:58.554759 I | proto: duplicate proto type registered: grpc.WriteBatch
      2022-06-24 11:31:58.554762 I | proto: duplicate proto type registered: grpc.WriteIndexRequest
      2022-06-24 11:31:58.554765 I | proto: duplicate proto type registered: grpc.DeleteIndexRequest
      2022-06-24 11:31:58.554768 I | proto: duplicate proto type registered: grpc.QueryIndexResponse
      2022-06-24 11:31:58.554772 I | proto: duplicate proto type registered: grpc.Row
      2022-06-24 11:31:58.554775 I | proto: duplicate proto type registered: grpc.IndexEntry
      2022-06-24 11:31:58.554778 I | proto: duplicate proto type registered: grpc.QueryIndexRequest
      2022-06-24 11:31:58.554781 I | proto: duplicate proto type registered: grpc.UpdateTableRequest
      2022-06-24 11:31:58.554784 I | proto: duplicate proto type registered: grpc.DescribeTableResponse
      2022-06-24 11:31:58.554787 I | proto: duplicate proto type registered: grpc.CreateTableRequest
      2022-06-24 11:31:58.554790 I | proto: duplicate proto type registered: grpc.TableDesc
      2022-06-24 11:31:58.554799 I | proto: duplicate proto type registered: grpc.TableDesc.TagsEntry
      2022-06-24 11:31:58.554803 I | proto: duplicate proto type registered: grpc.ListTablesResponse
      2022-06-24 11:31:58.554805 I | proto: duplicate proto type registered: grpc.Labels
      2022-06-24 11:31:58.554891 I | proto: duplicate proto type registered: storage.Entry
      2022-06-24 11:31:58.554897 I | proto: duplicate proto type registered: storage.ReadBatch
      https://lokistack-instance-openshift-logging.apps.ikanse-15.qe.devcluster.openshift.com/api/logs/v1/infrastructure/loki/api/v1/query_range?direction=BACKWARD&end=1656050518555614220&limit=30&query=%7Blog_type%3D%22infrastructure%22%7D&start=1656046918555614220
      Common labels: {log_type="infrastructure"}
      {"@timestamp":"2022-06-24T06:01:58.429478394Z","file":"/var/log/pods/openshift-cluster-version_cluster-version-operator-8586c8d446-fpxtq_78ccbda6-13a8-4bd3-9413-a1d4feda1d5f/cluster-version-operator/0.log","hostname":"ip-10-0-145-69.us-east-2.compute.internal","kubernetes":{"annotations":{"openshift.io/scc":"hostaccess"},"container_id":"cri-o://fd73556c94ea5afda8f4078e4600e57db389a1a3c30857bd033be249a0185c7f","container_image":"registry.ci.openshift.org/ocp/release@sha256:6bb01826e3996b4b792c0eed75316cfd55fd45f87fdd08a54d4953311c6ae985","container_name":"cluster-version-operator","labels":{"k8s-app":"cluster-version-operator","pod-template-hash":"8586c8d446"},"namespace_labels":{"kubernetes.io/metadata.name":"openshift-cluster-version","name":"openshift-cluster-version","olm.operatorgroup.uid/37f23c84-2f70-4a52-8536-b980109adab0":"","openshift.io/cluster-monitoring":"true","openshift.io/run-level":"","pod-security.kubernetes.io/audit":"privileged","pod-security.kubernetes.io/enforce":"privileged","pod-security.kubernetes.io/warn":"privileged"},"namespace_name":"openshift-cluster-version","pod_id":"78ccbda6-13a8-4bd3-9413-a1d4feda1d5f","pod_ip":"10.0.145.69","pod_name":"cluster-version-operator-8586c8d446-fpxtq","pod_owner":"ReplicaSet/cluster-version-operator-8586c8d446"},"level":"info","log_type":"infrastructure","message":"I0624 06:01:58.429468       1 sync_worker.go:832] Running sync for securitycontextconstraints \"anyuid\" (84 of 771)"}
      {"@timestamp":"2022-06-24T06:01:58.429478394Z","file":"/var/log/pods/openshift-cluster-version_cluster-version-operator-8586c8d446-fpxtq_78ccbda6-13a8-4bd3-9413-a1d4feda1d5f/cluster-version-operator/0.log","hostname":"ip-10-0-145-69.us-east-2.compute.internal","kubernetes":{"annotations":{"openshift.io/scc":"hostaccess"},"container_id":"cri-o://fd73556c94ea5afda8f4078e4600e57db389a1a3c30857bd033be249a0185c7f","container_image":"registry.ci.openshift.org/ocp/release@sha256:6bb01826e3996b4b792c0eed75316cfd55fd45f87fdd08a54d4953311c6ae985","container_name":"cluster-version-operator","labels":{"k8s-app":"cluster-version-operator","pod-template-hash":"8586c8d446"},"namespace_labels":{"kubernetes.io/metadata.name":"openshift-cluster-version","name":"openshift-cluster-version","olm.operatorgroup.uid/37f23c84-2f70-4a52-8536-b980109adab0":"","openshift.io/cluster-monitoring":"true","openshift.io/run-level":"","pod-security.kubernetes.io/audit":"privileged","pod-security.kubernetes.io/enforce":"privileged","pod-security.kubernetes.io/warn":"privileged"},"namespace_name":"openshift-cluster-version","pod_id":"78ccbda6-13a8-4bd3-9413-a1d4feda1d5f","pod_ip":"10.0.145.69","pod_name":"cluster-version-operator-8586c8d446-fpxtq","pod_owner":"ReplicaSet/cluster-version-operator-8586c8d446"},"level":"info","log_type":"infrastructure","message":"I0624 06:01:58.429438       1 sync_worker.go:844] Done syncing for namespace \"openshift-kube-apiserver-operator\" (83 of 771)"}
      
