Bug
Resolution: Done
Blocker
Logging 5.4.0
NEW
OBSDA-108 - Distribute an alternate Vector Log Collector
VERIFIED
Logging (Core) - Sprint 216
Description of problem:
When a ClusterLogging instance is created with Vector as the collector, the collector pods fail to start with a configuration error.
$ oc logs collector-rgqvr -c collector
Mar 16 04:09:16.315  INFO vector::app: Log level is enabled. level="info"
Mar 16 04:09:16.315  INFO vector::app: Loading configs. path=[("/etc/vector/vector.toml", Some(Toml))]
Mar 16 04:09:16.317 ERROR vector::cli: Configuration error. error=unknown variant `internal_metrics`, expected one of `file`, `journald`, `kubernetes_logs`, `prometheus`, `prometheus_remote_write`, `prometheus_scrape` for key `sources.internal_metrics` at line 175 column 1
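For context, the stanza at the line cited in the error can be read straight from the rendered config inside a collector container (a quick check, assuming sed is available in the image and the container stays up long enough to exec into; the pod name is taken from the log above):

$ oc exec collector-rgqvr -c collector -- sed -n '170,180p' /etc/vector/vector.toml
# Expected to show the [sources.internal_metrics] stanza from the generated config in step 4 below.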
Version-Release number of selected component (if applicable):
Cluster-logging.5.4.0-93
OCP Server Version: 4.10.0-0.nightly-2022-03-15-182807
How reproducible:
Always
Steps to reproduce the issue:
1. Install the Cluster Logging and Elasticsearch 5.4 operators.
2. Create a ClusterLogging instance with the Vector preview annotation enabled:
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
  namespace: "openshift-logging"
  annotations:
    logging.openshift.io/preview-vector-collector: "enabled"
spec:
  managementState: "Managed"
  logStore:
    type: "elasticsearch"
    retentionPolicy:
      application:
        maxAge: 10h
      infra:
        maxAge: 10h
      audit:
        maxAge: 10h
    elasticsearch:
      nodeCount: 1
      storage: {}
      resources:
        limits:
          memory: "4Gi"
        requests:
          memory: "1Gi"
      proxy:
        resources:
          limits:
            memory: 256Mi
          requests:
            memory: 256Mi
      redundancyPolicy: "ZeroRedundancy"
  visualization:
    type: "kibana"
    kibana:
      replicas: 1
  collection:
    logs:
      type: "vector"
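For completeness, the manifest above can be saved and applied as usual (the file name clusterlogging.yaml is only an example):

$ oc apply -f clusterlogging.yaml
$ oc -n openshift-logging get clusterlogging instance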
3. Check the status and logs of the collector pods:
oc get pods
NAME                                            READY   STATUS             RESTARTS      AGE
cluster-logging-operator-6884bc7f49-nv92h       1/1     Running            0             57m
collector-ggpwl                                 1/2     CrashLoopBackOff   1 (9s ago)    13s
collector-gmwhz                                 1/2     CrashLoopBackOff   1 (9s ago)    14s
collector-pxjxm                                 1/2     CrashLoopBackOff   1 (10s ago)   13s
collector-q4c2t                                 1/2     CrashLoopBackOff   1 (10s ago)   14s
collector-r7nfh                                 1/2     CrashLoopBackOff   1 (7s ago)    12s
collector-zvktw                                 1/2     CrashLoopBackOff   1 (11s ago)   14s
elasticsearch-cdm-9kkyx6ta-1-678f5686f4-wfmmx   1/2     Running            0             17s
kibana-6d757f6c4-dcnxc                          2/2     Running            0             13s

oc logs collector-pxjxm -c collector
Mar 16 04:59:46.453  INFO vector::app: Log level is enabled. level="info"
Mar 16 04:59:46.454  INFO vector::app: Loading configs. path=[("/etc/vector/vector.toml", Some(Toml))]
Mar 16 04:59:46.456 ERROR vector::cli: Configuration error. error=unknown variant `internal_metrics`, expected one of `file`, `journald`, `kubernetes_logs`, `prometheus`, `prometheus_remote_write`, `prometheus_scrape` for key `sources.internal_metrics` at line 175 column 1
4. Below is the generated Vector config:
# Logs from containers (including openshift containers)
[sources.raw_container_logs]
type = "kubernetes_logs"
auto_partial_merge = true
exclude_paths_glob_patterns = ["/var/log/pods/openshift-logging_collector-*/*/*.log", "/var/log/pods/openshift-logging_elasticsearch-*/*/*.log", "/var/log/pods/openshift-logging_kibana-*/*/*.log"]

[sources.raw_journal_logs]
type = "journald"

[sources.internal_metrics]
type = "internal_metrics"

[transforms.container_logs]
type = "remap"
inputs = ["raw_container_logs"]
source = '''
  level = "unknown"
  if match(.message,r'(Warning|WARN|W[0-9]+|level=warn|Value:warn|"level":"warn")'){
    level = "warn"
  } else if match(.message, r'Info|INFO|I[0-9]+|level=info|Value:info|"level":"info"'){
    level = "info"
  } else if match(.message, r'Error|ERROR|E[0-9]+|level=error|Value:error|"level":"error"'){
    level = "error"
  } else if match(.message, r'Debug|DEBUG|D[0-9]+|level=debug|Value:debug|"level":"debug"'){
    level = "debug"
  }
  .level = level
  .pipeline_metadata.collector.name = "vector"
  .pipeline_metadata.collector.version = "0.14.1"
  ip4, err = get_env_var("NODE_IPV4")
  .pipeline_metadata.collector.ipaddr4 = ip4
  received, err = format_timestamp(now(),"%+")
  .pipeline_metadata.collector.received_at = received
  .pipeline_metadata.collector.error = err
'''

[transforms.journal_logs]
type = "remap"
inputs = ["raw_journal_logs"]
source = '''
  level = "unknown"
  if match(.message,r'(Warning|WARN|W[0-9]+|level=warn|Value:warn|"level":"warn")'){
    level = "warn"
  } else if match(.message, r'Info|INFO|I[0-9]+|level=info|Value:info|"level":"info"'){
    level = "info"
  } else if match(.message, r'Error|ERROR|E[0-9]+|level=error|Value:error|"level":"error"'){
    level = "error"
  } else if match(.message, r'Debug|DEBUG|D[0-9]+|level=debug|Value:debug|"level":"debug"'){
    level = "debug"
  }
  .level = level
  .pipeline_metadata.collector.name = "vector"
  .pipeline_metadata.collector.version = "0.14.1"
  ip4, err = get_env_var("NODE_IPV4")
  .pipeline_metadata.collector.ipaddr4 = ip4
  received, err = format_timestamp(now(),"%+")
  .pipeline_metadata.collector.received_at = received
  .pipeline_metadata.collector.error = err
'''

[transforms.route_container_logs]
type = "route"
inputs = ["container_logs"]
route.app = '!((starts_with!(.kubernetes.pod_namespace,"kube")) || (starts_with!(.kubernetes.pod_namespace,"openshift")) || (.kubernetes.pod_namespace == "default"))'
route.infra = '(starts_with!(.kubernetes.pod_namespace,"kube")) || (starts_with!(.kubernetes.pod_namespace,"openshift")) || (.kubernetes.pod_namespace == "default")'

# Rename log stream to "application"
[transforms.application]
type = "remap"
inputs = ["route_container_logs.app"]
source = """
  .log_type = "application"
"""

# Rename log stream to "infrastructure"
[transforms.infrastructure]
type = "remap"
inputs = ["route_container_logs.infra","journal_logs"]
source = """
  .log_type = "infrastructure"
"""

[transforms.pipeline_0_]
type = "remap"
inputs = ["application","infrastructure"]
source = """
  .
"""

# Adding _id field
[transforms.default_add_es_id]
type = "remap"
inputs = ["pipeline_0_"]
source = """
  index = "default"
  if (.log_type == "application"){
    index = "app"
  }
  if (.log_type == "infrastructure"){
    index = "infra"
  }
  if (.log_type == "audit"){
    index = "audit"
  }
  ."write-index"=index+"-write"
  ._id = encode_base64(uuid_v4())
"""

[transforms.default_dedot_and_flatten]
type = "lua"
inputs = ["default_add_es_id"]
version = "2"
hooks.process = "process"
source = """
  function process(event, emit)
    if event.log.kubernetes == nil then
      return
    end
    dedot(event.log.kubernetes.pod_labels)
    -- create "flat_labels" key
    event.log.kubernetes.flat_labels = {}
    i = 1
    -- flatten the labels
    for k,v in pairs(event.log.kubernetes.pod_labels) do
      event.log.kubernetes.flat_labels[i] = k.."="..v
      i=i+1
    end
    -- delete the "pod_labels" key
    event.log.kubernetes["pod_labels"] = nil
    emit(event)
  end

  function dedot(map)
    if map == nil then
      return
    end
    local new_map = {}
    local changed_keys = {}
    for k, v in pairs(map) do
      local dedotted = string.gsub(k, "%.", "_")
      if dedotted ~= k then
        new_map[dedotted] = v
        changed_keys[k] = true
      end
    end
    for k in pairs(changed_keys) do
      map[k] = nil
    end
    for k, v in pairs(new_map) do
      map[k] = v
    end
  end
"""

[sinks.default]
type = "elasticsearch"
inputs = ["default_dedot_and_flatten"]
endpoint = "https://elasticsearch.openshift-logging.svc:9200"
index = "{{ write-index }}"
request.timeout_secs = 2147483648
bulk_action = "create"
id_key = "_id"

# TLS Config
[sinks.default.tls]
key_file = "/var/run/ocp-collector/secrets/collector/tls.key"
crt_file = "/var/run/ocp-collector/secrets/collector/tls.crt"
ca_file = "/var/run/ocp-collector/secrets/collector/ca-bundle.crt"

[sinks.prometheus_output]
type = "prometheus_exporter"
inputs = ["internal_metrics"]
address = "0.0.0.0:24231"
default_namespace = "collector"
5. Collector image used:
collector:
Container ID: cri-o://180bb56fb4972922996748a833fec3e456098678062738e9c9e02862ca85612e
Image: registry.redhat.io/openshift-logging/vector-rhel8@sha256:dda974f1ac9dd666191a2c4724180f8a672ebf2947a37b493b7afe5d4e05768b
Image ID: registry.redhat.io/openshift-logging/vector-rhel8@sha256:c7368de78d829815e3fc24a35f809c928b41d755035e35fff867b29d9087a32b
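The image metadata can be inspected to double-check exactly what was shipped (assumes pull access to registry.redhat.io; --filter-by-os may be needed if the digest resolves to a manifest list):

$ oc image info --filter-by-os=linux/amd64 registry.redhat.io/openshift-logging/vector-rhel8@sha256:dda974f1ac9dd666191a2c4724180f8a672ebf2947a37b493b7afe5d4e05768b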
6. Vector version:
sh-4.4# vector --version
vector 0.14.1 (x86_64-unknown-linux-gnu)
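The component types compiled into the binary can also be listed from inside the container, which should confirm that internal_metrics is simply not among the available sources in this build (assuming the downstream build keeps the upstream `vector list` subcommand):

sh-4.4# vector list
# The sources section is expected to contain only file, journald, kubernetes_logs,
# prometheus, prometheus_remote_write and prometheus_scrape, matching the variants
# listed in the configuration error.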