Bug
Resolution: Done
Normal
Logging 5.3.3
False
False
NEW
VERIFIED
Logging (Core) - Sprint 214, Logging (Core) - Sprint 215, Logging (Core) - Sprint 216, Logging (Core) - Sprint 218, Logging (Core) - Sprint 219
After running a cluster through a few releases, the cluster logging collector pods go into CrashLoopBackOff.
In this cluster, one of the collector pod containers shows:
2022-01-24 08:12:50 +0000 [error]: [systemd-input] failed to read data from plugin storage file path="/var/lib/fluentd/pos/journal_pos.json" error_class=Yajl::ParseError error="lexical error: invalid char in json text.\n ClusterRoleBinding \"openshift-s\n (right here) ------^\n"
2022-01-24 08:12:50 +0000 [error]: config error file="/etc/fluent/fluent.conf" error_class=Fluent::ConfigError error="Unexpected error: failed to read data from plugin storage file: '/var/lib/fluentd/pos/journal_pos.json'"
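The parse error suggests that the journald position file itself contains non-JSON text rather than fluentd state. Since /var/lib/fluentd is a hostPath volume, one way to confirm this (a diagnostic sketch, not part of the original report; it reuses the pod name collector-ncrsf and node master-1 seen later in this bug) is to read the file directly from the node that runs the failing collector and try to parse it:

# oc get pod collector-ncrsf -n openshift-logging -o wide
# oc debug node/master-1 -- chroot /host cat /var/lib/fluentd/pos/journal_pos.json
# oc debug node/master-1 -- chroot /host cat /var/lib/fluentd/pos/journal_pos.json | python3 -m json.tool

If json.tool reports the same lexical error, the file on that node is corrupted, and any collector pod scheduled there will hit the same config error at startup.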
OCP Version:
4.10.0-0.nightly-ppc64le-2022-01-05-135338
CSV and Pods Details:
# oc get csv
NAME                             DISPLAY                            VERSION   REPLACES   PHASE
cluster-logging.5.3.3-2          Red Hat OpenShift Logging          5.3.3-2              Succeeded
elasticsearch-operator.5.3.3-2   OpenShift Elasticsearch Operator   5.3.3-2              Succeeded
# oc get pods
NAME                                            READY   STATUS             RESTARTS       AGE
cluster-logging-operator-5f7644c69d-9m6d6       1/1     Running            0              4d
collector-jdgxn                                 2/2     Running            0              4d
collector-ncrsf                                 1/2     CrashLoopBackOff   27 (78s ago)   116m
collector-qm28g                                 2/2     Running            0              4d
collector-tsf4m                                 2/2     Running            0              4d
collector-xv2cf                                 2/2     Running            0              4d
elasticsearch-cdm-jl6h446h-1-64cbfdc7fb-bwv96   2/2     Running            0              4d
elasticsearch-cdm-jl6h446h-2-5fc86d4f95-ph62l   2/2     Running            0              4d
elasticsearch-cdm-jl6h446h-3-cd7f75df4-xlzgc    2/2     Running            0              4d
elasticsearch-im-app-27383535--1-86j2q          0/1     Completed          0              4m25s
elasticsearch-im-audit-27383535--1-pt8c2        0/1     Completed          0              4m25s
elasticsearch-im-infra-27383535--1-n29rr        0/1     Completed          0              4m25s
kibana-7df85cf878-t4vnb                         2/2     Running            0              4d
Failing pod description:
# oc describe pods collector-ncrsf
Name:                 collector-ncrsf
Namespace:            openshift-logging
Priority:             1000000
Priority Class Name:  cluster-logging
Node:                 master-1/9.47.88.202
Start Time:           Mon, 24 Jan 2022 01:23:24 -0500
Labels:               component=collector
                      controller-revision-hash=85cbff8f76
                      logging-infra=collector
                      pod-template-generation=2
                      provider=openshift
Annotations:          k8s.v1.cni.cncf.io/network-status: [{ "name": "openshift-sdn", "interface": "eth0", "ips": [ "10.130.0.58" ], "default": true, "dns": {} }]
                      k8s.v1.cni.cncf.io/networks-status: [{ "name": "openshift-sdn", "interface": "eth0", "ips": [ "10.130.0.58" ], "default": true, "dns": {} }]
                      logging.openshift.io/hash: 3f97714d556aad5b80fee5e2eaa3e16b
                      openshift.io/scc: log-collector-scc
                      scheduler.alpha.kubernetes.io/critical-pod:
Status:               Running
IP:                   10.130.0.58
IPs:
  IP:           10.130.0.58
Controlled By:  DaemonSet/collector
Containers:
  collector:
    Container ID:   cri-o://0425ab3f0275b815246ac6f509a8588c9c50d3603b51f24b313c590641df1e27
    Image:          registry.redhat.io/openshift-logging/fluentd-rhel8@sha256:b44b9b45e36e350a0b208745d3ce77e9619c1f1501d40eaab4590ddd0e43fdb2
    Image ID:       registry.redhat.io/openshift-logging/fluentd-rhel8@sha256:6f9db5919505d19f846f497572480ded5301c3af8dcffdadeacf1d3f724bc20a
    Port:           24231/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Mon, 24 Jan 2022 05:58:14 -0500
      Finished:     Mon, 24 Jan 2022 05:58:18 -0500
    Ready:          False
    Restart Count:  58
    Limits:
      memory:  736Mi
    Requests:
      cpu:     100m
      memory:  736Mi
    Environment:
      NODE_NAME:            (v1:spec.nodeName)
      METRICS_CERT:         /etc/fluent/metrics/tls.crt
      METRICS_KEY:          /etc/fluent/metrics/tls.key
      NODE_IPV4:            (v1:status.hostIP)
      POD_IP:               (v1:status.podIP)
      HTTPS_PROXY:          http://mju-logg-410-bastion-0:3128
      HTTP_PROXY:           http://mju-logg-410-bastion-0:3128
      NO_PROXY:             .cluster.local,.mju-logg-410.ibm.com,.svc,10.0.0.0/16,10.128.0.0/14,127.0.0.1,172.30.0.0/16,9.47.80.0/20,api-int.mju-logg-410.ibm.com,localhost
      COLLECTOR_CONF_HASH:  4f630b5526a99622bf44b6cae6c8a0a7
    Mounts:
      /etc/fluent/configs.d/secure-forward from secureforwardconfig (ro)
      /etc/fluent/configs.d/syslog from syslogconfig (ro)
      /etc/fluent/configs.d/user from config (ro)
      /etc/fluent/keys from certs (ro)
      /etc/fluent/metrics from collector-metrics (ro)
      /etc/localtime from localtime (ro)
      /etc/ocp-forward from secureforwardcerts (ro)
      /etc/ocp-syslog from syslogcerts (ro)
      /etc/pki/ca-trust/extracted/pem/ from collector-trusted-ca-bundle (ro)
      /opt/app-root/src/run.sh from entrypoint (ro,path="run.sh")
      /tmp from tmp (rw)
      /var/lib/fluentd from filebufferstorage (rw)
      /var/log/audit from varlogaudit (ro)
      /var/log/containers from varlogcontainers (ro)
      /var/log/journal from varlogjournal (ro)
      /var/log/kube-apiserver from varlogkubeapiserver (ro)
      /var/log/oauth-apiserver from varlogoauthapiserver (ro)
      /var/log/openshift-apiserver from varlogopenshiftapiserver (ro)
      /var/log/ovn from varlogovn (ro)
      /var/log/pods from varlogpods (ro)
      /var/run/ocp-collector/secrets/collector from collector (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mwnbz (ro)
  logfilesmetricexporter:
    Container ID:  cri-o://63c336d8f4f3d85b018ec17ba80874b86159d7063c60d911b39a00f7d1504211
    Image:         registry.redhat.io/openshift-logging/log-file-metric-exporter-rhel8@sha256:ee5badc62b5a1066cd2520cb219c74a18da47bd440bef4ebac4939433c7345ff
    Image ID:      registry.redhat.io/openshift-logging/log-file-metric-exporter-rhel8@sha256:98e13663bdd8044b5cb493779e068b21c5cb26f519f12bdd3627cc11141df0a1
    Port:          2112/TCP
    Host Port:     0/TCP
    Command:
      /usr/local/bin/log-file-metric-exporter -verbosity=2 -dir=/var/log/containers -http=:2112 -keyFile=/etc/fluent/metrics/tls.key -crtFile=/etc/fluent/metrics/tls.crt
    State:          Running
      Started:      Mon, 24 Jan 2022 01:23:27 -0500
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /etc/fluent/metrics from collector-metrics (rw)
      /var/log from varlog (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mwnbz (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  varlog:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log
    HostPathType:
  varlogcontainers:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/containers
    HostPathType:
  varlogpods:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/pods
    HostPathType:
  varlogjournal:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/journal
    HostPathType:
  varlogaudit:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/audit
    HostPathType:
  varlogovn:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/ovn
    HostPathType:
  varlogoauthapiserver:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/oauth-apiserver
    HostPathType:
  varlogopenshiftapiserver:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/openshift-apiserver
    HostPathType:
  varlogkubeapiserver:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/kube-apiserver
    HostPathType:
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      collector
    Optional:  false
  secureforwardconfig:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      secure-forward
    Optional:  true
  secureforwardcerts:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  secure-forward
    Optional:    true
  syslogconfig:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      syslog
    Optional:  true
  syslogcerts:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  syslog
    Optional:    true
  entrypoint:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      collector
    Optional:  false
  certs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  collector
    Optional:    true
  localtime:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/localtime
    HostPathType:
  filebufferstorage:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/fluentd
    HostPathType:
  collector-metrics:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  collector-metrics
    Optional:    false
  tmp:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  collector:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  collector
    Optional:    false
  collector-trusted-ca-bundle:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      collector-trusted-ca-bundle
    Optional:  false
  kube-api-access-mwnbz:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:       Burstable
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     node-role.kubernetes.io/master:NoSchedule op=Exists
                 node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists
                 node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                 node.kubernetes.io/unreachable:NoExecute op=Exists
                 node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason   Age                       From     Message
  ----     ------   ----                      ----     -------
  Normal   Pulled   147m (x31 over 4h37m)     kubelet  Container image "registry.redhat.io/openshift-logging/fluentd-rhel8@sha256:b44b9b45e36e350a0b208745d3ce77e9619c1f1501d40eaab4590ddd0e43fdb2" already present on machine
  Warning  BackOff  2m47s (x1244 over 4h37m)  kubelet  Back-off restarting failed container
Failing container logs:
# oc logs collector-ncrsf -c collector
Setting each total_size_limit for 3 buffers to 6421478400 bytes
Setting queued_chunks_limit_size for each buffer to 765
Setting chunk_limit_size for each buffer to 8388608
2022-01-24 10:58:16 +0000 [warn]: '@' is the system reserved prefix. It works in the nested configuration for now but it will be rejected: @timestamp
2022-01-24 10:58:18 +0000 [error]: [systemd-input] failed to read data from plugin storage file path="/var/lib/fluentd/pos/journal_pos.json" error_class=Yajl::ParseError error="lexical error: invalid char in json text.\n ClusterRoleBinding \"openshift-s\n (right here) ------^\n"
2022-01-24 10:58:18 +0000 [error]: config error file="/etc/fluent/fluent.conf" error_class=Fluent::ConfigError error="Unexpected error: failed to read data from plugin storage file: '/var/lib/fluentd/pos/journal_pos.json'"
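A possible workaround sketch (not verified here, and it does not explain how the file got corrupted): move the bad position file aside on the affected node and recreate the pod, so fluentd writes a fresh journal_pos.json on startup. Assumes it is acceptable to reset the journald read position; fluentd may then re-read and re-send older journal entries.

# oc debug node/master-1 -- chroot /host mv /var/lib/fluentd/pos/journal_pos.json /var/lib/fluentd/pos/journal_pos.json.corrupt
# oc delete pod collector-ncrsf -n openshift-logging

The collector DaemonSet recreates the pod automatically after the delete.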
Must-gather logs:
must-gather