OpenShift Logging / LOG-2159

Cluster Logging Pods in CrashLoopBackOff


    • False
    • False
    • NEW
    • VERIFIED
    • Logging (Core) - Sprint 214, Logging (Core) - Sprint 215, Logging (Core) - Sprint 216, Logging (Core) - Sprint 218, Logging (Core) - Sprint 219

      After a cluster has been in use for a few releases, the cluster logging pods go into CrashLoopBackOff.
      In this cluster, the collector container of one collector pod shows:

      2022-01-24 08:12:50 +0000 [error]: [systemd-input] failed to read data from plugin storage file path="/var/lib/fluentd/pos/journal_pos.json" error_class=Yajl::ParseError error="lexical error: invalid char in json text.\n                                       ClusterRoleBinding \"openshift-s\n                     (right here) ------^\n"
      2022-01-24 08:12:50 +0000 [error]: config error file="/etc/fluent/fluent.conf" error_class=Fluent::ConfigError error="Unexpected error: failed to read data from plugin storage file: '/var/lib/fluentd/pos/journal_pos.json'" 
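Both errors point at the same root cause: the collector's systemd input could not parse its plugin storage file, /var/lib/fluentd/pos/journal_pos.json, as JSON, because the parser hit the text `ClusterRoleBinding "openshift-s` where JSON was expected. The following is a minimal sketch for checking whether a copy of that file is valid JSON. It assumes `python3` is on the PATH wherever the check runs; the helper name `check_pos_file` is hypothetical and not part of OpenShift Logging.

```shell
# Sketch: report whether a fluentd storage/position file parses as JSON.
# check_pos_file is a hypothetical helper; python3 is assumed to be available.
# Exit status: 0 = valid JSON, 1 = invalid JSON, 2 = file not readable.
check_pos_file() {
  file="$1"
  if [ ! -r "$file" ]; then
    echo "unreadable: $file"
    return 2
  fi
  # json.load raises (python3 exits non-zero) on text such as:
  #   ClusterRoleBinding "openshift-s...
  if python3 -c 'import json, sys; json.load(open(sys.argv[1]))' "$file" 2>/dev/null; then
    echo "valid: $file"
    return 0
  else
    echo "invalid: $file"
    return 1
  fi
}
```

The file lives on the node under the HostPath volume `filebufferstorage` (/var/lib/fluentd), so it survives container restarts on master-1. You can inspect it with, for example, `oc debug node/master-1 -- chroot /host cat /var/lib/fluentd/pos/journal_pos.json`, copy the contents to a local file, and run `check_pos_file ./journal_pos.json`.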

      OCP Version:
      4.10.0-0.nightly-ppc64le-2022-01-05-135338

      CSV and pod details:

      # oc get csv
      NAME                             DISPLAY                            VERSION   REPLACES   PHASE
      cluster-logging.5.3.3-2          Red Hat OpenShift Logging          5.3.3-2              Succeeded
      elasticsearch-operator.5.3.3-2   OpenShift Elasticsearch Operator   5.3.3-2              Succeeded
      # oc get pods
      NAME                                            READY   STATUS             RESTARTS       AGE
      cluster-logging-operator-5f7644c69d-9m6d6       1/1     Running            0              4d
      collector-jdgxn                                 2/2     Running            0              4d
      collector-ncrsf                                 1/2     CrashLoopBackOff   27 (78s ago)   116m
      collector-qm28g                                 2/2     Running            0              4d
      collector-tsf4m                                 2/2     Running            0              4d
      collector-xv2cf                                 2/2     Running            0              4d
      elasticsearch-cdm-jl6h446h-1-64cbfdc7fb-bwv96   2/2     Running            0              4d
      elasticsearch-cdm-jl6h446h-2-5fc86d4f95-ph62l   2/2     Running            0              4d
      elasticsearch-cdm-jl6h446h-3-cd7f75df4-xlzgc    2/2     Running            0              4d
      elasticsearch-im-app-27383535--1-86j2q          0/1     Completed          0              4m25s
      elasticsearch-im-audit-27383535--1-pt8c2        0/1     Completed          0              4m25s
      elasticsearch-im-infra-27383535--1-n29rr        0/1     Completed          0              4m25s
      kibana-7df85cf878-t4vnb                         2/2     Running            0              4d  
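In a listing like the one above, the failing collector can be found by filtering on the STATUS column. The following is a minimal sketch. It assumes the default `oc get pods` column layout, where STATUS is the third whitespace-separated field (the RESTARTS value `27 (78s ago)` spans several fields but comes after STATUS). The helper name `crashlooping_pods` is hypothetical.

```shell
# Sketch: print the names of pods whose STATUS column reads CrashLoopBackOff.
# Reads `oc get pods` table output on stdin; assumes STATUS is field 3.
# The header row has STATUS in field 3, so it never matches and needs no skipping.
crashlooping_pods() {
  awk '$3 == "CrashLoopBackOff" { print $1 }'
}
```

Usage: `oc get pods -n openshift-logging -l component=collector | crashlooping_pods`. The `component=collector` label comes from the pod description. For the listing above, this prints `collector-ncrsf`.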

      Failing pod description:

      # oc describe pods collector-ncrsf
      Name:                 collector-ncrsf
      Namespace:            openshift-logging
      Priority:             1000000
      Priority Class Name:  cluster-logging
      Node:                 master-1/9.47.88.202
      Start Time:           Mon, 24 Jan 2022 01:23:24 -0500
      Labels:               component=collector
                            controller-revision-hash=85cbff8f76
                            logging-infra=collector
                            pod-template-generation=2
                            provider=openshift
      Annotations:          k8s.v1.cni.cncf.io/network-status:
                              [{
                                  "name": "openshift-sdn",
                                  "interface": "eth0",
                                  "ips": [
                                      "10.130.0.58"
                                  ],
                                  "default": true,
                                  "dns": {}
                              }]
                            k8s.v1.cni.cncf.io/networks-status:
                              [{
                                  "name": "openshift-sdn",
                                  "interface": "eth0",
                                  "ips": [
                                      "10.130.0.58"
                                  ],
                                  "default": true,
                                  "dns": {}
                              }]
                            logging.openshift.io/hash: 3f97714d556aad5b80fee5e2eaa3e16b
                            openshift.io/scc: log-collector-scc
                            scheduler.alpha.kubernetes.io/critical-pod:
      Status:               Running
      IP:                   10.130.0.58
      IPs:
        IP:           10.130.0.58
      Controlled By:  DaemonSet/collector
      Containers:
        collector:
          Container ID:   cri-o://0425ab3f0275b815246ac6f509a8588c9c50d3603b51f24b313c590641df1e27
          Image:          registry.redhat.io/openshift-logging/fluentd-rhel8@sha256:b44b9b45e36e350a0b208745d3ce77e9619c1f1501d40eaab4590ddd0e43fdb2
          Image ID:       registry.redhat.io/openshift-logging/fluentd-rhel8@sha256:6f9db5919505d19f846f497572480ded5301c3af8dcffdadeacf1d3f724bc20a
          Port:           24231/TCP
          Host Port:      0/TCP
          State:          Waiting
            Reason:       CrashLoopBackOff
          Last State:     Terminated
            Reason:       Error
            Exit Code:    2
            Started:      Mon, 24 Jan 2022 05:58:14 -0500
            Finished:     Mon, 24 Jan 2022 05:58:18 -0500
          Ready:          False
          Restart Count:  58
          Limits:
            memory:  736Mi
          Requests:
            cpu:     100m
            memory:  736Mi
          Environment:
            NODE_NAME:             (v1:spec.nodeName)
            METRICS_CERT:         /etc/fluent/metrics/tls.crt
            METRICS_KEY:          /etc/fluent/metrics/tls.key
            NODE_IPV4:             (v1:status.hostIP)
            POD_IP:                (v1:status.podIP)
            HTTPS_PROXY:          http://mju-logg-410-bastion-0:3128
            HTTP_PROXY:           http://mju-logg-410-bastion-0:3128
            NO_PROXY:             .cluster.local,.mju-logg-410.ibm.com,.svc,10.0.0.0/16,10.128.0.0/14,127.0.0.1,172.30.0.0/16,9.47.80.0/20,api-int.mju-logg-410.ibm.com,localhost
            COLLECTOR_CONF_HASH:  4f630b5526a99622bf44b6cae6c8a0a7
          Mounts:
            /etc/fluent/configs.d/secure-forward from secureforwardconfig (ro)
            /etc/fluent/configs.d/syslog from syslogconfig (ro)
            /etc/fluent/configs.d/user from config (ro)
            /etc/fluent/keys from certs (ro)
            /etc/fluent/metrics from collector-metrics (ro)
            /etc/localtime from localtime (ro)
            /etc/ocp-forward from secureforwardcerts (ro)
            /etc/ocp-syslog from syslogcerts (ro)
            /etc/pki/ca-trust/extracted/pem/ from collector-trusted-ca-bundle (ro)
            /opt/app-root/src/run.sh from entrypoint (ro,path="run.sh")
            /tmp from tmp (rw)
            /var/lib/fluentd from filebufferstorage (rw)
            /var/log/audit from varlogaudit (ro)
            /var/log/containers from varlogcontainers (ro)
            /var/log/journal from varlogjournal (ro)
            /var/log/kube-apiserver from varlogkubeapiserver (ro)
            /var/log/oauth-apiserver from varlogoauthapiserver (ro)
            /var/log/openshift-apiserver from varlogopenshiftapiserver (ro)
            /var/log/ovn from varlogovn (ro)
            /var/log/pods from varlogpods (ro)
            /var/run/ocp-collector/secrets/collector from collector (ro)
            /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mwnbz (ro)
        logfilesmetricexporter:
          Container ID:  cri-o://63c336d8f4f3d85b018ec17ba80874b86159d7063c60d911b39a00f7d1504211
          Image:         registry.redhat.io/openshift-logging/log-file-metric-exporter-rhel8@sha256:ee5badc62b5a1066cd2520cb219c74a18da47bd440bef4ebac4939433c7345ff
          Image ID:      registry.redhat.io/openshift-logging/log-file-metric-exporter-rhel8@sha256:98e13663bdd8044b5cb493779e068b21c5cb26f519f12bdd3627cc11141df0a1
          Port:          2112/TCP
          Host Port:     0/TCP
          Command:
            /usr/local/bin/log-file-metric-exporter
              -verbosity=2
             -dir=/var/log/containers
             -http=:2112
             -keyFile=/etc/fluent/metrics/tls.key
             -crtFile=/etc/fluent/metrics/tls.crt
          State:          Running
            Started:      Mon, 24 Jan 2022 01:23:27 -0500
          Ready:          True
          Restart Count:  0
          Environment:    <none>
          Mounts:
            /etc/fluent/metrics from collector-metrics (rw)
            /var/log from varlog (rw)
            /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mwnbz (ro)
      Conditions:
        Type              Status
        Initialized       True
        Ready             False
        ContainersReady   False
        PodScheduled      True
      Volumes:
        varlog:
          Type:          HostPath (bare host directory volume)
          Path:          /var/log
          HostPathType:
        varlogcontainers:
          Type:          HostPath (bare host directory volume)
          Path:          /var/log/containers
          HostPathType:
        varlogpods:
          Type:          HostPath (bare host directory volume)
          Path:          /var/log/pods
          HostPathType:
        varlogjournal:
          Type:          HostPath (bare host directory volume)
          Path:          /var/log/journal
          HostPathType:
        varlogaudit:
          Type:          HostPath (bare host directory volume)
          Path:          /var/log/audit
          HostPathType:
        varlogovn:
          Type:          HostPath (bare host directory volume)
          Path:          /var/log/ovn
          HostPathType:
        varlogoauthapiserver:
          Type:          HostPath (bare host directory volume)
          Path:          /var/log/oauth-apiserver
          HostPathType:
        varlogopenshiftapiserver:
          Type:          HostPath (bare host directory volume)
          Path:          /var/log/openshift-apiserver
          HostPathType:
        varlogkubeapiserver:
          Type:          HostPath (bare host directory volume)
          Path:          /var/log/kube-apiserver
          HostPathType:
        config:
          Type:      ConfigMap (a volume populated by a ConfigMap)
          Name:      collector
          Optional:  false
        secureforwardconfig:
          Type:      ConfigMap (a volume populated by a ConfigMap)
          Name:      secure-forward
          Optional:  true
        secureforwardcerts:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  secure-forward
          Optional:    true
        syslogconfig:
          Type:      ConfigMap (a volume populated by a ConfigMap)
          Name:      syslog
          Optional:  true
        syslogcerts:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  syslog
          Optional:    true
        entrypoint:
          Type:      ConfigMap (a volume populated by a ConfigMap)
          Name:      collector
          Optional:  false
        certs:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  collector
          Optional:    true
        localtime:
          Type:          HostPath (bare host directory volume)
          Path:          /etc/localtime
          HostPathType:
        filebufferstorage:
          Type:          HostPath (bare host directory volume)
          Path:          /var/lib/fluentd
          HostPathType:
        collector-metrics:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  collector-metrics
          Optional:    false
        tmp:
          Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
          Medium:     Memory
          SizeLimit:  <unset>
        collector:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  collector
          Optional:    false
        collector-trusted-ca-bundle:
          Type:      ConfigMap (a volume populated by a ConfigMap)
          Name:      collector-trusted-ca-bundle
          Optional:  false
        kube-api-access-mwnbz:
          Type:                    Projected (a volume that contains injected data from multiple sources)
          TokenExpirationSeconds:  3607
          ConfigMapName:           kube-root-ca.crt
          ConfigMapOptional:       <nil>
          DownwardAPI:             true
          ConfigMapName:           openshift-service-ca.crt
          ConfigMapOptional:       <nil>
      QoS Class:                   Burstable
      Node-Selectors:              kubernetes.io/os=linux
      Tolerations:                 node-role.kubernetes.io/master:NoSchedule op=Exists
                                   node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                                   node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                                   node.kubernetes.io/not-ready:NoExecute op=Exists
                                   node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                                   node.kubernetes.io/unreachable:NoExecute op=Exists
                                   node.kubernetes.io/unschedulable:NoSchedule op=Exists
      Events:
        Type     Reason   Age                       From     Message
        ----     ------   ----                      ----     -------
        Normal   Pulled   147m (x31 over 4h37m)     kubelet  Container image "registry.redhat.io/openshift-logging/fluentd-rhel8@sha256:b44b9b45e36e350a0b208745d3ce77e9619c1f1501d40eaab4590ddd0e43fdb2" already present on machine
        Warning  BackOff  2m47s (x1244 over 4h37m)  kubelet  Back-off restarting failed container
       

      Failing container logs:

      # oc logs collector-ncrsf -c collector
      Setting each total_size_limit for 3 buffers to 6421478400 bytes
      Setting queued_chunks_limit_size for each buffer to 765
      Setting chunk_limit_size for each buffer to 8388608
      2022-01-24 10:58:16 +0000 [warn]: '@' is the system reserved prefix. It works in the nested configuration for now but it will be rejected: @timestamp
      2022-01-24 10:58:18 +0000 [error]: [systemd-input] failed to read data from plugin storage file path="/var/lib/fluentd/pos/journal_pos.json" error_class=Yajl::ParseError error="lexical error: invalid char in json text.\n                                       ClusterRoleBinding \"openshift-s\n                     (right here) ------^\n"
      2022-01-24 10:58:18 +0000 [error]: config error file="/etc/fluent/fluent.conf" error_class=Fluent::ConfigError error="Unexpected error: failed to read data from plugin storage file: '/var/lib/fluentd/pos/journal_pos.json'"
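The collector exits about four seconds after starting (Started 05:58:14, Finished 05:58:18), so the relevant lines are mixed in with the startup messages. The following is a minimal sketch that extracts the distinct `path="..."` values from `[error]` lines in collector output read on stdin. The helper name `failed_storage_paths` is hypothetical, and it relies on the `path="..."` formatting seen in the log above.

```shell
# Sketch: list the distinct storage-file paths named in fluentd [error] lines.
# Reads collector log text on stdin; relies on the path="..." field format.
failed_storage_paths() {
  grep '\[error\]' |
    grep -o 'path="[^"]*"' |
    sed 's/^path="//; s/"$//' |
    sort -u
}
```

Usage: `oc logs collector-ncrsf -c collector | failed_storage_paths`. If the container has already restarted, `oc logs -p` (`--previous`) returns the output of the previous instance. For the log above, this prints `/var/lib/fluentd/pos/journal_pos.json`.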
       

      Must-gather logs:
      must-gather

            pmoogi pratibha moogi (Inactive)
            satwsing Satwinder Singh (Inactive)
            Ishwar Kanse Ishwar Kanse
            Votes: 0
            Watchers: 7
