  1. OpenShift Logging
  2. LOG-4070

Collector pods in CrashLoopBackOff state after CLO upgraded to v5.7

    • Log Collection - Sprint 236, Log Collection - Sprint 237

      Description of problem:

       

      On DevSandbox clusters, the Cluster Logging Operator was upgraded to v5.7.0 and our collector pod went into the CrashLoopBackOff state with the following error:

      error[E701]: call to undefined variable
        ┌─ :1:90
        │
      1 │ (.kubernetes.namespace_name == "toolchain-host-operator") && (.kubernetes.labels.control-plane == "controller-manager")
        │                                                                                          ^^^^^
        │                                                                                          │
        │                                                                                          undefined variable
        │                                                                                          did you mean "false"?
        │
        = see language documentation at https://vrl.dev

      error[E100]: unhandled error
        ┌─ :1:1
        │
      1 │ (.kubernetes.namespace_name == "toolchain-host-operator") && (.kubernetes.labels.control-plane == "controller-manager")
        │ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        │ │
        │ expression can result in runtime error
        │ handle the error case to ensure runtime success
        │
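
Column 90 in the first error points at `plane`, which suggests that VRL reads the unquoted segment `control-plane` as the subtraction `control - plane` and then fails on the unknown variable `plane`. The second error (E100) fits that reading, since a subtraction can fail at runtime. VRL accepts quoted path segments, so a config generator could quote any label key that is not a plain identifier. The following is a minimal sketch of that idea, not the operator's actual code: the helper names `vrl_path` and `label_match` are hypothetical, and the identifier pattern is a deliberately conservative assumption.

```python
import re

# Assumption: segments matching this pattern are safe to emit unquoted in a VRL path.
_PLAIN_SEGMENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def vrl_path(*segments: str) -> str:
    """Build a VRL field path, quoting any segment that is not a plain identifier."""
    parts = []
    for segment in segments:
        if _PLAIN_SEGMENT.match(segment):
            parts.append(segment)
        else:
            # Escape backslashes and double quotes, then wrap the segment in quotes.
            escaped = segment.replace("\\", "\\\\").replace('"', '\\"')
            parts.append(f'"{escaped}"')
    return "." + ".".join(parts)


def label_match(namespace: str, label_key: str, label_value: str) -> str:
    """Render a namespace-and-label condition shaped like the one in the error.

    The namespace and label value are assumed to contain no double quotes.
    """
    ns_path = vrl_path("kubernetes", "namespace_name")
    label_path = vrl_path("kubernetes", "labels", label_key)
    return f'({ns_path} == "{namespace}") && ({label_path} == "{label_value}")'


print(label_match("toolchain-host-operator", "control-plane", "controller-manager"))
```

With the label key quoted, the rendered condition refers to `.kubernetes.labels."control-plane"` as a single field rather than an arithmetic expression.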

      Version-Release number of selected component (if applicable):

      v5.7.0

      How reproducible:

       

      Steps to Reproduce:

      1.  
      2.  
      3. ...

      Actual results:

      Expected results:

      Additional info:

      Current Cluster Logging resource:

      apiVersion: logging.openshift.io/v1
      kind: ClusterLogging
      metadata:
        creationTimestamp: '2021-10-28T12:58:39Z'
        generation: 1
        managedFields:
          - apiVersion: logging.openshift.io/v1
            fieldsType: FieldsV1
            fieldsV1:
              'f:spec':
                .: {}
                'f:collection':
                  .: {}
                  'f:logs':
                    .: {}
                    'f:fluentd':
                      .: {}
                      'f:resources':
                        .: {}
                        'f:limits':
                          .: {}
                          'f:memory': {}
                        'f:requests':
                          .: {}
                          'f:cpu': {}
                          'f:memory': {}
                    'f:type': {}
                'f:managementState': {}
            manager: sandbox-cli
            operation: Update
            time: '2021-10-28T12:58:39Z'
          - apiVersion: logging.openshift.io/v1
            fieldsType: FieldsV1
            fieldsV1:
              'f:status':
                .: {}
                'f:clusterConditions': {}
                'f:collection':
                  .: {}
                  'f:logs':
                    .: {}
                    'f:fluentdStatus':
                      .: {}
                      'f:daemonSet': {}
                      'f:nodes':
                        'f:collector-l6p9l': {}
                        'f:collector-48jvj': {}
                        'f:collector-572k9': {}
                        'f:collector-2w2b6': {}
                        .: {}
                        'f:collector-m9jsz': {}
                        'f:collector-d86l2': {}
                        'f:collector-9gv8x': {}
                        'f:collector-pdwcr': {}
                      'f:pods':
                        .: {}
                        'f:failed': {}
                        'f:notReady': {}
                        'f:ready': {}
                'f:conditions': {}
                'f:curation': {}
                'f:logStore': {}
                'f:visualization': {}
            manager: cluster-logging-operator
            operation: Update
            subresource: status
            time: '2023-04-26T07:58:23Z'
        name: instance
        namespace: openshift-logging
        resourceVersion: '2517536636'
        uid: cddd1ccc-9374-4868-b3cc-956d21f49900
      spec:
        collection:
          logs:
            fluentd:
              resources:
                limits:
                  memory: 736Mi
                requests:
                  cpu: 100m
                  memory: 736Mi
            type: fluentd
          type: vector
        managementState: Managed
      status:
        collection:
          logs:
            fluentdStatus:
              daemonSet: collector
              nodes:
                collector-2w2b6: ip-10-0-248-35.ec2.internal
                collector-48jvj: ip-10-0-188-155.ec2.internal
                collector-572k9: ip-10-0-199-46.ec2.internal
                collector-9gv8x: ip-10-0-204-219.ec2.internal
                collector-d86l2: ip-10-0-231-58.ec2.internal
                collector-l6p9l: ip-10-0-248-164.ec2.internal
                collector-m9jsz: ip-10-0-200-101.ec2.internal
                collector-pdwcr: ip-10-0-255-6.ec2.internal
              pods:
                failed: []
                notReady: []
                ready:
                  - collector-2w2b6
                  - collector-48jvj
                  - collector-572k9
                  - collector-9gv8x
                  - collector-d86l2
                  - collector-l6p9l
                  - collector-m9jsz
                  - collector-pdwcr
        conditions:
          - lastTransitionTime: '2022-08-18T16:21:42Z'
            status: 'False'
            type: CollectorDeadEnd
          - lastTransitionTime: '2022-08-18T16:21:47Z'
            message: curator is deprecated in favor of defining retention policy
            reason: ResourceDeprecated
            status: 'True'
            type: CuratorRemoved
        curation: {}
        logStore: {}
        visualization: {}
      
       

      ClusterLogForwarder:

      apiVersion: logging.openshift.io/v1
      kind: ClusterLogForwarder
      metadata:
        name: instance
        namespace: openshift-logging
      spec:
        inputs:
          - application:
              namespaces:
                - toolchain-member-operator
              selector:
                matchLabels:
                  control-plane: controller-manager
            name: toolchain-member-operator
          - application:
              namespaces:
                - toolchain-member-operator
              selector:
                matchLabels:
                  app: member-operator-webhook
            name: toolchain-member-operator-webhook
          - application:
              namespaces:
                - codeready-workspaces-operator
              selector:
                matchLabels:
                  app: codeready-operator
            name: codeready-workspaces-operator
          - application:
              namespaces:
                - codeready-workspaces-operator
              selector:
                matchLabels:
                  app: codeready
                  component: codeready
            name: codeready
        outputs:
          - name: loki
            type: loki
            url: 'http://loki.openshift-customer-monitoring.svc.cluster.local:3100'
        pipelines:
          - inputRefs:
              - toolchain-host-operator
            labels:
              namespace: toolchain-host-operator
            name: toolchain-host-operator-to-loki
            outputRefs:
              - loki
            parse: json
          - inputRefs:
              - registration-service
            labels:
              namespace: toolchain-host-operator
            name: registration-service-to-loki
            outputRefs:
              - loki
            parse: json
          - inputRefs:
              - toolchain-member-operator
            labels:
              namespace: toolchain-member-operator
            name: toolchain-member-operator-to-loki
            outputRefs:
              - loki
            parse: json
          - inputRefs:
              - toolchain-member-operator-webhook
            labels:
              namespace: toolchain-member-operator
            name: toolchain-member-operator-webhook-to-loki
            outputRefs:
              - loki
            parse: json
          - inputRefs:
              - codeready-workspaces-operator
            labels:
              namespace: codeready-workspaces-operator
            name: codeready-workspaces-operator-to-loki
            outputRefs:
              - loki
            parse: json
          - inputRefs:
              - codeready
            labels:
              namespace: codeready-workspaces-operator
            name: codeready-to-loki
            outputRefs:
              - loki
            parse: json
      status:
        conditions:
          - lastTransitionTime: '2023-05-10T17:21:58Z'
            message: 'No valid inputs, outputs, or pipelines. Invalid CLF spec.'
            reason: Invalid
            status: 'False'
            type: Ready
        inputs:
      ...
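
In the ClusterLogForwarder as pasted above, the pipelines `toolchain-host-operator-to-loki` and `registration-service-to-loki` reference inputs (`toolchain-host-operator`, `registration-service`) that do not appear in the `inputs` list. The report does not say whether the pasted spec is complete, so this may or may not contribute to the `Invalid` condition. The following sketch shows one way to cross-check `inputRefs` against the defined inputs. The function name `undefined_input_refs` is hypothetical, and the `spec` dictionary is a trimmed copy that keeps only the names and `inputRefs` from the resource above.

```python
# Input names that the ClusterLogForwarder API reserves for built-in log types.
RESERVED_INPUTS = {"application", "infrastructure", "audit"}


def undefined_input_refs(spec: dict) -> dict:
    """Map each pipeline name to the inputRefs that match no defined or reserved input."""
    defined = {item["name"] for item in spec.get("inputs", [])}
    known = defined | RESERVED_INPUTS
    problems = {}
    for pipeline in spec.get("pipelines", []):
        missing = [ref for ref in pipeline.get("inputRefs", []) if ref not in known]
        if missing:
            problems[pipeline["name"]] = missing
    return problems


# Trimmed copy of the spec above: only names and inputRefs are kept.
spec = {
    "inputs": [
        {"name": "toolchain-member-operator"},
        {"name": "toolchain-member-operator-webhook"},
        {"name": "codeready-workspaces-operator"},
        {"name": "codeready"},
    ],
    "pipelines": [
        {"name": "toolchain-host-operator-to-loki", "inputRefs": ["toolchain-host-operator"]},
        {"name": "registration-service-to-loki", "inputRefs": ["registration-service"]},
        {"name": "toolchain-member-operator-to-loki", "inputRefs": ["toolchain-member-operator"]},
        {"name": "toolchain-member-operator-webhook-to-loki", "inputRefs": ["toolchain-member-operator-webhook"]},
        {"name": "codeready-workspaces-operator-to-loki", "inputRefs": ["codeready-workspaces-operator"]},
        {"name": "codeready-to-loki", "inputRefs": ["codeready"]},
    ],
}

for name, refs in undefined_input_refs(spec).items():
    print(f"{name}: undefined inputRefs {refs}")
```

Against the full resource, the same check could be run on the parsed YAML `spec` instead of the trimmed dictionary.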

      Also, the list of collector pods in the ClusterLogging status is not up to date with the cluster state.

              xcoulon@redhat.com Xavier Coulon