OpenShift Logging / LOG-4910

Logging operator deletes the collector daemonset and is not able to recreate it


    • Before this update, it was not possible to apply any change to the CLO configuration, because the daemonset could not be regenerated while the configuration was invalid. With this change, the CLO tries to recreate the daemonset in all cases except "Not authorized to collect".
    • Bug Fix
    • Moderate

      Description of problem:

      If the clusterlogforwarder has an invalid configuration, the Logging operator does not apply any change until the configuration is fixed. While in that state, an upgrade should not delete the collector daemonset, because the operator is not able to regenerate it without a valid configuration. This was happening during the upgrade from Logging 5.7 to Logging 5.8, as detailed below, leaving the number of collectors at 0 because the daemonset no longer existed.

      Version-Release number of selected component (if applicable):

      $ oc get csv 
      NAME                            DISPLAY                            VERSION   REPLACES                        PHASE
      cluster-logging.v5.7.9          Red Hat OpenShift Logging          5.7.9     cluster-logging.v5.7.8          Succeeded
      elasticsearch-operator.v5.7.9   OpenShift Elasticsearch Operator   5.7.9     elasticsearch-operator.v5.7.8   Succeeded 

      How reproducible:

      Always

      Steps to Reproduce:

      $ oc get csv 
      NAME                            DISPLAY                            VERSION   REPLACES                        PHASE
      cluster-logging.v5.7.9          Red Hat OpenShift Logging          5.7.9     cluster-logging.v5.7.8          Succeeded
      elasticsearch-operator.v5.7.9   OpenShift Elasticsearch Operator   5.7.9     elasticsearch-operator.v5.7.8   Succeeded 

       

      Have a clusterLogForwarder instance with an error such as the one described in https://issues.redhat.com/browse/LOG-4441.
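
      One way to create such an instance, sketched from the CR shown below (it assumes the default Elasticsearch output managed by the ClusterLogging instance is in use), is to apply a pipeline with `parse: json` while the "default" output defines neither structuredTypeKey nor structuredTypeName:

      $ oc apply -f - <<EOF
      apiVersion: logging.openshift.io/v1
      kind: ClusterLogForwarder
      metadata:
        name: instance
        namespace: openshift-logging
      spec:
        pipelines:
        - inputRefs:
          - application
          - infrastructure
          - audit
          name: container-logs
          outputRefs:
          - default
          parse: json
      EOF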

      $ oc get clusterlogforwarder instance -o yaml 
      apiVersion: logging.openshift.io/v1
      kind: ClusterLogForwarder
      metadata:
        creationTimestamp: "2023-12-20T18:20:47Z"
        generation: 1
        name: instance
        namespace: openshift-logging
        resourceVersion: "2085335"
        uid: 84a03218-0a77-495d-9a23-3188eeff190d
      spec:
        pipelines:
        - inputRefs:
          - application
          - infrastructure
          - audit
          name: container-logs
          outputRefs:
          - default
          parse: json
      status:
        conditions:
        - lastTransitionTime: "2023-12-20T18:20:56Z"
          message: structuredTypeKey or structuredTypeName must be defined for Elasticsearch
            output named "default" when JSON parsing is enabled on pipeline "container-logs"
            that references it
          reason: Invalid
          status: "False"
          type: Ready

       

      In this situation, the cluster-logging operator doesn't apply any change until the error in the `clusterlogforwarder` CR is resolved, as expected. The error message is also visible in the Logging operator logs:

      $ clo=$(oc get pod -l name=cluster-logging-operator -n openshift-logging -o name)
      $ oc logs $clo -n openshift-logging  |grep -i "structuredTypeKey or structuredTypeName must be defined for Elasticsearch output"  |tail -1 
      {"_ts":"2023-12-20T18:21:17.157450999Z","_level":"0","_component":"cluster-logging-operator","_message":"clusterlogforwarder-controller returning, error","_error":{"msg":"structuredTypeKey or structuredTypeName must be defined for Elasticsearch output named \"default\" when JSON parsing is enabled on pipeline \"container-logs\" that references it"}}
      

       

      Upgrade to Logging 5.8
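
      One way to trigger the upgrade (a sketch; it assumes the operator was installed from the `redhat-operators` catalog with automatic install plan approval) is to switch the Subscription channel:

      $ oc patch subscription cluster-logging -n openshift-logging \
          --type merge -p '{"spec":{"channel":"stable-5.8"}}'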

      $ oc get subs  -n openshift-logging
      NAME              PACKAGE           SOURCE             CHANNEL
      cluster-logging   cluster-logging   redhat-operators   stable-5.8

      $ oc get csv  -n openshift-logging
      NAME                            DISPLAY                            VERSION   REPLACES                        PHASE
      cluster-logging.v5.8.1          Red Hat OpenShift Logging          5.8.1     cluster-logging.v5.7.9          Succeeded
      elasticsearch-operator.v5.8.1   OpenShift Elasticsearch Operator   5.8.1     elasticsearch-operator.v5.7.9   Succeeded 

       

      Actual results:

      No collector pods exist:

      $ oc get pods -l component=collector  -n openshift-logging
      No resources found in openshift-logging namespace.

      This is because, during the upgrade, the operator deleted the `collector` daemonset:

      $ oc get daemonset -n openshift-logging
      No resources found in openshift-logging namespace. 

      If the pipeline is fixed, then the operator is able to create the collector daemonset again. Let's do it by, for example, deleting the entry `parse: json`, leaving the `clusterLogForwarder` definition as follows (one way to apply this change is sketched after the YAML):

      $ oc get clusterlogforwarder instance -o yaml -n openshift-logging
      spec:
        pipelines:
        - inputRefs:
          - application
          - infrastructure
          - audit
          name: container-logs
          outputRefs:
          - default
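
      One way to apply this change non-interactively (a sketch; it assumes `parse` sits on the first and only pipeline in the spec) is a JSON patch that removes the field:

      $ oc patch clusterlogforwarder instance -n openshift-logging \
          --type json -p '[{"op":"remove","path":"/spec/pipelines/0/parse"}]'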

      After doing this, the collector daemonset is created by the operator and the collector pods run again:

      $ oc get pods -l component=collector -n openshift-logging
      NAME              READY   STATUS    RESTARTS   AGE
      collector-4pn2f   1/1     Running   0          53s
      collector-6spvw   1/1     Running   0          53s
      collector-kf47t   1/1     Running   0          53s
      collector-mjxgf   1/1     Running   0          53s
      collector-qlqtk   1/1     Running   0          53s
      collector-xfprd   1/1     Running   0          53s 

       

      Expected results:

      If the Logging operator is in an error status, then, since the current logic doesn't allow it to recreate resources from an invalid configuration, it shouldn't delete any resource that it won't be able to regenerate.
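
      With that behavior, the collector daemonset should still be present after the upgrade even while the CR is invalid; a quick check (a sketch, using the daemonset name observed above):

      $ oc get daemonset collector -n openshift-logging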
