OpenShift Logging / LOG-4910

Logging operator deletes the collector daemonset and is not able to recreate it


    • Before this update, it was not possible to apply any change to the CLO configuration, because the daemonset could not be regenerated while the configuration was invalid. With this change, the CLO tries to recreate the daemonset in all cases except "Not authorized to collect".
    • Bug Fix
    • Moderate

      Description of problem:

      If the clusterlogforwarder has an invalid configuration, the Logging operator does not apply any change until the configuration is fixed. While in that state, an upgrade should not delete the collector daemonset, because the operator is not able to regenerate it without a valid configuration. This was happening during the upgrade from Logging 5.7 to Logging 5.8, as detailed below, leaving the number of collectors at 0 because the daemonset no longer existed.

      Version-Release number of selected component (if applicable):

      $ oc get csv 
      NAME                            DISPLAY                            VERSION   REPLACES                        PHASE
      cluster-logging.v5.7.9          Red Hat OpenShift Logging          5.7.9     cluster-logging.v5.7.8          Succeeded
      elasticsearch-operator.v5.7.9   OpenShift Elasticsearch Operator   5.7.9     elasticsearch-operator.v5.7.8   Succeeded 

      How reproducible:

      Always

      Steps to Reproduce:

      $ oc get csv 
      NAME                            DISPLAY                            VERSION   REPLACES                        PHASE
      cluster-logging.v5.7.9          Red Hat OpenShift Logging          5.7.9     cluster-logging.v5.7.8          Succeeded
      elasticsearch-operator.v5.7.9   OpenShift Elasticsearch Operator   5.7.9     elasticsearch-operator.v5.7.8   Succeeded 

       

      Have a clusterLogForwarder instance with an error such as the one described in https://issues.redhat.com/browse/LOG-4441.
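
      One way to create such an instance, sketched from the CR shown below (it assumes the default Elasticsearch output managed by the ClusterLogging instance is in use), is to apply a pipeline with `parse: json` while the "default" output defines neither structuredTypeKey nor structuredTypeName:

      $ oc apply -f - <<EOF
      apiVersion: logging.openshift.io/v1
      kind: ClusterLogForwarder
      metadata:
        name: instance
        namespace: openshift-logging
      spec:
        pipelines:
        - inputRefs:
          - application
          - infrastructure
          - audit
          name: container-logs
          outputRefs:
          - default
          parse: json
      EOF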

      $ oc get clusterlogforwarder instance -o yaml 
      apiVersion: logging.openshift.io/v1
      kind: ClusterLogForwarder
      metadata:
        creationTimestamp: "2023-12-20T18:20:47Z"
        generation: 1
        name: instance
        namespace: openshift-logging
        resourceVersion: "2085335"
        uid: 84a03218-0a77-495d-9a23-3188eeff190d
      spec:
        pipelines:
        - inputRefs:
          - application
          - infrastructure
          - audit
          name: container-logs
          outputRefs:
          - default
          parse: json
      status:
        conditions:
        - lastTransitionTime: "2023-12-20T18:20:56Z"
          message: structuredTypeKey or structuredTypeName must be defined for Elasticsearch
            output named "default" when JSON parsing is enabled on pipeline "container-logs"
            that references it
          reason: Invalid
          status: "False"
          type: Ready

       

      In this situation, the cluster-logging operator doesn't apply any change until the error in the `clusterlogforwarder` CR is resolved, as expected. The error message is also visible in the Logging operator logs:

      $ clo=$(oc get pod -l name=cluster-logging-operator -n openshift-logging -o name)
      $ oc logs $clo -n openshift-logging  |grep -i "structuredTypeKey or structuredTypeName must be defined for Elasticsearch output"  |tail -1 
      {"_ts":"2023-12-20T18:21:17.157450999Z","_level":"0","_component":"cluster-logging-operator","_message":"clusterlogforwarder-controller returning, error","_error":{"msg":"structuredTypeKey or structuredTypeName must be defined for Elasticsearch output named \"default\" when JSON parsing is enabled on pipeline \"container-logs\" that references it"}}
      

       

      Upgrade to Logging 5.8
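
      One way to trigger the upgrade (a sketch; it assumes the operator was installed from the `redhat-operators` catalog with automatic install plan approval) is to switch the Subscription channel:

      $ oc patch subscription cluster-logging -n openshift-logging \
          --type merge -p '{"spec":{"channel":"stable-5.8"}}'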

      $ oc get subs  -n openshift-logging
      NAME              PACKAGE           SOURCE             CHANNEL
      cluster-logging   cluster-logging   redhat-operators   stable-5.8

      $ oc get csv  -n openshift-logging
      NAME                            DISPLAY                            VERSION   REPLACES                        PHASE
      cluster-logging.v5.8.1          Red Hat OpenShift Logging          5.8.1     cluster-logging.v5.7.9          Succeeded
      elasticsearch-operator.v5.8.1   OpenShift Elasticsearch Operator   5.8.1     elasticsearch-operator.v5.7.9   Succeeded 

       

      Actual results:

      No collector pods exist:

      $ oc get pods -l component=collector  -n openshift-logging
      No resources found in openshift-logging namespace.

      This is because, during the upgrade, the operator deleted the `collector` daemonset:

      $ oc get daemonset -n openshift-logging
      No resources found in openshift-logging namespace. 

      If the pipeline is fixed, then the operator is able to create the collector daemonset again. Let's do it by, for example, deleting the entry `parse: json`, leaving the `clusterLogForwarder` definition as follows (one way to apply this change is sketched after the YAML):

      $ oc get clusterlogforwarder instance -o yaml -n openshift-logging
      spec:
        pipelines:
        - inputRefs:
          - application
          - infrastructure
          - audit
          name: container-logs
          outputRefs:
          - default
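
      One way to apply this change non-interactively (a sketch; it assumes `parse` sits on the first and only pipeline in the spec) is a JSON patch that removes the field:

      $ oc patch clusterlogforwarder instance -n openshift-logging \
          --type json -p '[{"op":"remove","path":"/spec/pipelines/0/parse"}]'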

      After doing this, the collector daemonset is created by the operator and the collector pods run again:

      $ oc get pods -l component=collector -n openshift-logging
      NAME              READY   STATUS    RESTARTS   AGE
      collector-4pn2f   1/1     Running   0          53s
      collector-6spvw   1/1     Running   0          53s
      collector-kf47t   1/1     Running   0          53s
      collector-mjxgf   1/1     Running   0          53s
      collector-qlqtk   1/1     Running   0          53s
      collector-xfprd   1/1     Running   0          53s 

       

      Expected results:

      If the Logging operator is in an error status, then, since the current logic doesn't allow it to recreate resources from an invalid configuration, it shouldn't delete any resource that it won't be able to regenerate.
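
      With that behavior, the collector daemonset should still be present after the upgrade even while the CR is invalid; a quick check (a sketch, using the daemonset name observed above):

      $ oc get daemonset collector -n openshift-logging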
