Type: Bug
Resolution: Done-Errata
Priority: Major
Fix Version/s: Logging 5.8.7
Affects Version/s: Logging 5.7.z, Logging 5.8.z
Component/s: Log Collection
Labels:
- devel_ack+

Blocked:
False
Blocked Reason:
None
Ready:
False
Docs QE Status:
NEW
QE Status:
NEW
Release Note Text:

Before this update, if the configuration was invalid, the collector daemonset could not be regenerated, so no change to the Cluster Logging Operator (CLO) configuration could be applied. With this update, the CLO tries to recreate the daemonset in all cases except when the "Not authorized to collect" error occurs.

Release Note Type:
Bug Fix
Intelligence Requested:
Market:

Severity:
Moderate

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

Description of problem:

If the ClusterLogForwarder has an invalid configuration, the Logging operator cannot apply any change until the configuration is fixed. In that state, the operator should not delete the collector daemonset during an upgrade, because it cannot regenerate it with a valid configuration. This happened during the upgrade from Logging 5.7 to Logging 5.8, as detailed below: the daemonset was deleted and not recreated, leaving zero collectors running.

Version-Release number of selected component (if applicable):

$ oc get csv 
NAME                            DISPLAY                            VERSION   REPLACES                        PHASE
cluster-logging.v5.7.9          Red Hat OpenShift Logging          5.7.9     cluster-logging.v5.7.8          Succeeded
elasticsearch-operator.v5.7.9   OpenShift Elasticsearch Operator   5.7.9     elasticsearch-operator.v5.7.8   Succeeded

How reproducible:

Always

Steps to Reproduce:

$ oc get csv 
NAME                            DISPLAY                            VERSION   REPLACES                        PHASE
cluster-logging.v5.7.9          Red Hat OpenShift Logging          5.7.9     cluster-logging.v5.7.8          Succeeded
elasticsearch-operator.v5.7.9   OpenShift Elasticsearch Operator   5.7.9     elasticsearch-operator.v5.7.8   Succeeded

Create a ClusterLogForwarder instance with the error described in https://issues.redhat.com/browse/LOG-4441:

$ oc get clusterlogforwarder instance -o yaml 
apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  creationTimestamp: "2023-12-20T18:20:47Z"
  generation: 1
  name: instance
  namespace: openshift-logging
  resourceVersion: "2085335"
  uid: 84a03218-0a77-495d-9a23-3188eeff190d
spec:
  pipelines:
  - inputRefs:
    - application
    - infrastructure
    - audit
    name: container-logs
    outputRefs:
    - default
    parse: json
status:
  conditions:
  - lastTransitionTime: "2023-12-20T18:20:56Z"
    message: structuredTypeKey or structuredTypeName must be defined for Elasticsearch
      output named "default" when JSON parsing is enabled on pipeline "container-logs"
      that references it
    reason: Invalid
    status: "False"
    type: Ready
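
To reproduce the invalid state from a script rather than by hand, the spec above can be applied directly. This is a minimal sketch, assuming `oc` is logged in with rights to `openshift-logging`; `apply_invalid_clf` is a hypothetical helper name, and the `status` block is omitted because the operator populates it.

```shell
# Hypothetical helper: recreate the invalid ClusterLogForwarder from this issue.
# JSON parsing is enabled on the pipeline, but the "default" Elasticsearch output
# has no structuredTypeKey/structuredTypeName, so the CR ends up not Ready.
apply_invalid_clf() {
  oc apply -n openshift-logging -f - <<'EOF'
apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  pipelines:
  - name: container-logs
    inputRefs:
    - application
    - infrastructure
    - audit
    outputRefs:
    - default
    parse: json
EOF
}
```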

In this situation, the Cluster Logging Operator does not apply any change until the error in the `clusterlogforwarder` CR is resolved, as expected. The error message is also visible in the Logging Operator logs:

$ clo=$(oc get pod -l name=cluster-logging-operator -n openshift-logging -o name)
$ oc logs $clo -n openshift-logging | grep -i "structuredTypeKey or structuredTypeName must be defined for Elasticsearch output" | tail -1
{"_ts":"2023-12-20T18:21:17.157450999Z","_level":"0","_component":"cluster-logging-operator","_message":"clusterlogforwarder-controller returning, error","_error":{"msg":"structuredTypeKey or structuredTypeName must be defined for Elasticsearch output named \"default\" when JSON parsing is enabled on pipeline \"container-logs\" that references it"}}
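
Because the operator stops reconciling while the CR is invalid, it may be worth checking the `Ready` condition before switching channels. A minimal sketch, assuming the CR is named `instance` in `openshift-logging` as above; `clf_ready_status` and `check_clf_before_upgrade` are hypothetical helper names, and the JSONPath filter reads the same `Ready` condition shown in the YAML status.

```shell
# Hypothetical helper: print the status ("True"/"False") of the Ready condition.
clf_ready_status() {
  oc get clusterlogforwarder instance -n openshift-logging \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
}

# Hypothetical pre-upgrade guard.
# Returns 0 if Ready, 1 if not Ready, 2 if the CR could not be read.
check_clf_before_upgrade() {
  local status
  status=$(clf_ready_status) || return 2
  if [ "$status" != "True" ]; then
    echo "ClusterLogForwarder is not Ready (status: ${status:-unknown}); fix it before upgrading" >&2
    return 1
  fi
  echo "ClusterLogForwarder is Ready"
}
```

With the CR from this issue, `check_clf_before_upgrade` would return 1, signalling that the upgrade should wait until the pipeline is fixed.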

Upgrade to Logging 5.8

$ oc get subs -n openshift-logging
NAME              PACKAGE           SOURCE             CHANNEL
cluster-logging   cluster-logging   redhat-operators   stable-5.8

$ oc get csv -n openshift-logging
NAME                            DISPLAY                            VERSION   REPLACES                        PHASE
cluster-logging.v5.8.1          Red Hat OpenShift Logging          5.8.1     cluster-logging.v5.7.9          Succeeded
elasticsearch-operator.v5.8.1   OpenShift Elasticsearch Operator   5.8.1     elasticsearch-operator.v5.7.9   Succeeded
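
The upgrade step corresponds to moving the `cluster-logging` Subscription shown above to the `stable-5.8` channel. One way to do this from the CLI is a merge patch. This is a sketch only: `upgrade_clo_channel` is a hypothetical helper, and the Elasticsearch Operator has its own Subscription, which this sketch does not touch.

```shell
# Hypothetical helper: switch the cluster-logging Subscription channel
# (default stable-5.8) so OLM installs the newer operator version.
upgrade_clo_channel() {
  local channel="${1:-stable-5.8}"
  oc patch subscription cluster-logging -n openshift-logging \
    --type=merge -p "{\"spec\":{\"channel\":\"${channel}\"}}"
}
```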

Actual results:

No collector pods exist:

$ oc get pods -l component=collector  -n openshift-logging
No resources found in openshift-logging namespace.

This is because the operator deleted the `collector` daemonset during the upgrade:

$ oc get daemonset -n openshift-logging
No resources found in openshift-logging namespace.
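
A post-upgrade check for this symptom can look for the `collector` daemonset directly instead of listing pods. A minimal sketch; `check_collector` is a hypothetical helper, and the namespace defaults to `openshift-logging` as in this issue.

```shell
# Hypothetical helper: fail if the collector DaemonSet is missing,
# otherwise report how many collector pods are ready.
check_collector() {
  local ns="${1:-openshift-logging}" ready
  if ! oc get daemonset collector -n "$ns" >/dev/null 2>&1; then
    echo "collector DaemonSet is missing in $ns" >&2
    return 1
  fi
  ready=$(oc get daemonset collector -n "$ns" -o jsonpath='{.status.numberReady}')
  echo "collector DaemonSet present, ready pods: ${ready:-0}"
}
```

In the state shown above, `check_collector` would return 1 right after the upgrade.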

Once the pipeline is fixed, the operator is able to recreate the collector daemonset. For example, delete the `parse: json` entry so that the `clusterLogForwarder` definition becomes:

$ oc get clusterlogforwarder instance -o yaml -n openshift-logging
spec:
  pipelines:
  - inputRefs:
    - application
    - infrastructure
    - audit
    name: container-logs
    outputRefs:
    - default

After this change, the operator creates the collector daemonset and the collector pods run again:

$ oc get pods -l component=collector -n openshift-logging
NAME              READY   STATUS    RESTARTS   AGE
collector-4pn2f   1/1     Running   0          53s
collector-6spvw   1/1     Running   0          53s
collector-kf47t   1/1     Running   0          53s
collector-mjxgf   1/1     Running   0          53s
collector-qlqtk   1/1     Running   0          53s
collector-xfprd   1/1     Running   0          53s
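
The workaround above can also be scripted: a JSON patch removes `parse: json`, then the script waits for the operator to recreate the daemonset. This sketch assumes `container-logs` is the first (index 0) pipeline, as in the CR above; `remove_json_parse` and `wait_for_collector` are hypothetical helpers, and the retry count and 10-second interval are arbitrary choices.

```shell
# Hypothetical helper: remove "parse: json" from the first pipeline.
remove_json_parse() {
  oc patch clusterlogforwarder instance -n openshift-logging \
    --type=json -p '[{"op":"remove","path":"/spec/pipelines/0/parse"}]'
}

# Hypothetical helper: poll until the operator recreates the collector
# DaemonSet, then wait for its rollout. Returns 1 if it never appears.
wait_for_collector() {
  local tries="${1:-30}" i=0
  while [ "$i" -lt "$tries" ]; do
    if oc get daemonset collector -n openshift-logging >/dev/null 2>&1; then
      oc rollout status daemonset/collector -n openshift-logging --timeout=300s
      return
    fi
    sleep 10
    i=$((i + 1))
  done
  echo "collector DaemonSet was not recreated" >&2
  return 1
}
```

Polling for existence first matters here because `oc rollout status` fails immediately if the daemonset does not exist yet.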

Expected results:

If the Logging operator is in an error state, it should not delete any resource: the current logic does not allow it to recreate resources while the configuration is invalid, so it would not be able to regenerate them.
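
The fix linked in this issue, openshift/cluster-logging-operator#2476, is titled "remove collector daemonset only if 'Not authorized to collect' error occurs". The decision that title describes is sketched below in shell for illustration only. It is not the operator's code (the operator is written in Go), and `should_remove_collector` is a hypothetical name that simply matches on the error message text.

```shell
# Illustration only, hypothetical helper: decide whether a reconcile error
# should lead to removing the collector DaemonSet.
should_remove_collector() {
  case "$1" in
    *"Not authorized to collect"*) return 0 ;;  # the only case that removes it
    *) return 1 ;;                               # any other error: keep/recreate it
  esac
}
```

Under this rule, the `structuredTypeKey or structuredTypeName must be defined...` error from this issue would no longer cause the daemonset to be deleted.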

clones

LOG-4910 Logging operator logic delete the daemonset collector not being able to recreate

Closed

links to

[KCS] Collector pods not running in RHOCP 4

openshift/cluster-logging-operator#2476: LOG-5514: remove collector daemonset only if 'Not authorized to collect' error occurs

openshift/openshift-docs#76276: OBSDOCS-1062 - Logging 5.8.7 Release Notes

RHSA-2024:131445 security update Logging for Red Hat OpenShift - 5.8.7

mentioned on

Merge request - Updated US source to: 09c7e9e LOG-5514: remove collector daemonset only if 'Not authorized to collect' error occurs


Assignee:: Vitalii Parfonov

Reporter:: Oscar Casal Sanchez

QA Contact:: Qiaoling Tang

Votes:: 0

Watchers:: 4

Created:: 2024/05/06 8:29 AM

Updated:: 2024/05/29 4:38 PM

Resolved:: 2024/05/23 7:09 AM
