-
Bug
-
Resolution: Done-Errata
-
Major
-
Logging 5.7.z, Logging 5.8.z
-
False
-
None
-
False
-
NEW
-
NEW
-
-
Bug Fix
-
-
-
Moderate
Description of problem:
If the ClusterLogForwarder has an invalid configuration, the Logging operator does not apply any change until the configuration is fixed. In that state, an upgrade should not delete the collector daemonset, because the operator cannot regenerate it without a valid configuration. As detailed below, this was happening during the upgrade from Logging 5.7 to Logging 5.8: the daemonset was deleted and the number of collector pods dropped to 0 because the daemonset no longer existed.
Version-Release number of selected component (if applicable):
$ oc get csv
NAME                            DISPLAY                            VERSION   REPLACES                        PHASE
cluster-logging.v5.7.9          Red Hat OpenShift Logging          5.7.9     cluster-logging.v5.7.8          Succeeded
elasticsearch-operator.v5.7.9   OpenShift Elasticsearch Operator   5.7.9     elasticsearch-operator.v5.7.8   Succeeded
How reproducible:
Always
Steps to Reproduce:
Start from Logging 5.7.9:
$ oc get csv
NAME                            DISPLAY                            VERSION   REPLACES                        PHASE
cluster-logging.v5.7.9          Red Hat OpenShift Logging          5.7.9     cluster-logging.v5.7.8          Succeeded
elasticsearch-operator.v5.7.9   OpenShift Elasticsearch Operator   5.7.9     elasticsearch-operator.v5.7.8   Succeeded
Have a ClusterLogForwarder instance with an error as described in https://issues.redhat.com/browse/LOG-4441
$ oc get clusterlogforwarder instance -o yaml
apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  creationTimestamp: "2023-12-20T18:20:47Z"
  generation: 1
  name: instance
  namespace: openshift-logging
  resourceVersion: "2085335"
  uid: 84a03218-0a77-495d-9a23-3188eeff190d
spec:
  pipelines:
  - inputRefs:
    - application
    - infrastructure
    - audit
    name: container-logs
    outputRefs:
    - default
    parse: json
status:
  conditions:
  - lastTransitionTime: "2023-12-20T18:20:56Z"
    message: structuredTypeKey or structuredTypeName must be defined for Elasticsearch
      output named "default" when JSON parsing is enabled on pipeline "container-logs"
      that references it
    reason: Invalid
    status: "False"
    type: Ready
In this situation, the cluster-logging operator does not apply any change until the error in the `clusterlogforwarder` CR is resolved, as expected. The error message is also visible in the Logging operator logs:
$ clo=$(oc get pod -l name=cluster-logging-operator -n openshift-logging -o name)
$ oc logs $clo -n openshift-logging | grep -i "structuredTypeKey or structuredTypeName must be defined for Elasticsearch output" | tail -1
{"_ts":"2023-12-20T18:21:17.157450999Z","_level":"0","_component":"cluster-logging-operator","_message":"clusterlogforwarder-controller returning, error","_error":{"msg":"structuredTypeKey or structuredTypeName must be defined for Elasticsearch output named \"default\" when JSON parsing is enabled on pipeline \"container-logs\" that references it"}}
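The invalid state can also be detected from the CR's Ready condition before upgrading. A minimal sketch, assuming the default instance name `instance` in `openshift-logging`; the `clf_state` helper is illustrative, not part of the operator:

```shell
# Illustrative helper: classify the value of the Ready condition's
# "status" field ("True" or "False").
clf_state() {
  if [ "$1" = "True" ]; then
    echo "ready"
  else
    echo "not-ready"
  fi
}

# On a live cluster (assumes a CLF named "instance" in openshift-logging):
#   status=$(oc get clusterlogforwarder instance -n openshift-logging \
#     -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')
#   clf_state "$status"
```

For the CR shown above, the Ready condition's status is "False", so the helper would report "not-ready" and the upgrade should be postponed until the configuration is fixed.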
Upgrade to Logging 5.8
$ oc get subs -n openshift-logging
NAME              PACKAGE           SOURCE             CHANNEL
cluster-logging   cluster-logging   redhat-operators   stable-5.8

$ oc get csv -n openshift-logging
NAME                            DISPLAY                            VERSION   REPLACES                        PHASE
cluster-logging.v5.8.1          Red Hat OpenShift Logging          5.8.1     cluster-logging.v5.7.9          Succeeded
elasticsearch-operator.v5.8.1   OpenShift Elasticsearch Operator   5.8.1     elasticsearch-operator.v5.7.9   Succeeded
Actual results:
No collector pods exist:
$ oc get pods -l component=collector -n openshift-logging
No resources found in openshift-logging namespace.
This is because, during the upgrade, the operator deleted the `collector` daemonset:
$ oc get daemonset -n openshift-logging
No resources found in openshift-logging namespace.
If the pipeline is fixed, the operator is able to create the collector daemonset again. For example, delete the `parse: json` entry, leaving the `clusterLogForwarder` definition as:
$ oc get clusterlogforwarder instance -o yaml -n openshift-logging
spec:
pipelines:
- inputRefs:
- application
- infrastructure
- audit
name: container-logs
outputRefs:
- default
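The removal of the `parse: json` entry can also be applied as a JSON patch instead of editing the CR by hand. A sketch; the pipeline index 0 assumes the single-pipeline spec shown above:

```shell
# JSON patch that removes the "parse" field from the first pipeline
# (index 0 assumes the single-pipeline spec shown above).
patch='[{"op":"remove","path":"/spec/pipelines/0/parse"}]'

# On a live cluster:
#   oc patch clusterlogforwarder instance -n openshift-logging \
#     --type=json -p "$patch"
echo "$patch"
```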
After doing this, the collector daemonset is created by the operator and the collector pods run again:
$ oc get pods -l component=collector -n openshift-logging
NAME              READY   STATUS    RESTARTS   AGE
collector-4pn2f   1/1     Running   0          53s
collector-6spvw   1/1     Running   0          53s
collector-kf47t   1/1     Running   0          53s
collector-mjxgf   1/1     Running   0          53s
collector-qlqtk   1/1     Running   0          53s
collector-xfprd   1/1     Running   0          53s
Expected results:
If the Logging operator is in an error status, then, since the current logic does not allow it to recreate resources while the configuration is invalid, it should not delete any resource that it will be unable to regenerate.
- clones
-
LOG-4910 Logging operator logic delete the daemonset collector not being able to recreate
- Closed
- links to
-
RHSA-2024:131445 security update Logging for Red Hat OpenShift - 5.8.7
- mentioned on