[LOG-5007] ClusterLogForwarder's status is constantly replaced, leading to infinite reconciliation runs - Red Hat Issue Tracker

Type: Bug
Resolution: Done-Errata
Priority: Minor
Fix Version/s: Logging 5.9.0
Affects Version/s: Logging 5.9.0
Component/s: Log Collection
Labels:
- devel_ack+

Blocked: False
Blocked Reason: None
Ready: False
Docs QE Status: NEW
QE Status: NEW
Release Note Text:

In specific corner cases, replacing the CLF status field caused the resourceVersion to be updated constantly because of changing timestamps in the Status conditions. This in turn led to an infinite reconciliation loop. The fix synchronizes all status conditions instead of replacing them, so that timestamps remain unchanged when conditions have not changed.
Release Note Type: Bug Fix

Sprint: Log Collection - Sprint 249, Log Collection - Sprint 250
Severity: Low


This affects the old / legacy implementation of the forwarder. For the upstream bug, see: https://github.com/openshift/cluster-logging-operator/issues/2315

ClusterLogForwarder's status is constantly replaced, leading to infinite reconciliation runs

The following code:

controllers/forwarding/forwarding_controller.go
@@ -83,7 +83,9 @@ func (r *ReconcileForwarder) Reconcile(ctx context.Context, request ctrl.Request
        // Fetch the ClusterLogForwarder instance
        instance, err, status := loader.FetchClusterLogForwarder(r.Client, request.NamespacedName.Namespace, request.NamespacedName.Name, true, func() logging.ClusterLogging { return *cl })
        if status != nil {
              instance.Status = *status

always replaces the entire CLF status. That in turn leads to constant status updates of the Ready condition, whose lastTransitionTime is set anew every time even though no actual transition happens.

Each of these status updates changes the resourceVersion and triggers another reconciliation, so the operator reconciles resources indefinitely even though nothing has changed.
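The release note describes the remedy as synchronizing status conditions so that timestamps stay unchanged when a condition has not changed. The following is a minimal, self-contained sketch of that idea, not the operator's actual code: the `Condition` type and the `SyncConditions` function are hypothetical stand-ins, and the assumption is that a transition only occurs when a condition's `Status` value differs from the stored one.

```go
package main

import (
	"fmt"
	"time"
)

// Condition is a hypothetical, simplified stand-in for a status condition.
// It is not the type used by the cluster-logging-operator.
type Condition struct {
	Type               string
	Status             string
	Reason             string
	Message            string
	LastTransitionTime time.Time
}

// SyncConditions (hypothetical helper) merges the desired conditions into the
// existing ones instead of replacing them wholesale. A condition keeps its old
// LastTransitionTime unless its Status actually changed. The returned bool
// reports whether anything differs, so a caller can skip the status update.
// Condition types are assumed to be unique within each slice.
func SyncConditions(existing, desired []Condition, now time.Time) ([]Condition, bool) {
	byType := make(map[string]Condition, len(existing))
	for _, c := range existing {
		byType[c.Type] = c
	}

	// A different number of conditions means something was added or removed.
	changed := len(existing) != len(desired)
	out := make([]Condition, 0, len(desired))
	for _, d := range desired {
		old, found := byType[d.Type]
		if found && old.Status == d.Status {
			// No transition: preserve the original timestamp.
			d.LastTransitionTime = old.LastTransitionTime
			if old.Reason != d.Reason || old.Message != d.Message {
				changed = true
			}
		} else {
			// New condition or real transition: stamp it with the current time.
			d.LastTransitionTime = now
			changed = true
		}
		out = append(out, d)
	}
	return out, changed
}

func main() {
	t0 := time.Date(2024, 1, 22, 9, 0, 0, 0, time.UTC)
	desired := []Condition{{Type: "Ready", Status: "True"}}

	status, changed := SyncConditions(nil, desired, t0)
	fmt.Println("first reconcile, changed:", changed)

	// One minute later the same conditions are computed again.
	status, changed = SyncConditions(status, desired, t0.Add(time.Minute))
	fmt.Println("second reconcile, changed:", changed,
		"timestamp kept:", status[0].LastTransitionTime.Equal(t0))
}
```

With a merge like this, a reconcile that computes the same conditions produces an identical status, so there is no reason to write it back and no new resourceVersion to trigger the next run. For conditions of type `metav1.Condition`, the `meta.SetStatusCondition` helper in `k8s.io/apimachinery` follows the same rule of only updating `LastTransitionTime` when the status changes; whether it fits the forwarder's own condition type is not stated in this issue.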

You can easily see this by running the current version of the Operator with debug level 3:

 LOG_LEVEL=3 make run

With the following 2 resources in place:

[akaris@workstation logging]$ cat configuring-cluster-logging-minimal.yaml
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance" 
  namespace: openshift-logging
spec:
  managementState: "Managed"
  collection:
    tolerations:
    - effect: NoSchedule
      key: node-role.kubernetes.io/master
      operator: Exists
    type: vector
[akaris@workstation logging]$ cat configuring-log-forwarder.yaml
apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  pipelines:
   - inputRefs:
     - audit
     outputRefs:
     - loki-external
     labels:
       job: openshift-audit
       clusterid: sno10
   - inputRefs:
     - application
     outputRefs:
     - loki-external
     labels:
       job: openshift-application
       clusterid: sno10
   - inputRefs:
     - infrastructure
     outputRefs:
     - loki-external
     labels:
       job: openshift-infrastructure
       clusterid: sno10
  outputs:
  - name: loki-external
    type: loki

links to

openshift/cluster-logging-operator#2313: Catch nil ptr exceptions with enabled Forwarder and empty Collection

openshift/openshift-docs#73530: [DOCS] OBSDOCS-833 Logging 5.9.0 Release Notes

RHBA-2024:128809 Logging Subsystem 5.9.0 - Red Hat OpenShift

Assignee: Andreas Karis
Reporter: Andreas Karis
QA Contact: Anping Li
Votes: 0
Watchers: 4
Created: 2024/01/22 9:47 AM
Updated: 2024/04/04 3:30 PM
Resolved: 2024/04/04 12:43 PM
