[LOG-3559] After logging operator upgraded to 5.6.0, collectors restart every 5 minutes. - Red Hat Issue Tracker

Type: Bug
Resolution: Done
Priority: Normal
Fix Version/s: Logging 5.6.2
Affects Version/s: Logging 5.6.0
Component/s: Log Collection
Labels:
- devel_ack+
- rn-done

Blocked: False
Blocked Reason: None
Ready: False
Docs QE Status: NEW
QE Status: VERIFIED
Release Note Text:

Before this update, when the `ClusterLogForwarder` custom resource (CR) had multiple pipelines configured, with one output set as `default`, the collector pods restarted. With this update, the logic for output validation has been corrected, resolving the issue.


Sprint: Log Collection - Sprint 231, Log Collection - Sprint 232


Description of problem:

Collector pods are deleted and recreated every few minutes. When the ClusterLogging (CL) instance is set to Unmanaged, the restarts stop.

Version-Release number of selected component (if applicable):

5.6.0

How reproducible:

Occurs in multiple customer clusters; not successfully reproduced in the lab.

Steps to Reproduce:

Upgraded from 5.5 to 5.6
Set CL instance to Managed...

Actual results:

Events show:
Tue Jan 24 09:31:00 CST 2023 openshift-logging Warning Invalid clusterlogforwarder/instance invalid: unrecognized outputs: [default], no valid outputs
Mon Jan 23 04:47:57 CST 2023 openshift-logging Warning Invalid clusterlogforwarder/instance

CLO logs only repeat:


2023-01-19T17:28:03.202102402Z {"_ts":"2023-01-19T17:28:03.20110673Z","_level":"0","_component":"cluster-logging-operator","_message":"clusterlogging-controller error updating status","_error":{"msg":"Operation cannot be fulfilled on clusterloggings.logging.openshift.io \"instance\": the object has been modified; please apply your changes to the latest version and try again"}}
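
The repeated CLO message is the Kubernetes API server's standard conflict response: a status update was attempted against a stale copy of the `ClusterLogging` object. On its own it does not say why the collectors restart. As a triage aid, the following minimal sketch splits such a log line into its container timestamp and JSON payload so the `_message` and `_error.msg` fields can be filtered or counted. The helper name `parse_clo_line` is hypothetical; the line format is assumed from the single sample above.

```python
import json

# Sample line copied verbatim from the CLO log in this report.
line = r'2023-01-19T17:28:03.202102402Z {"_ts":"2023-01-19T17:28:03.20110673Z","_level":"0","_component":"cluster-logging-operator","_message":"clusterlogging-controller error updating status","_error":{"msg":"Operation cannot be fulfilled on clusterloggings.logging.openshift.io \"instance\": the object has been modified; please apply your changes to the latest version and try again"}}'


def parse_clo_line(raw):
    """Split '<timestamp> <json>' into (timestamp, dict).

    Assumption: every line starts with a timestamp, then one space,
    then a single JSON object, as in the sample above.
    """
    timestamp, _, payload = raw.partition(" ")
    return timestamp, json.loads(payload)


timestamp, record = parse_clo_line(line)
print(timestamp)
print(record["_message"])
print(record["_error"]["msg"])
```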

CLF:
spec:
  outputs:
  - name: rsyslog-prod
    syslog:
      appName: example
      facility: user
      msgID: mymsg
      procID: myproc
      rfc: RFC5424
      severity: informational
    type: syslog
    url: udp://syslog.example.com:514
  pipelines:
  - inputRefs:
    - audit
    labels:
      syslog: example-prod
    name: syslog-prod
    outputRefs:
    - rsyslog-prod
    parse: json
  - inputRefs:
    - application
    - infrastructure
    name: enable-default-logs
    outputRefs:
    - default

When the CLO is set to Unmanaged, all logs flow normally to syslog and elasticsearch, so the configs themselves seem to work just fine.
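
The event above rejects `default` as an unrecognized output even though the CLF never defines it under `spec.outputs`. The following is a minimal sketch of a reference check that treats `default` as a reserved name. It is not the cluster-logging-operator's actual code: the function name, the `default_store_configured` flag, and the rule that `default` is valid only when a default log store exists are all assumptions made for illustration.

```python
# Hypothetical sketch; not the cluster-logging-operator's actual validation.
RESERVED_OUTPUT = "default"  # output name referenced by the CLF in this report


def unrecognized_output_refs(spec, default_store_configured):
    """Return outputRefs that match no known output, in first-seen order."""
    known = {output["name"] for output in spec.get("outputs", [])}
    if default_store_configured:
        # Assumption: `default` is accepted only when a default log store exists.
        known.add(RESERVED_OUTPUT)
    missing = []
    for pipeline in spec.get("pipelines", []):
        for ref in pipeline.get("outputRefs", []):
            if ref not in known and ref not in missing:
                missing.append(ref)
    return missing


# Trimmed copy of the CLF spec from this report.
clf_spec = {
    "outputs": [{"name": "rsyslog-prod", "type": "syslog"}],
    "pipelines": [
        {
            "name": "syslog-prod",
            "inputRefs": ["audit"],
            "outputRefs": ["rsyslog-prod"],
        },
        {
            "name": "enable-default-logs",
            "inputRefs": ["application", "infrastructure"],
            "outputRefs": ["default"],
        },
    ],
}

print(unrecognized_output_refs(clf_spec, default_store_configured=True))   # []
print(unrecognized_output_refs(clf_spec, default_store_configured=False))  # ['default']
```

Under these assumptions, the second call yields the same list as the reported event (`unrecognized outputs: [default]`), which is the behavior a check would show if it did not account for the reserved name.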

Expected results:

I don't see any errors in the config; I would expect the CLO either not to restart the pods or to log what it thinks the error is.

Additional info:

is related to

LOG-3437 [release-5.6] No error in the clf/instance when creating a clf with an output named as `default`.

Closed

links to

openshift/cluster-logging-operator#1849: [release-5.6] LOG-3437: fix for invalidate or migrate default output

openshift/cluster-logging-operator#1882: [master] LOG-3645-default-output-migration-fix

openshift/openshift-docs#55965: RHDEVDOCS-4928 - Logging 5.6.2 Release Notes

mentioned on

Merge request - Updated US source to: 43a936e Merge pull request #1846 from syedriko/syedriko-cert-scripts-dir-release-5.6-take-two

Assignee: Casey Hartman
Reporter: Steven Walter
QA Contact: Qiaoling Tang
Votes: 0
Watchers: 6
Created: 2023/01/26 9:26 PM
Updated: 2023/02/21 6:23 PM
Resolved: 2023/02/16 10:24 PM
