OpenShift Logging / LOG-4177

CLO pod crashes if CLF is updated while CL is in Unmanaged state


    • Before this update, the Cluster Logging Operator could crash in certain cases after the ClusterLogging resource was switched to the Unmanaged state. With this update, the operator checks that the ClusterLogging resource is in the correct management state before initiating reconciliation of the ClusterLogForwarder; this check prevents the operator from crashing (see the check sketched after this list).
    • Bug Fix
    • Log Collection - Sprint 237
    • Moderate
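
      The management state referenced above is the spec.managementState field of the ClusterLogging resource. A quick way to inspect it before editing the forwarder (a sketch assuming the default openshift-logging namespace and a ClusterLogging named instance):

      # Prints Managed or Unmanaged
      $ oc get clusterlogging instance -n openshift-logging -o jsonpath='{.spec.managementState}'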

      Description of problem:

      $ oc get pods
      NAME                                        READY   STATUS             RESTARTS        AGE
      cluster-logging-operator-74b988cb49-h4h4d   0/1     CrashLoopBackOff   8 (3m42s ago)   57m
      
      $ oc logs cluster-logging-operator-74b988cb49-h4h4d
      {"_ts":"2023-06-01T06:35:14.815351321Z","_level":"0","_component":"cluster-logging-operator","_message":"starting up...","go_arch":"amd64","go_os":"linux","go_version":"go1.19.9","operator_version":"5.7"}
      I0601 06:35:15.865983       1 request.go:682] Waited for 1.031138991s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/config.openshift.io/v1?timeout=32s
      {"_ts":"2023-06-01T06:35:17.228756577Z","_level":"0","_component":"cluster-logging-operator","_message":"migrating resources provided by the manifest"}
      {"_ts":"2023-06-01T06:35:17.236205467Z","_level":"0","_component":"cluster-logging-operator","_message":"Registering Components."}
      {"_ts":"2023-06-01T06:35:17.23636406Z","_level":"0","_component":"cluster-logging-operator","_message":"Starting the Cmd."}
      panic: runtime error: invalid memory address or nil pointer dereference
      [signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x14f7ba5]
      
      goroutine 614 [running]:
      github.com/openshift/cluster-logging-operator/controllers/forwarding.(*ReconcileForwarder).Reconcile(0xc000300cd0, {0x1af12b8, 0xc00082c3c0}, {{{0xc0002f89f0?, 0x178cde0?}, {0xc0002f6178?, 0x30?}}})
      	/remote-source/cluster-logging-operator/app/controllers/forwarding/forwarding_controller.go:103 +0x905
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0xc000550160, {0x1af12b8, 0xc00082c360}, {{{0xc0002f89f0?, 0x178cde0?}, {0xc0002f6178?, 0x408b34?}}})
      	/remote-source/cluster-logging-operator/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:114 +0x28b
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc000550160, {0x1af1210, 0xc000943440}, {0x16c1740?, 0xc0001442c0?})
      	/remote-source/cluster-logging-operator/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:311 +0x352
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc000550160, {0x1af1210, 0xc000943440})
      	/remote-source/cluster-logging-operator/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:266 +0x1d9
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
      	/remote-source/cluster-logging-operator/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:227 +0x85
      created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
      	/remote-source/cluster-logging-operator/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:223 +0x31c
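
      Because the pod is crash looping, the panic above can also be pulled from the previous container instance after a restart, for example (assuming the openshift-logging namespace):

      $ oc logs -n openshift-logging deploy/cluster-logging-operator --previous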
      

      How reproducible:

      Always

      Steps to Reproduce:

      1. Forward logs using a ClusterLogForwarder instance.
      2. Set clusterlogging/instance to the Unmanaged management state.
      3. Delete and recreate ClusterLogForwarder/instance (see the command sketch below).
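
      One way to drive steps 2 and 3 from the CLI (a sketch assuming the default openshift-logging namespace, resources named instance, and a forwarder manifest saved locally as clf.yaml; the file name is illustrative):

      # Step 2: switch the ClusterLogging instance to Unmanaged
      $ oc patch clusterlogging instance -n openshift-logging --type=merge -p '{"spec":{"managementState":"Unmanaged"}}'

      # Step 3: delete and recreate the forwarder while unmanaged
      $ oc delete clusterlogforwarder instance -n openshift-logging
      $ oc apply -f clf.yaml

      # The operator pod then enters CrashLoopBackOff
      $ oc get pods -n openshift-logging -w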

      Actual results:

      The cluster-logging-operator pod enters CrashLoopBackOff.

      Expected results:

      The cluster-logging-operator does not crash.

      Additional info:

              Assignee: Vitalii Parfonov (vparfono)
              Reporter: Anping Li (rhn-support-anli)
